
Explode returns null on empty or NULL lists #5752

@srilman


Describe the bug

Take the following example:

import daft
assert daft.from_pydict({"A": [[], None, [1]]}).explode("A").to_pydict()["A"] == [None, None, 1]

While Polars exhibits similar behavior:

import polars as pl
assert pl.from_dict({"A": [[], None, [1]]}).explode("A").to_dict()["A"].to_list() == [None, None, 1]

PyArrow does not:

import pyarrow as pa
assert pa.array([[], None, [1]]).flatten().tolist() == [1]

Nor does DuckDB:

import duckdb
assert duckdb.sql("SELECT UNNEST(A) FROM (VALUES ([]::INT[]), (NULL), ([1]::INT[])) AS t(A);").to_df()["unnest(A)"].to_list() == [1]

Nor does PySpark (although I don't have a repro for it).

What behavior should we support?

To Reproduce

No response

Expected behavior

The Polars behavior, while uncommon, allows us to partially push a limit below an explode, since we can guarantee that every input row will produce at least one output row. However, which behavior should be the expected one?
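
To make the limit argument concrete, here is a minimal pure-Python sketch of the two semantics. The helper names `explode_keep_nulls` and `explode_drop_empty` are hypothetical and are not Daft APIs; they only model the behaviors shown above. With the keep-nulls semantics, taking the first n input rows before exploding is safe, because those n rows always yield at least n output rows. With the drop semantics, the same pre-limit can lose results.

```python
def explode_keep_nulls(rows):
    """Polars/Daft-style model: an empty or null list yields a single None."""
    out = []
    for row in rows:
        if row:  # non-empty list: emit each element
            out.extend(row)
        else:  # [] or None: emit one null so the row is preserved
            out.append(None)
    return out


def explode_drop_empty(rows):
    """PyArrow/DuckDB-style model: empty or null lists yield no rows."""
    out = []
    for row in rows:
        if row:
            out.extend(row)
    return out


rows = [[], None, [1], [2, 3]]
n = 2

# Keep-nulls: limiting the input to n rows first gives the same result
# as limiting only after the explode.
full = explode_keep_nulls(rows)[:n]
pushed = explode_keep_nulls(rows[:n])[:n]
assert full == pushed

# Drop: the first n input rows may explode to fewer than n output rows,
# so the pre-limit changes the answer.
full_drop = explode_drop_empty(rows)[:n]
pushed_drop = explode_drop_empty(rows[:n])[:n]
assert full_drop != pushed_drop
```

Note that the limit still has to be applied again after the explode, since a single input row can produce more than one output row; this is why the pushdown is only partial.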

Component(s)

Other

Additional context

Related to #5292
