Open
Description
Describe the bug
Take the following example:
import daft
assert daft.from_pydict({"A": [[], None, [1]]}).explode("A").to_pydict()["A"] == [None, None, 1]
While Polars exhibits similar behavior:
import polars as pl
assert pl.from_dict({"A": [[], None, [1]]}).explode("A").to_dict()["A"].to_list() == [None, None, 1]
PyArrow does not:
import pyarrow as pa
assert pa.array([[], None, [1]]).flatten().tolist() == [1]
Nor does DuckDB:
import duckdb
assert duckdb.sql("SELECT UNNEST(A) FROM (VALUES ([]::INT[]), (NULL), ([1]::INT[])) AS t(A);").to_df()["unnest(A)"].to_list() == [1]
Nor does PySpark (although I don't have a repro for it).
What behavior should we support?
To Reproduce
No response
Expected behavior
The Polars behavior, while uncommon, allows us to partially push a limit before explodes, since we can guarantee that every input row will produce at least one output row. However, what should the expected behavior be?
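To make the limit argument concrete, here is a minimal pure-Python sketch of the two semantics. It does not use Daft's planner or any engine API; `explode_keep`, `explode_drop`, and `limit` are hypothetical helpers written only for illustration. Under the keep-a-null semantics, taking the first `n` input rows before exploding gives the same result as exploding everything and then limiting. Under the drop semantics it can return too few rows.

```python
from itertools import islice


def explode_keep(rows):
    """Polars/Daft-style: an empty or null list yields a single None."""
    for row in rows:
        if row:
            yield from row
        else:
            yield None


def explode_drop(rows):
    """PyArrow/DuckDB-style: an empty or null list yields nothing."""
    for row in rows:
        if row:
            yield from row


def limit(rows, n):
    """Take at most the first n items."""
    return list(islice(rows, n))


data = [[], None, [1], [2, 3], [], [4]]
n = 3

# Keep semantics: every input row yields >= 1 output row, so the first
# n input rows are always enough to produce the first n output rows.
keep_full = limit(explode_keep(data), n)
keep_pushed = limit(explode_keep(limit(data, n)), n)
assert keep_full == keep_pushed

# Drop semantics: the first n input rows may yield fewer than n outputs,
# so pushing the limit below the explode changes the result.
drop_full = limit(explode_drop(data), n)
drop_pushed = limit(explode_drop(limit(data, n)), n)
assert drop_full != drop_pushed
```

Note that the limit still has to be applied again after the explode in the keep case, since a single row can expand into more than one output row; the push is only partial.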
Component(s)
Other
Additional context
Related to #5292