Skip to content

physical-expr: Nullability of Literal is not determined by surrounding context #15394

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
rkrishn7 opened this issue Mar 24, 2025 · 1 comment · May be fixed by #15603
Open

physical-expr: Nullability of Literal is not determined by surrounding context #15394

rkrishn7 opened this issue Mar 24, 2025 · 1 comment · May be fixed by #15603
Labels
bug Something isn't working

Comments

@rkrishn7
Copy link
Contributor

Describe the bug

It was found in #15242 that the nullability of Literal is simply based on its scalar value and does not take its nullability in the input schema into account.

To Reproduce

This issue cannot be seen in the CLI, as it doesn't do any strict type checking of record batches when displaying results. However, it can be observed in SLTs where normalize::convert_batches ensures that each record batch's schema is the same as that of the first batch.

Take the following query for example:

select x, y, z from t3 union all by name select z, y, x, 'd' as zz from t3;

UNION is a case where record batch streams from each input will be merged into an output stream.

zz should be considered as nullable in the output schema because it is not found in the lhs. However, the current behavior is that record batches streamed from the rhs do not have zz as null because of the current implementation of PhysicalExpr::nullable for Literal.

Expected behavior

The nullability of Literal should be determined by its surrounding context and value, not only its value.

Additional context

No response

@rkrishn7 rkrishn7 added the bug Something isn't working label Mar 24, 2025
@Omega359
Copy link
Contributor

I've been trying to find where to resolve this issue but my understanding of the core of DF is currently too limited to uncover a solution. I've created a test though that exhibits the issue

#[tokio::test]
async fn union_by_name_literal_is_null_and_not_null() -> Result<()> {
    let str_array_1 = StringArray::from(vec![Some("a1")]);
    let str_array_2 = StringArray::from(vec![Some("a2")]);
    let str_array_3 = StringArray::from(vec![Some("b1")]);

    let batch_1 = RecordBatch::try_from_iter(vec![
        ("a", Arc::new(str_array_1) as ArrayRef),
    ])?;
    let batch_2 = RecordBatch::try_from_iter(vec![
        ("a", Arc::new(str_array_2) as ArrayRef),
        ("b", Arc::new(str_array_3) as ArrayRef),
    ])?;

    let ctx = SessionContext::new();
    ctx.register_batch("union_batch_1", batch_1)?;
    ctx.register_batch("union_batch_2", batch_2)?;

    let df1 = ctx.table("union_batch_1").await?;
    let df2 = ctx.table("union_batch_2").await?;

    let batches = df2.union_by_name(df1)?.collect().await?;
    let schema = batches[0].schema();

    for batch in batches {
        // Verify schema is the same for all batches
        if !schema.contains(&batch.schema()) {
            return Err(DataFusionError::Internal(
                format!(
                    "Schema mismatch. Previously had\n{:#?}\n\nGot:\n{:#?}",
                    &schema,
                    batch.schema()
                ),
            ));
        }
    }

    Ok(())
}

Above is based upon #15489

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants