Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Use storageAssighmentPolicy for casts in Delta DML commands #1938

Conversation

olaky
Copy link
Contributor

@olaky olaky commented Jul 25, 2023

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Follow spark.sql.storeAssignmentPolicy instead of spark.sql.ansi.enabled for casting behaviour in Delta UPDATE and MERGE. This will by default error out at runtime when an overflow happens because the default for storeAssignmentPolicy ins ANSI while ANSI is not enabled by default.

How was this patch tested?

Tests for casting behavior added

Does this PR introduce any user-facing changes?

The UPDATE and MERGE command will error out by default if a cast that is on the write path overflows. The spark.databricks.delta.updateMergeLegacyCasting.enabled can be used to revert to the old behaviour.

}
}

case class CheckOverflowInTableWrite(child: Expression, columnName: String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be almost the same as CheckOverflowInTableInsert in Spark, could we inherit it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was indeed looking into it, but there are some problems with that

  • This case is a case class and CheckOverflowInTableInsert is one as well, so Scala does not like that
  • I cannot refactor CheckOverflowInTableInsert because it is part of the OSS Spark imports. This also inhibits me from making methods like getCast protected or the error throwing function configurable
  • Even if I could overwrite methods, with doGenCodeWithBetterErrorMsg and eval I would have to overwrite the main ones so most of the code would not be shared (this is more of a minor thing though)

@tdas
Copy link
Contributor

tdas commented Jul 26, 2023

Oh wow! We were letting overflows happen silently. This is bad. So glad you caught it and fixed it. This is definitely a tiny bit breaking change...but for the better. I really think we should get this into Delta 3.0 since major releases are good opportunities to do such breaking changes.

vkorukanti pushed a commit that referenced this pull request Aug 7, 2023
This is a follow up to #1938 that fixes the Delta version mentioned in the error message and adds more tests for the implicit casting behaviour in MERGE and UPDATE

Closes #1944

GitOrigin-RevId: afed6bb66c1832120731b9633fe9dcbbcbdb18d7
vkorukanti pushed a commit to vkorukanti/delta that referenced this pull request Sep 6, 2023
delta-io#1938 changed the casting behavior in MERGE and UPDATE to follow the value of the `storeAssignmentPolicy` config instead of the `ansiEnabled` one, making the behavior consistent with INSERT.

This change breaks MERGE and UPDATE operations that contain a cast when `storeAssignmentPolicy` is set to `STRICT`, throwing an internal error during analysis.

The cause is the `UpCast` expression(s) added by `PreprocessTableMerge` and `PreprocessTableUpdate`  when processing assignments. `UpCast` is meant to be replaced by a regular after performing checks by the `ResolveUpCast` rule that runs during the resolution phase **before** `PreprocessTableMerge` and `PreprocessTableUpdate` introduce the expression.

The fix is to run the `ResolveUpCast` rule once more after `PreprocessTableMerge` and `PreprocessTableUpdate`  have run.

Missing tests covering cast behavior for the different values of`storeAssignmentPolicy` for UPDATE and MERGE are added, covering:
- Invalid implicit cast (string -> boolean), valid implicit cast (string -> int), upcast (int -> long)
- UPDATE, MERGE
- storeAssignmentPolicy = LEGACY, ANSI, STRICT

Closes delta-io#1998

GitOrigin-RevId: 5b18006b9cb8efa522cda0a4cfa6e981a0b40d28
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants