-
Notifications
You must be signed in to change notification settings - Fork 28.5k
[SPARK-51903][SQL] Validate data on adding a CHECK constraint #50839
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
base: master
Are you sure you want to change the base?
Conversation
cc @aokolnychyi |
* The logical plan of the ALTER TABLE ... ADD CONSTRAINT command for Check constraints. | ||
*/ | ||
case class AddCheckConstraint( | ||
override val table: LogicalPlan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not really a table, but a validation query plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should either not extend AlterTableCommand
for AddCheckConstraint
, or have both the table
and validationPlan
fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code is fragile as we have code like the following
case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) =>
val table = a.table.asInstanceOf[ResolvedTable]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
The code in CheckAnalysis
looks a bit ugly now.
Do we consider remove the final
from
trait AlterTableCommand extends UnaryCommand {
def changes: Seq[TableChange]
def table: LogicalPlan
final override def child: LogicalPlan = table
}
@@ -4120,6 +4120,12 @@ | |||
], | |||
"sqlState" : "07501" | |||
}, | |||
"NEW_CHECK_CONSTRAINT_VIOLATION" : { | |||
"message" : [ | |||
"The new check constraint (<expression>) is violated by the existing data in the table <tableName>." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would reword to something like The check constraint (<expression>) cannot be added because ...
.
case _ => | ||
null | ||
} | ||
Seq(TableChange.addConstraint(constraint, validatedTableVersion)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we set validation status to VALIDATED?
What changes were proposed in this pull request?
This PR adds support for enforcing CHECK constraints when executing the
ALTER TABLE ... ADD CONSTRAINT
statement on V2 tables. Specifically, before adding a new CHECK constraint, Spark will validate the constraint against existing table data to ensure data integrity and compliance with the constraint definition.Why are the changes needed?
CHECK constraint enforcement in the
ALTER TABLE ... ADD CONSTRAINT
statement is part of the ANSI SQL standard. Supporting this feature ensures Spark's compliance with ANSI SQL and enhances data integrity by preventing invalid constraints from being added to existing tables.Does this PR introduce any user-facing change?
No
How was this patch tested?
New UT
Was this patch authored or co-authored using generative AI tooling?
No