Skip to content

[SPARK-51903][SQL] Validate data on adding a CHECK constraint #50839

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

gengliangwang
Copy link
Member

What changes were proposed in this pull request?

This PR adds support for enforcing CHECK constraints when executing the ALTER TABLE ... ADD CONSTRAINT statement on V2 tables. Specifically, before adding a new CHECK constraint, Spark will validate the constraint against existing table data to ensure data integrity and compliance with the constraint definition.

Why are the changes needed?

CHECK constraint enforcement in the ALTER TABLE ... ADD CONSTRAINT statement is part of the ANSI SQL standard. Supporting this feature ensures Spark's compliance with ANSI SQL and enhances data integrity by preventing invalid constraints from being added to existing tables.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New UT

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label May 9, 2025
@gengliangwang gengliangwang requested review from viirya and cloud-fan May 9, 2025 00:10
@gengliangwang
Copy link
Member Author

cc @aokolnychyi

* The logical plan of the ALTER TABLE ... ADD CONSTRAINT command for Check constraints.
*/
case class AddCheckConstraint(
override val table: LogicalPlan,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not really a table, but a validation query plan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either not extend AlterTableCommand for AddCheckConstraint, or have both the table and validationPlan fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code is fragile as we have code like the following

      case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) =>
        val table = a.table.asInstanceOf[ResolvedTable]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.
The code in CheckAnalysis looks a bit ugly now.
Do we consider remove the final from

trait AlterTableCommand extends UnaryCommand {
  def changes: Seq[TableChange]
  def table: LogicalPlan
  final override def child: LogicalPlan = table
}

@@ -4120,6 +4120,12 @@
],
"sqlState" : "07501"
},
"NEW_CHECK_CONSTRAINT_VIOLATION" : {
"message" : [
"The new check constraint (<expression>) is violated by the existing data in the table <tableName>."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reword to something like The check constraint (<expression>) cannot be added because ....

case _ =>
null
}
Seq(TableChange.addConstraint(constraint, validatedTableVersion))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we set validation status to VALIDATED?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants