Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

throw error on dimension drop in schema evolution. #4958

Merged
merged 1 commit into from
May 16, 2024

Conversation

DimitrisStaratzis
Copy link
Contributor

@DimitrisStaratzis DimitrisStaratzis commented May 9, 2024

Adding an error message when trying to drop a non existent attribute or a dimension with the Schema Evolution API. + Test

[sc-30774]


TYPE: BUG
DESC: Throw error on dimension drop in schema evolution.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new validation check happens too late. This code has it happening when the operation runs, not when it's first specified.

Doing this correctly means implementing validation checks in the two places that the attributes_to_drop_ can gain new members:

  • In drop_attribute, which is connected to the C API.
  • In attribute_names_to_drop, which is used only by capnp serialization.

In addition, the validation check itself isn't quite right. See drop_attribute for a check that's not made in the new code. Furthermore, the validations for incremental changes through the API are not identical to those made with serialization, where we can validate everything together more clearly and efficiently than making the same checks incrementally.

I would be satisfied if we only tightened up drop_attribute in this PR. We can leave validation for capnp for later. That means at minimum three things:

  • A story to add the requisite validation checks for capnp.
  • Documentation of the invariants in the class members. The descriptions in the current docs for the relevant member variables are inadequate. Here are two of them; neither is currently documented explicitly in the code.
    • Each element in attributes_to_drop_ must be an attribute of the array schema
    • attributes_to_drop_ and the attributes to add must be disjoint
  • Validation in drop_attribute that the invariants are satisfied. This is where the validation check that this PR seeks to add belongs.

Note the ordering. Getting the code in place is lower priority than the others. If you have the others, the code will eventually get right, but not vice-versa.

auto evolution = ArraySchemaEvolution(ctx);

// try to drop dimension d1
evolution.drop_attribute("d1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the statement that should fail. Avoiding a failure here only defers detection of the problem until later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments Eric. I agree that the check should happen further up in drop_attribute() however the ArraySchema object (which is required for the use of has_attribute()) is not available in drop_attribute(). It is only passed as a parameter in ArraySchemaEvolution::array_evolve() in the form of the array_uri which is then used to create the schema object

Here is the current order:

Array::create(array_uri, schema);
auto evolution = ArraySchemaEvolution(ctx);
evolution.drop_attribute("d1");
evolution.array_evolve(array_uri));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ArraySchema object [...] is not available in drop_attribute()

Yes, I see that now. I didn't realize the whole design was broken in this particular way.

@@ -152,9 +152,12 @@ shared_ptr<ArraySchema> ArraySchemaEvolution::evolve_schema(
for (auto& attr_name : attributes_to_drop_) {
bool has_attr = false;
throw_if_not_ok(schema->has_attribute(attr_name, &has_attr));
if (has_attr) {
throw_if_not_ok(schema->drop_attribute(attr_name));
if (!has_attr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is not quite right. The issue is with capnp deserialization. As with all input that comes in from the outside, here from the network, we need to validate inputs. We may not assume that a capnp message was generated by libtiledb necessarily; it might have been formed by other, likely malicious, software. We don't need to care why it was created, only that it's detected and rejected.

There's a check in drop_attribute to see if the attribute is the name of an attribute that was previously added. In this case, there's no check to see if the dropped attribute here also appears in the list of added attributes.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving for expediency.

This code does not correct the validation issues with capnp.

@KiterLuc KiterLuc changed the title throw error on dimension drop in schema evolution throw error on dimension drop in schema evolution. May 15, 2024
@DimitrisStaratzis DimitrisStaratzis force-pushed the dstara/don't_allow_dimension_drop_on_schema_evol branch from 5d3b2b4 to 856329e Compare May 15, 2024 17:10
@KiterLuc KiterLuc merged commit c28e084 into dev May 16, 2024
60 checks passed
@KiterLuc KiterLuc deleted the dstara/don't_allow_dimension_drop_on_schema_evol branch May 16, 2024 05:59
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants