Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Rest support for current domain. #5136

Merged
merged 6 commits into from
Jul 9, 2024
Merged

Conversation

robertbindar
Copy link
Contributor

@robertbindar robertbindar commented Jun 21, 2024

This PR adds REST serialization support for TileDB Current Domain and expansion via schema evolution.

[sc-48253]


TYPE: FEATURE
DESC: Rest support for current domain.

Copy link
Member

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned offline, I think the CurrentDomain::type capnproto field is redundant, given that capnproto already encodes which union field is being used. I made some suggestions that might obviate its need.

Comment on lines 157 to 160
auto crd = array_schema_evolution->current_domain_to_expand();
auto current_domain_builder =
array_schema_evolution_builder->initCurrentDomainToExpand();
current_domain_to_capnp(crd, &current_domain_builder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if the schema evolution does not expand the current domain? Should we call initCurrentDomainToExpand only then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great catch, thanks!

tiledb/sm/serialization/current_domain.h Outdated Show resolved Hide resolved
tiledb/sm/serialization/current_domain.h Outdated Show resolved Hide resolved

switch (crd->type()) {
case CurrentDomainType::NDRECTANGLE: {
current_domain_builder->setType(current_domain_type_str(crd->type()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
current_domain_builder->setType(current_domain_type_str(crd->type()));

Comment on lines +92 to +104
auto type = current_domain_type_enum(current_domain_reader.getType().cStr());
switch (type) {
case CurrentDomainType::NDRECTANGLE: {
if (!current_domain_reader.isNdRectangle()) {
throw std::runtime_error(
"The current domain to deserialize has an unexpected type field "
"given the union type");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto type = current_domain_type_enum(current_domain_reader.getType().cStr());
switch (type) {
case CurrentDomainType::NDRECTANGLE: {
if (!current_domain_reader.isNdRectangle()) {
throw std::runtime_error(
"The current domain to deserialize has an unexpected type field "
"given the union type");
}
switch (current_domain_reader.which()) {
case CurrentDomain::Which::ND_RECTANGLE: {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this as is so we can merge this PR today.

Comment on lines +1348 to +1353
type @1 :Text;
# The type of CurrentDomain (e.g. NDRECTANGLE)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type @1 :Text;
# The type of CurrentDomain (e.g. NDRECTANGLE)

# The type of CurrentDomain (e.g. NDRECTANGLE)

union {
emptyCurrentDomain @2: Void;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

tiledb/sm/serialization/current_domain.cc Show resolved Hide resolved

#ifdef TILEDB_SERIALIZATION

void current_domain_to_capnp(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to remove all shared_ptr from this file since we don't need to share ownership so we don't need to pay the performance cost or confuse the reader. We can pass const references or raw pointers in to_capnp functions and return actual objects in from_capnp functions. (make_shared and memory tracker can just move to the caller).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment below about performance concerns in this serialization code paths also about this being a bit more tedious given copy and move constructors being disabled for these classes. Overall I don't think it's worth doing it, I'll do it only if you insist 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine here as the resulting object ends up storing the current domain as a shared_ptr in the array schema.

shared_ptr<NDRectangle> ndr,
capnp::NDRectangle::Builder* ndrectangle_builder) {
auto& ranges = ndr->get_ndranges();
if (!ranges.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it's empty? is that an error case, should we throw?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty NDRectangle set on a current domain is incorrect behavior, this is caught during Array::create. You are correct though, we can catch it early before doing the REST request, I changed the code and added a rest test check for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this and treated it like a bug in NDRectangle. The constructors now throw if the ranges vector passed is empty or if the domain has no dimensions. This way we should never get an empty ranges vector here.

}

auto type = current_domain_type_enum(current_domain_reader.getType().cStr());
switch (type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not an expert on Capnp unions but is there a reason to have the type stored ? can't we just tell from the union member that was set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a common comment below for you and @teo-tsirpanis to review.

shared_ptr<MemoryTracker> memory_tracker) {
auto version = current_domain_reader.getVersion();
if (current_domain_reader.isEmptyCurrentDomain()) {
return make_shared<CurrentDomain>(HERE(), memory_tracker, version);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as suggested in the general comments wrt shared pointers, here we can just return CurrentDomain and construct the pointer on the caller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  DISABLE_COPY(CurrentDomain);
  DISABLE_MOVE(CurrentDomain);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, we are going to store in the schema in a shared_ptr so doing it here is fine.

@@ -1150,6 +1153,13 @@ shared_ptr<ArraySchema> array_schema_from_capnp(
name = schema_reader.getName().cStr();
}

auto crd = make_shared<CurrentDomain>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'd like this to be in an else case, otherwise if there's a serialized domain you construct twice and just throw away the first one.
But IMO this "empty"/default object construction should be happening inside the ArraySchema constructor itself, if the caller is passing an uninitialized shared_ptr (aka nullptr). Otherwise we leave the door open to someone not knowing about this and just passing a nullptr for current_domain when constructing an ArraySchema opening the door to segfaults.. (didn't see any checks for null current_domain in ArraySchema class)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an easy change, I adjusted the code. In general, I'm shooting for code simplicity in this area, unless I'm doing something with disk access that could be in the same order of magnitude with the wire access, I'm not even thinking of performance.
About the empty current domain construction, there is another constructor for ArraySchema that does just that, this particular one is used [only]? in serialization and assumes we pass the correct things, the argument with passing nullptr by mistake can be extended to all other shared_ptr args we pass to this constructor.

@@ -138,6 +138,14 @@ class NDRectangle {
return domain_;
}

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this more robust:

  • Add checks before every usage of domain_ in ndrectangle.cc that the pointer is not null.
  • Add comment on domain_ declaration that it can be null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

TEST_CASE_METHOD(
RestCurrentDomainFx,
"C API: Current Domain basic behavior",
"[current_domain][rest]") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for those tests!!

@robertbindar
Copy link
Contributor Author

robertbindar commented Jul 8, 2024

@ypatia @teo-tsirpanis thanks a lot for the great reviews! I left comments inline above.
Regarding passing the type in capnp, I chose to do it this way because the switch case on the type will be identical to the other switch cases from sm::CurrentDomain, sm::NDRectangle, etc, and then I can go and add an apply_with_type like function that will keep the switch on shape type in a single place.
Using the union type will just add a different switch that we'll need to maintain. I don't think this is a big deal either way, if any of you has a strong preference on removing the type, I'll just go and do it.

@robertbindar robertbindar force-pushed the rbin/ch48253/current_domain_rest branch from 0218ee7 to fe6b894 Compare July 8, 2024 10:37
tiledb_dimension_t* d1;
int rc = tiledb_dimension_alloc(
ctx_c_, "d1", TILEDB_UINT64, &dim_domain[0], &tile_extents[0], &d1);
CHECK(rc == TILEDB_OK);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a dimension with string type here?

@@ -1833,6 +1833,8 @@ void ArraySchema::expand_current_domain(
new_current_domain->check_schema_sanity(this->shared_domain());

current_domain_ = new_current_domain;

current_domain_->ndrectangle()->set_domain(this->shared_domain());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment as to why we need this, that we don't serialize the domain etc.

* Array Schema domain. This can be nullptr during
* array schema evolution on REST when we construct with a nullptr domain_
* and set it later during ArraySchema::expand_current_domain to avoid
* serializing the domain on the evolution object */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* serializing the domain on the evolution object */
* serializing the domain on the evolution object
*/


#ifdef TILEDB_SERIALIZATION

void current_domain_to_capnp(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine here as the resulting object ends up storing the current domain as a shared_ptr in the array schema.

shared_ptr<MemoryTracker> memory_tracker) {
auto version = current_domain_reader.getVersion();
if (current_domain_reader.isEmptyCurrentDomain()) {
return make_shared<CurrentDomain>(HERE(), memory_tracker, version);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, we are going to store in the schema in a shared_ptr so doing it here is fine.

Comment on lines +92 to +104
auto type = current_domain_type_enum(current_domain_reader.getType().cStr());
switch (type) {
case CurrentDomainType::NDRECTANGLE: {
if (!current_domain_reader.isNdRectangle()) {
throw std::runtime_error(
"The current domain to deserialize has an unexpected type field "
"given the union type");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this as is so we can merge this PR today.

@teo-tsirpanis
Copy link
Member

Re. the explicit type field in capnp, it's a question of whether we want to expose an idiomatic capnp schema or match the Core's internals as closely as possible.

Generally I am a big proponent of having clean data representation boundaries between software layers and would always choose the former, but I can understand the practical considerations (in this case not as much the code variation but the time pressure to have this PR merged) that would lead us to pick the latter option, and I am fine with either option.

Copy link
Member

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@KiterLuc KiterLuc closed this Jul 8, 2024
@KiterLuc KiterLuc reopened this Jul 8, 2024
@KiterLuc KiterLuc force-pushed the rbin/ch48253/current_domain_rest branch from 7efc109 to 175f52e Compare July 8, 2024 20:02
@KiterLuc KiterLuc changed the title Rest support for current domain Rest support for current domain. Jul 9, 2024
@KiterLuc KiterLuc merged commit acca149 into dev Jul 9, 2024
63 checks passed
@KiterLuc KiterLuc deleted the rbin/ch48253/current_domain_rest branch July 9, 2024 08:43
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants