-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Rest support for current domain. #5136
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned offline, I think the CurrentDomain::type
capnproto field is redundant, given that capnproto already encodes which union field is being used. I made some suggestions that might obviate its need.
auto crd = array_schema_evolution->current_domain_to_expand(); | ||
auto current_domain_builder = | ||
array_schema_evolution_builder->initCurrentDomainToExpand(); | ||
current_domain_to_capnp(crd, ¤t_domain_builder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if the schema evolution does not expand the current domain? Should we call initCurrentDomainToExpand
only then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great catch, thanks!
|
||
switch (crd->type()) { | ||
case CurrentDomainType::NDRECTANGLE: { | ||
current_domain_builder->setType(current_domain_type_str(crd->type())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current_domain_builder->setType(current_domain_type_str(crd->type())); |
auto type = current_domain_type_enum(current_domain_reader.getType().cStr()); | ||
switch (type) { | ||
case CurrentDomainType::NDRECTANGLE: { | ||
if (!current_domain_reader.isNdRectangle()) { | ||
throw std::runtime_error( | ||
"The current domain to deserialize has an unexpected type field " | ||
"given the union type"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto type = current_domain_type_enum(current_domain_reader.getType().cStr()); | |
switch (type) { | |
case CurrentDomainType::NDRECTANGLE: { | |
if (!current_domain_reader.isNdRectangle()) { | |
throw std::runtime_error( | |
"The current domain to deserialize has an unexpected type field " | |
"given the union type"); | |
} | |
switch (current_domain_reader.which()) { | |
case CurrentDomain::Which::ND_RECTANGLE: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep this as is so we can merge this PR today.
type @1 :Text; | ||
# The type of CurrentDomain (e.g. NDRECTANGLE) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type @1 :Text; | |
# The type of CurrentDomain (e.g. NDRECTANGLE) |
# The type of CurrentDomain (e.g. NDRECTANGLE) | ||
|
||
union { | ||
emptyCurrentDomain @2: Void; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
|
||
#ifdef TILEDB_SERIALIZATION | ||
|
||
void current_domain_to_capnp( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to remove all shared_ptr
from this file since we don't need to share ownership so we don't need to pay the performance cost or confuse the reader. We can pass const
references or raw pointers in to_capnp
functions and return actual objects in from_capnp
functions. (make_shared
and memory tracker can just move to the caller).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment below about performance concerns in this serialization code paths also about this being a bit more tedious given copy and move constructors being disabled for these classes. Overall I don't think it's worth doing it, I'll do it only if you insist 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine here as the resulting object ends up storing the current domain as a shared_ptr in the array schema.
shared_ptr<NDRectangle> ndr, | ||
capnp::NDRectangle::Builder* ndrectangle_builder) { | ||
auto& ranges = ndr->get_ndranges(); | ||
if (!ranges.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if it's empty? is that an error case, should we throw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty NDRectangle set on a current domain is incorrect behavior, this is caught during Array::create
. You are correct though, we can catch it early before doing the REST request, I changed the code and added a rest test check for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this and treated it like a bug in NDRectangle. The constructors now throw if the ranges vector passed is empty or if the domain has no dimensions. This way we should never get an empty ranges vector here.
} | ||
|
||
auto type = current_domain_type_enum(current_domain_reader.getType().cStr()); | ||
switch (type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not an expert on Capnp unions but is there a reason to have the type stored ? can't we just tell from the union member that was set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a common comment below for you and @teo-tsirpanis to review.
shared_ptr<MemoryTracker> memory_tracker) { | ||
auto version = current_domain_reader.getVersion(); | ||
if (current_domain_reader.isEmptyCurrentDomain()) { | ||
return make_shared<CurrentDomain>(HERE(), memory_tracker, version); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as suggested in the general comments wrt shared pointers, here we can just return CurrentDomain
and construct the pointer on the caller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DISABLE_COPY(CurrentDomain);
DISABLE_MOVE(CurrentDomain);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, we are going to store in the schema in a shared_ptr so doing it here is fine.
@@ -1150,6 +1153,13 @@ shared_ptr<ArraySchema> array_schema_from_capnp( | |||
name = schema_reader.getName().cStr(); | |||
} | |||
|
|||
auto crd = make_shared<CurrentDomain>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'd like this to be in an else
case, otherwise if there's a serialized domain you construct twice and just throw away the first one.
But IMO this "empty"/default object construction should be happening inside the ArraySchema
constructor itself, if the caller is passing an uninitialized shared_ptr
(aka nullptr
). Otherwise we leave the door open to someone not knowing about this and just passing a nullptr
for current_domain
when constructing an ArraySchema
opening the door to segfaults.. (didn't see any checks for null current_domain
in ArraySchema
class)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an easy change, I adjusted the code. In general, I'm shooting for code simplicity in this area, unless I'm doing something with disk access that could be in the same order of magnitude with the wire access, I'm not even thinking of performance.
About the empty current domain construction, there is another constructor for ArraySchema
that does just that, this particular one is used [only]? in serialization and assumes we pass the correct things, the argument with passing nullptr by mistake can be extended to all other shared_ptr
args we pass to this constructor.
@@ -138,6 +138,14 @@ class NDRectangle { | |||
return domain_; | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this more robust:
- Add checks before every usage of
domain_
inndrectangle.cc
that the pointer is not null. - Add comment on
domain_
declaration that it can be null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
TEST_CASE_METHOD( | ||
RestCurrentDomainFx, | ||
"C API: Current Domain basic behavior", | ||
"[current_domain][rest]") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for those tests!!
@ypatia @teo-tsirpanis thanks a lot for the great reviews! I left comments inline above. |
0218ee7
to
fe6b894
Compare
tiledb_dimension_t* d1; | ||
int rc = tiledb_dimension_alloc( | ||
ctx_c_, "d1", TILEDB_UINT64, &dim_domain[0], &tile_extents[0], &d1); | ||
CHECK(rc == TILEDB_OK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a dimension with string type here?
@@ -1833,6 +1833,8 @@ void ArraySchema::expand_current_domain( | |||
new_current_domain->check_schema_sanity(this->shared_domain()); | |||
|
|||
current_domain_ = new_current_domain; | |||
|
|||
current_domain_->ndrectangle()->set_domain(this->shared_domain()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment as to why we need this, that we don't serialize the domain etc.
* Array Schema domain. This can be nullptr during | ||
* array schema evolution on REST when we construct with a nullptr domain_ | ||
* and set it later during ArraySchema::expand_current_domain to avoid | ||
* serializing the domain on the evolution object */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* serializing the domain on the evolution object */ | |
* serializing the domain on the evolution object | |
*/ |
|
||
#ifdef TILEDB_SERIALIZATION | ||
|
||
void current_domain_to_capnp( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine here as the resulting object ends up storing the current domain as a shared_ptr in the array schema.
shared_ptr<MemoryTracker> memory_tracker) { | ||
auto version = current_domain_reader.getVersion(); | ||
if (current_domain_reader.isEmptyCurrentDomain()) { | ||
return make_shared<CurrentDomain>(HERE(), memory_tracker, version); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, we are going to store in the schema in a shared_ptr so doing it here is fine.
auto type = current_domain_type_enum(current_domain_reader.getType().cStr()); | ||
switch (type) { | ||
case CurrentDomainType::NDRECTANGLE: { | ||
if (!current_domain_reader.isNdRectangle()) { | ||
throw std::runtime_error( | ||
"The current domain to deserialize has an unexpected type field " | ||
"given the union type"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep this as is so we can merge this PR today.
Re. the explicit type field in capnp, it's a question of whether we want to expose an idiomatic capnp schema or match the Core's internals as closely as possible. Generally I am a big proponent of having clean data representation boundaries between software layers and would always choose the former, but I can understand the practical considerations (in this case not as much the code variation but the time pressure to have this PR merged) that would lead us to pick the latter option, and I am fine with either option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
7efc109
to
175f52e
Compare
This PR adds REST serialization support for TileDB Current Domain and expansion via schema evolution.
[sc-48253]
TYPE: FEATURE
DESC: Rest support for current domain.