Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix array latest schema selection for same MS timestamps schemas. #5143

Merged
merged 1 commit into from
Jul 2, 2024

Conversation

KiterLuc
Copy link
Contributor

@KiterLuc KiterLuc commented Jul 1, 2024

PR #4549 attempted to fix the latest array schema selection. Unfortunately, it didn't take into account that some array schemas might have the same timestamps. This change uses the fact that array schema URIs are lexicographically sorted (names generated in the same timestamps use extra sorting bits in the UUID) to chose the latest array URI whilst keeping the original intent of #4549 and checking against the end timestamp.

[sc-49806]


TYPE: BUG
DESC: Fix array latest schema selection for same MS timestamps schemas.

PR #4549 attempted to fix the latest array schema selection. Unfortunately, it didn't take into account that some array schemas might have the same timestamps. This change uses the fact that array schema URIs are lexicographically sorted (names generated in the same timestamps use extra sorting bits in the UUID) to chose the latest array URI whilst keeping the original intent of #4549 and checking against the end timestamp.

---
TYPE: BUG
DESC: Fix array latest schema selection for same MS timestamps schemas.
Copy link
Member

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to indicate that we are missing explicit test coverage for schema evolution within the same timestamp -- please make sure that is added in follow-up.

@eddelbuettel
Copy link
Contributor

eddelbuettel commented Jul 2, 2024

Just FYI I recently 'adopted' a CRAN package wrapping ULID (via a small C++ library which actually needed a small patch to work correctly with milliseconds via std::chrono, see here) but once you use that you get nicely sorted identifiers, and they can be unmarshaled back too. Resolution is millies as we do:

> ids <- ulid::ulid(3)
> ids
[1] "01J1S4AWSGXJXSMCPB8XMMBBVP" "01J1S4AWSGS2RHQQRBAZAKK264" "01J1S4AWSGGZQXRH38HV3GBJNB"
> ulid::unmarshal(ids)
                   ts              rnd
1 2024-07-02 02:07:52 XJXSMCPB8XMMBBVP
2 2024-07-02 02:07:52 S2RHQQRBAZAKK264
3 2024-07-02 02:07:52 GZQXRH38HV3GBJNB
> 
> ulid::unmarshal(ids)[,1]     # show milliseconds
[1] "2024-07-02 02:07:52.496 CDT" "2024-07-02 02:07:52.496 CDT"
[3] "2024-07-02 02:07:52.496 CDT"
> 

It is based on a pretty small MIT licensed library if you want to poke.

@KiterLuc
Copy link
Contributor Author

KiterLuc commented Jul 2, 2024

This seems to indicate that we are missing explicit test coverage for schema evolution within the same timestamp -- please make sure that is added in follow-up.

This fix is coming from tests failing in the nightlies.

@KiterLuc KiterLuc merged commit e4035af into dev Jul 2, 2024
62 of 63 checks passed
@KiterLuc KiterLuc deleted the lr/array-schema-latest-same-ms/ch49806 branch July 2, 2024 07:54
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants