Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Support reading tables with type widening in default engine #335

Merged

Conversation

johanl-db
Copy link
Contributor

@johanl-db johanl-db commented Sep 13, 2024

Type widening allows changing the type of an existing column in a Delta table to a wider type.
See https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md

This change improves the default engine to enable upcasting conversions on read that are required to read tables that have a type change applied: data files written before the widening type change will contain data stored using a narrower type that must be upcast to the wider type in the current table schema.

Changes:

  • Allow conversions:
    • int8 -> int16 -> int32 -> int64
    • float32 -> float64
    • date -> timestamp without timezone
    • integers -> float64
    • integers and decimals -> decimals, as long as the target precision and scale are sufficient to store all values without losing precision

This doesn't cover casting inside nested struct/array/map, which doesn't seem to be properly supported yet.

Testing:

  • Added tests reading a table with various type changes applied, generated using delta spark 4.0 preview.
  • Added unit tests for can_upcast_to_decimal, returning whether a given type can be upcast to a decimal.

@johanl-db johanl-db force-pushed the type-widening-default-parquet-reader branch from 0ee9940 to ec8600e Compare September 13, 2024 14:26
Copy link

codecov bot commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 98.55072% with 1 line in your changes missing coverage. Please review.

Project coverage is 74.09%. Comparing base (ed1919a) to head (9230d4b).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/engine/arrow_utils.rs 98.46% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #335      +/-   ##
==========================================
+ Coverage   73.86%   74.09%   +0.23%     
==========================================
  Files          43       43              
  Lines        8078     8154      +76     
  Branches     8078     8154      +76     
==========================================
+ Hits         5967     6042      +75     
- Misses       1732     1733       +1     
  Partials      379      379              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely surgical -- just extend the cast compat check and let the existing casting machinery do its job.

kernel/src/engine/arrow_utils.rs Outdated Show resolved Hide resolved
kernel/src/features/mod.rs Show resolved Hide resolved
Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with one nit. thanks!

kernel/src/engine/arrow_utils.rs Outdated Show resolved Hide resolved
@zachschuermann zachschuermann merged commit 1a66fb9 into delta-io:main Sep 19, 2024
12 checks passed
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants