Releases: dask-contrib/dask-deltatable
Dask-deltatable v0.3.3
Fix imports to work with deltalake 0.20 (#84)
Dask-deltatable v0.3.2
Bug fix
Fix the arguments order in method to_deltalake called in the example (#77)
Fix up some mypy errors (#76)
Sort filenames (#71)
Fix mypy error (#62)
Project hygiene
Synchronise with dask-expr, newer Dask and newer deltalake (#69)
Support auto-setting AWS credentials for storage options (#78)
Compatibility with latest dask, pyarrow and deltalake (#68)
Add path to tokenization (#67)
Clarify readme for reading in deltalake (#66)
Add conda installation instructions to README (#6)
Add URL to setuptools metadata (#60)
Dask-deltatable v0.3.1
This version contains a patch that fixes a problem when reading datasets on a distributed cluster.
Dask-deltatable v0.3
New Features and Enhancements
- More efficient Dask Graph generation (#24)
- Transactional write support for append-only write operations with to_deltalake (#29)
- Reader now supports partition pruning to only load files that match the provided filters (#30)
- DAT reader acceptance testing against spark generated data (#47)
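
To make the partition pruning item above concrete, here is a minimal standard-library sketch of the idea. It is not dask-deltatable's implementation: the function name `prune`, the `(column, op, value)` filter shape, and the sample file names are assumptions made for illustration. It relies only on the fact that a Delta log records partition values as strings for each data file, so files can be skipped before any parquet data is read.

```python
import operator
from typing import Any

# Comparison operators accepted in (column, op, value) filter tuples.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def prune(files: dict[str, dict[str, str]], filters: list[tuple[str, str, Any]]) -> list[str]:
    """Return the paths whose partition values satisfy every filter (AND).

    `files` maps each data file path to its partition values, which are
    stored as strings. A filter on a column that is not a partition column
    cannot exclude a file at this stage, so it is treated as passing.
    """
    kept = []
    for path, parts in files.items():
        if all(
            col not in parts or _OPS[op](type(value)(parts[col]), value)
            for col, op, value in filters
        ):
            kept.append(path)
    return kept


# Hypothetical table with two partition columns.
files = {
    "year=2021/month=1/part-0.parquet": {"year": "2021", "month": "1"},
    "year=2021/month=2/part-0.parquet": {"year": "2021", "month": "2"},
    "year=2022/month=1/part-0.parquet": {"year": "2022", "month": "1"},
}
selected = prune(files, [("year", "==", 2021), ("month", ">", 1)])
print(selected)
```

Only the selected paths would then be turned into read tasks, which is where the saving comes from on heavily partitioned tables.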
Breaking changes
- Removed vaccum_table (#16) and history (#17) commands. Instead, please use native delta-rs functionality, see https://delta-io.github.io/delta-rs/python/usage.html#vacuuming-tables and https://delta-io.github.io/delta-rs/python/usage.html#history
- Minimal supported Python version is now 3.9
- Renamed read_delta_table to read_deltatable
Dask and delta-rs integration
This release builds a wrapper around the Rust package delta-rs and uses dask for parallel reading.
Features:
- Reads the parquet files listed in the delta logs in parallel using the dask engine
- Supports three filesystems: s3, azurefs, and gcsfs
- Supports some delta features like
  - Time Travel
  - Schema evolution
  - parquet filters
    - row filter
    - partition filter
- Query Delta commit info (History)
- Vacuum the old/unused parquet files
- Load different versions of data using DateTime
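
The features above rest on the Delta transaction log: each commit is a file of newline-delimited JSON actions, and the files to read at a given version are those added and not later removed. The following is a simplified sketch of that replay, plus a lookup of the latest version at or before a timestamp. The function names `active_files` and `version_at` are hypothetical, and the sketch ignores checkpoints, metadata, and protocol actions, which delta-rs handles in practice.

```python
import json
from bisect import bisect_right
from typing import Optional


def active_files(commits: list[str], version: Optional[int] = None) -> list[str]:
    """Replay commits 0..version and return the sorted live data files.

    Each element of `commits` is the text of one commit file: one JSON
    action per line, where "add" and "remove" actions carry a "path".
    """
    last = len(commits) - 1 if version is None else version
    live = set()
    for text in commits[: last + 1]:
        for line in text.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def version_at(commit_times: list[int], timestamp: int) -> int:
    """Return the latest version committed at or before `timestamp`.

    `commit_times` holds one commit timestamp per version, ascending.
    """
    index = bisect_right(commit_times, timestamp) - 1
    if index < 0:
        raise ValueError("timestamp predates the first commit")
    return index


# Two hypothetical commits: version 1 rewrites a.parquet into c.parquet.
commits = [
    '{"add": {"path": "a.parquet"}}\n{"add": {"path": "b.parquet"}}',
    '{"remove": {"path": "a.parquet"}}\n{"add": {"path": "c.parquet"}}',
]
print(active_files(commits, version=0))
print(active_files(commits))
print(version_at([100, 200], 150))
```

Once the live file list is known, each parquet file can be read as an independent task, which is what makes the parallel read straightforward.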
DeltaTable reader using Dask
- Reads a delta table in parallel using dask
- Can read from different filesystems such as S3, Azurefs, and gcsfs
- Supports some delta features like
  - Time Travel
  - Schema evolution
  - parquet filters, such as row and partition filters