Tracking issues of OpenDAL API changes #356
Comments
cc @Fokko @liurenjie1024 @ZENOTME to take a look.
I agree that we should refactor the Input/Output API for best performance, but we should be careful not to expose OpenDAL's API to end users.
Yep, I have a plan to refactor the I have met a use case where users parse storage-related input themselves and don't use iceberg's own args.
I'm ok with this change, but it's breaking. Let's see what others think. cc @Fokko @sdd @marvinlanhenke @tustvold
I'm not seeing a concrete proposal on which to comment, but the broad theme of moving closer to the actual object store APIs is IMO a good idea. That being said, I feel I should point out that such an abstraction already exists, is natively supported by both the Rust arrow and Datafusion projects, and already has OpenDAL bindings, namely object_store... Perhaps this is therefore already a solved problem? I can understand why Iceberg Java felt the need to provide FileIO, as the ecosystem there is still very wedded to filesystem APIs, but the Rust ecosystem has largely avoided this mistake? I dunno, I am far from impartial, but the motivation for object_store is to provide this exact abstraction
...with the disclaimer, that I have little to no experience regarding the intricacies of object store APIs; I (also) think we should avoid "reinventing the wheel" here and use already existing solutions; with that in mind if ...hope this makes any sense at all.
I guess I was suggesting to just not have a FileIO trait, and just use
I think a
Closing now. I thought we would have huge breaking API changes, but it turned out to be a simple, small PR. I'll track the optimization in other issues instead.
Hi, iceberger! OpenDAL's upcoming v0.46 release will have API changes that could affect our project.
New Features
OpenDAL Reader now has concurrent support
OpenDAL Reader now has concurrent support that can read multiple chunks concurrently.
The `buf` here will be fetched in 4 concurrent requests.
To read non-contiguous buffers, please use our `fetch` API:
OpenDAL will merge close ranges and read them concurrently.
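To picture the two behaviours described above, here is a minimal std-only sketch of the request planning involved: splitting one large range into fixed-size chunks that could be fetched concurrently, and merging ranges that sit close together. This is not OpenDAL's implementation. The function names, the 4 MiB chunk size, and the `max_gap` threshold are all assumptions chosen for illustration.

```rust
use std::ops::Range;

/// Split `range` into consecutive pieces of at most `chunk` bytes.
/// Each piece could be issued as its own concurrent request.
fn split_into_chunks(range: Range<u64>, chunk: u64) -> Vec<Range<u64>> {
    assert!(chunk > 0, "chunk size must be positive");
    let mut out = Vec::new();
    let mut start = range.start;
    while start < range.end {
        let end = (start + chunk).min(range.end);
        out.push(start..end);
        start = end;
    }
    out
}

/// Merge ranges whose gap is at most `max_gap` bytes.
/// `max_gap` is a hypothetical threshold, not an OpenDAL setting.
fn merge_close_ranges(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end.saturating_add(max_gap) {
                // Close enough: grow the previous range instead of adding a request.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    // Assumed numbers: a 16 MiB read with 4 MiB chunks yields 4 requests.
    let chunks = split_into_chunks(0..16 * MIB, 4 * MIB);
    println!("{} chunk requests: {:?}", chunks.len(), chunks);

    // Two nearby ranges collapse into one request; the distant one stays separate.
    let merged = merge_close_ranges(vec![4096..8192, 0..1024, 1_000_000..1_000_100], 4096);
    println!("merged: {:?}", merged);
}
```

Merging trades a few wasted bytes in the gap for fewer round trips, which is usually a good deal against object storage where per-request latency dominates.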
The detailed upgrade guide can be found here.
OpenDAL v0.46 is not released yet, so these changes may still be altered. I will try my best to keep this issue updated.
API Changes
Here are the major changes that we need to take care of:
OpenDAL Reader doesn't impl `AsyncRead + AsyncSeek` anymore
OpenDAL's Reader has now been transformed into a range-based reader.
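To make the difference between range-based and cursor-based reading concrete, here is a minimal synchronous sketch using only std. It is an analogy, not OpenDAL's API: `RangeRead`, `MemObject`, and `SeekableReader` are hypothetical names, and OpenDAL's real reader is async. The adapter shows how a `Read + Seek` stream can be layered over a range-based reader, and where an extra copy can creep in.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;

/// Hypothetical range-based reader: every call names the bytes it wants.
trait RangeRead {
    fn read_range(&self, range: Range<u64>) -> io::Result<Vec<u8>>;
}

/// In-memory "object", used only for this sketch.
struct MemObject(Vec<u8>);

impl RangeRead for MemObject {
    fn read_range(&self, range: Range<u64>) -> io::Result<Vec<u8>> {
        let len = self.0.len() as u64;
        let start = range.start.min(len) as usize;
        let end = range.end.min(len) as usize;
        Ok(self.0[start..end.max(start)].to_vec())
    }
}

/// Hypothetical adapter exposing a byte window of a `RangeRead` as `Read + Seek`.
struct SeekableReader<R> {
    inner: R,
    window: Range<u64>,
    pos: u64, // cursor, relative to window.start
}

impl<R: RangeRead> SeekableReader<R> {
    fn new(inner: R, window: Range<u64>) -> Self {
        Self { inner, window, pos: 0 }
    }
}

impl<R: RangeRead> Read for SeekableReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let start = self.window.start + self.pos;
        let end = (start + buf.len() as u64).min(self.window.end);
        if start >= end {
            return Ok(0); // end of window
        }
        let bytes = self.inner.read_range(start..end)?;
        // This copy into the caller's buffer is the cost the stream interface adds.
        buf[..bytes.len()].copy_from_slice(&bytes);
        self.pos += bytes.len() as u64;
        Ok(bytes.len())
    }
}

impl<R: RangeRead> Seek for SeekableReader<R> {
    fn seek(&mut self, to: SeekFrom) -> io::Result<u64> {
        let len = (self.window.end - self.window.start) as i64;
        let target = match to {
            SeekFrom::Start(n) => n as i64,
            SeekFrom::End(n) => len + n,
            SeekFrom::Current(n) => self.pos as i64 + n,
        };
        if target < 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "seek before start"));
        }
        self.pos = target as u64;
        Ok(self.pos)
    }
}

fn main() -> io::Result<()> {
    let obj = MemObject(b"hello iceberg".to_vec());
    // Range-based: ask for exactly the bytes needed, no cursor state.
    assert_eq!(obj.read_range(6..13)?, b"iceberg".to_vec());

    // Stream-based: position a cursor, then read sequentially.
    let mut reader = SeekableReader::new(obj, 0..13);
    reader.seek(SeekFrom::Start(6))?;
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    assert_eq!(text, "iceberg");
    Ok(())
}
```

For a Parquet-style reader that already knows which byte ranges it needs (footer, column chunks), the range-based call maps directly onto the access pattern, while the cursor-based adapter has to emulate it with seek-then-read pairs.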
Users can transform it into `AsyncRead + AsyncSeek` by using `into_futures_async_read`:
But please note:
`opendal::Reader` adopts a zero-cost abstraction: no extra byte copies or allocations happen.
`opendal::FuturesAsyncReader` is the same as our old reader; it might incur extra byte copies.
OpenDAL Writer doesn't impl `AsyncWrite` anymore
Just like Reader, `opendal::Writer` doesn't impl `AsyncWrite` anymore. Users can use OpenDAL's native `Buffer`, which supports both contiguous and non-contiguous buffers.
Users can transform it into `AsyncWrite` by using `into_futures_async_write`:
Tasks
Although it's possible to simply convert OpenDAL's `Reader` and `Writer` into `AsyncXxx`-based structures, I aim to prepare Iceberg for the most efficient IO methods. In the near future, we will support completion-based IO and vectorization. The `AsyncXxx`-based traits do not integrate well with these methods.
I believe only the read side needs some changes; the write side should be simple to update.
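As a rough illustration of why non-contiguous buffers matter on the write side, here is a std-only sketch that hands several separate chunks to a writer in one vectored call, without first copying them into a single contiguous buffer. It assumes "vectorization" above refers to vectored I/O. `write_chunks` is a hypothetical helper, not an OpenDAL or Iceberg function, and it uses the synchronous `std::io::Write` trait rather than OpenDAL's async `Writer`.

```rust
use std::io::{self, IoSlice, Write};

/// Hypothetical helper: write several non-contiguous chunks without
/// concatenating them first. Returns the total number of bytes written.
fn write_chunks<W: Write>(w: &mut W, chunks: &[Vec<u8>]) -> io::Result<usize> {
    let slices: Vec<IoSlice<'_>> = chunks.iter().map(|c| IoSlice::new(c)).collect();
    let total: usize = chunks.iter().map(Vec::len).sum();

    // One vectored call; the writer may accept fewer bytes than offered.
    let written = w.write_vectored(&slices)?;

    // Finish whatever the vectored call left unwritten.
    let mut skip = written;
    for c in chunks {
        if skip >= c.len() {
            skip -= c.len(); // this chunk was fully written already
            continue;
        }
        w.write_all(&c[skip..])?;
        skip = 0;
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    let mut out: Vec<u8> = Vec::new();
    let n = write_chunks(&mut out, &[b"ab".to_vec(), b"cde".to_vec()])?;
    assert_eq!(n, 5);
    assert_eq!(out, b"abcde".to_vec());
    Ok(())
}
```

A single-buffer `AsyncWrite`-style interface would force the caller to either concatenate the chunks or issue one write per chunk, which is the mismatch the paragraph above points at.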
Related