-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
asyn_wrapper: avoid creating async version of open method #1769
Conversation
The `_open` method is part of an AbstractFileSystem interface, and it is supposed to be synchronous.
excluded_methods = {"open"} | ||
for method_name in dir(self.sync_fs): | ||
if method_name.startswith("_"): | ||
if method_name.startswith("_") or method_name in excluded_methods: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, this can be changed to following,
for method in dir(AsyncFileSystem):
if not private.match(method):
continue
if not asyncio.iscoroutinefunction(getattr(AsyncFileSystem, method, None)):
continue
smethod = method[1:]
f = getattr(self.sync_fs, smethod, None)
if callable(f) and not asyncio.iscoroutinefunction(f):
mth = async_wrapper(f, obj=self)
setattr(self, method, mth)
if not mth.__doc__:
mth.__doc__ = f.__doc__
and maybe moved to fsspec.asyn
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is similar code in https://github.com/fsspec/filesystem_spec/blob/master/fsspec/asyn.py#L915 to differentiate methods defined in Async versus AsbtractFilesystem.
To be clear: I am happy with the changes as given, I can merge it. I wonder if there will be other methods that need adding to the list in the future. |
I see that the current implementation async-ifies APIs that are not in If the latter, then this might be a good-enough workaround. If the former, I can push the aforementioned patch. In any case, I am only interested in the |
It's to allow using a Sync filesystem in a context where only async calls are being made. This is only used currently with in zarr (e.g., zarr-developers/zarr-python#2533 ), and its store implementation only calls a subset of the available methods. The implementation, applying automatically to many methods, is I think the simplest, and that was the main motivation; it also naturally follows the existing code in fsspec.asyn to do the inverse, providing end-user sync methods for async filesystems. |
There is a bug in `fsspec==2024.12.0` that causes the `ReferenceFileSystem` to incorrectly make `fs._open` return a coroutine object instead of a file-like object. (See a proposed PR to fix this issue: fsspec/filesystem_spec#1769.) We have a test for the expected behavior (`test_arrow_generator_partitioned` in `tests/unit/lib/test_arrow.py`) running in the CI environment. But that does not fail because the latest version of `fsspec` does not get installed in the CI due to the upper limit set by the `datasets` library. The `datasets` library is only installed as part of the `hf` and `tests` extras, so the default installation of `datachain` will encounter this issue. Fixes #806.
@martindurant, what do you suggest? Are you okay with the PR or do you want me to try the alternative way as discussed in #1769 (comment)? |
I think the PR is fine as is - if you say it's ready. The rest was only suggestions of how it could be made more complete maybe, which can be left for the future or not done at all. |
Then I think this should be ready. :) |
The
_open
method is part of anAbstractFileSystem
interface, and is supposed to be synchronous, even in case ofAsyncFileSystem
.