Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

asyn_wrapper: avoid creating async version of open method #1769

Merged
merged 1 commit into from
Jan 27, 2025

Conversation

skshetry
Copy link
Contributor

@skshetry skshetry commented Jan 9, 2025

The _open method is part of an AbstractFileSystem interface, and is supposed to be synchronous, even in case of AsyncFileSystem.

The `_open` method is part of an AbstractFileSystem interface, and it is
supposed to be synchronous.
Comment on lines +60 to 63
excluded_methods = {"open"}
for method_name in dir(self.sync_fs):
if method_name.startswith("_"):
if method_name.startswith("_") or method_name in excluded_methods:
continue
Copy link
Contributor Author

@skshetry skshetry Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, this can be changed to following,

for method in dir(AsyncFileSystem):
  if not private.match(method):
      continue
  if not asyncio.iscoroutinefunction(getattr(AsyncFileSystem, method, None)):
      continue
  smethod = method[1:]
  f = getattr(self.sync_fs, smethod, None)
  if callable(f) and not asyncio.iscoroutinefunction(f):
      mth = async_wrapper(f, obj=self)
      setattr(self, method, mth)
      if not mth.__doc__:
          mth.__doc__ = f.__doc__

and maybe moved to fsspec.asyn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is similar code in https://github.com/fsspec/filesystem_spec/blob/master/fsspec/asyn.py#L915 to differentiate methods defined in Async versus AsbtractFilesystem.

@martindurant
Copy link
Member

To be clear: I am happy with the changes as given, I can merge it. I wonder if there will be other methods that need adding to the list in the future.

@skshetry
Copy link
Contributor Author

To be clear: I am happy with the changes as given, I can merge it. I wonder if there will be other methods that need adding to the list in the future.

I see that the current implementation async-ifies APIs that are not in AsyncFileSystem (eg: lexists, read_bytes, etc). What's the use-case of AsyncFileSystemWrapper - is it to imitate AsyncFileSystem or is it to provide async methods for all sync methods of a filesystem (which is the current implementation)?

If the latter, then this might be a good-enough workaround. If the former, I can push the aforementioned patch.

In any case, I am only interested in the _open API at the moment. So, I'll leave that up to you to decide. :)

@martindurant
Copy link
Member

martindurant commented Jan 13, 2025

What's the use-case of AsyncFileSystemWrapper

It's to allow using a Sync filesystem in a context where only async calls are being made. This is only used currently with in zarr (e.g., zarr-developers/zarr-python#2533 ), and its store implementation only calls a subset of the available methods. The implementation, applying automatically to many methods, is I think the simplest, and that was the main motivation; it also naturally follows the existing code in fsspec.asyn to do the inverse, providing end-user sync methods for async filesystems.

skshetry added a commit to iterative/datachain that referenced this pull request Jan 16, 2025
There is a bug in `fsspec==2024.12.0` that causes the `ReferenceFileSystem` to
incorrectly make `fs._open` return a coroutine object instead of a file-like object.
(See a proposed PR to fix this issue: fsspec/filesystem_spec#1769.)

We have a test for the expected behavior (`test_arrow_generator_partitioned` in `tests/unit/lib/test_arrow.py`)
running in the CI environment.
But that does not fail because the latest version of `fsspec` does not get installed in the CI
due to the upper limit set by the `datasets` library.

The `datasets` library is only installed as part of the `hf` and `tests` extras,
so the default installation of `datachain` will encounter this issue.

Fixes #806.
@skshetry
Copy link
Contributor Author

@martindurant, what do you suggest? Are you okay with the PR or do you want me to try the alternative way as discussed in #1769 (comment)?

@martindurant
Copy link
Member

I think the PR is fine as is - if you say it's ready. The rest was only suggestions of how it could be made more complete maybe, which can be left for the future or not done at all.

@skshetry
Copy link
Contributor Author

Then I think this should be ready. :)

@martindurant martindurant merged commit 3763661 into fsspec:master Jan 27, 2025
11 checks passed
@skshetry skshetry deleted the no-open-asyn-wrap branch January 27, 2025 16:40
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants