This is a bug in the implementation; `HTTPFileSystem` is supposed to work as you suggest. The following change fixes it:
```diff
--- a/fsspec/implementations/http.py
+++ b/fsspec/implementations/http.py
@@ -107,7 +107,7 @@ class HTTPFileSystem(AbstractFileSystem):
         return list(sorted(out))

     def cat(self, url):
-        r = requests.get(url, **self.kwargs)
+        r = self.session.get(url, **self.kwargs)
         r.raise_for_status()
         return r.content
```
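The same idea in self-contained form: a minimal sketch of a filesystem-like class that creates one `requests.Session` lazily and routes every `cat` call through it, so the session's connection pool (and its already-negotiated TLS connections) is reused. The class name `SessionReusingHTTPFS` and the injectable `session` argument are hypothetical, added only so the sketch can be exercised without network access; this is not fsspec's actual code.

```python
class SessionReusingHTTPFS:
    """Hypothetical, simplified stand-in for an HTTP filesystem that reuses
    one session for every request instead of calling requests.get each time."""

    def __init__(self, session=None, **kwargs):
        self.kwargs = kwargs      # extra arguments forwarded to every GET
        self._session = session   # optionally injected (e.g. a test double)

    @property
    def session(self):
        # Create the session on first use and keep it, so later requests
        # go through the same connection pool.
        if self._session is None:
            import requests  # imported lazily; only needed for real sessions
            self._session = requests.Session()
        return self._session

    def cat(self, url):
        r = self.session.get(url, **self.kwargs)
        r.raise_for_status()
        return r.content
```

Every call to `cat` on the same instance shares one session, which is what makes repeated fetches from the same host avoid a new handshake each time.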
Note that reusing sessions does present a small thread-safety issue: when the session is used from multiple threads simultaneously, the pool can run out of connections, resulting in connections being evicted and closed.
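One way to sidestep contention for a single shared pool is to give each thread its own session. A minimal sketch, assuming a hypothetical `PerThreadSessions` helper built on `threading.local`; the default factory is `requests.Session`, and any other zero-argument factory can be passed in instead.

```python
import threading


class PerThreadSessions:
    """Hypothetical helper: each thread lazily gets, and then keeps, its own
    session, so threads never compete for the same connection pool."""

    def __init__(self, factory=None):
        self._factory = factory          # callable returning a new session
        self._local = threading.local()  # per-thread storage

    def get(self):
        session = getattr(self._local, "session", None)
        if session is None:
            factory = self._factory
            if factory is None:
                import requests  # assumption: requests is the HTTP client
                factory = requests.Session
            session = factory()
            self._local.session = session
        return session
```

The trade-off is more open connections overall (one pool per thread). If a single shared session is preferred instead, `requests` lets you enlarge its pool by mounting a `requests.adapters.HTTPAdapter(pool_maxsize=...)` on the session with `Session.mount`.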
This issue originally came up in zarr-developers/zarr-python#536.
`HTTPFileSystem` is slow to fetch many files because it does not reuse a connection pool. Example:
As you can see, `SSL_do_handshake` is called 20 times and takes most of the time. If we do basically the same thing with `requests`:
In this case, because we reused the session, it goes much faster:
Could `HTTPFileSystem` be configured to reuse a session in a similar way?
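The slowdown comes down to how many connections get opened: over HTTPS, each new connection pays for a TLS handshake (`SSL_do_handshake`). The sketch below is a stdlib-only stand-in, not the original benchmark: it uses `http.server` and `http.client` rather than `requests`, plain HTTP rather than TLS, and hypothetical helpers (`CountingHandler`, `fetch_all`, `count_connections`) to count the TCP connections needed to fetch small files with and without connection reuse.

```python
import http.client
import http.server
import threading

_count_lock = threading.Lock()


class CountingHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 allows the client to keep a connection open between requests.
    protocol_version = "HTTP/1.1"
    connections = 0  # one handler instance is created per TCP connection

    def setup(self):
        super().setup()
        with _count_lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = self.path.encode()  # echo the path back as the "file" content
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_all(port, paths, reuse):
    """GET every path; share one keep-alive connection if reuse is True,
    otherwise open (and close) a fresh connection for each request."""
    shared = http.client.HTTPConnection("127.0.0.1", port) if reuse else None
    bodies = []
    for path in paths:
        conn = shared if reuse else http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", path)
        bodies.append(conn.getresponse().read())  # drain before the next request
        if not reuse:
            conn.close()
    if shared is not None:
        shared.close()
    return bodies


def count_connections(n, reuse):
    """Serve n small "files" locally and report how many TCP connections
    the client opened to fetch them."""
    CountingHandler.connections = 0
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        paths = [f"/file{i}" for i in range(n)]
        bodies = fetch_all(server.server_address[1], paths, reuse)
    finally:
        server.shutdown()
        server.server_close()
    assert bodies == [p.encode() for p in paths]
    return CountingHandler.connections
```

`count_connections(20, reuse=False)` should report 20 connections, while `count_connections(20, reuse=True)` should report 1; with TLS in the picture, that difference is 19 avoided handshakes.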