Issue with get_mapper #5

Open
observingClouds opened this issue Dec 7, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@observingClouds
Owner

Originally posted by @wachsylon in #3

    Just adding things from the chat:

Would be nice if mappers could work:

import fsspec
import os
if "slk" not in os.environ["PATH"]:
    os.environ["PATH"]=os.environ["PATH"]+":/sw/spack-levante/slk-3.3.67-jrygfs/bin/:/sw/spack-levante/openjdk-17.0.0_35-k5o6dr/bin"
SLK_CACHE="/scratch/k/k204210/INTAKE"
%env SLK_CACHE={SLK_CACHE}

a=fsspec.get_mapper("slk:///arch/ik1017/cmip6/CMIP6/")
b=fsspec.get_mapper(SLK_CACHE)
target_name="AerChemMIP_002.tar"
b[target_name]=a[target_name]

TypeError                                 Traceback (most recent call last)
Cell In [8], line 2
      1 target_name="AerChemMIP_002.tar"
----> 2 b[target_name]=a[target_name]

File ~/.conda/envs/slkspecenv/lib/python3.10/site-packages/fsspec/mapping.py:163, in FSMap.__setitem__(self, key, value)
    161 key = self._key_to_str(key)
    162 self.fs.mkdirs(self.fs._parent(key), exist_ok=True)
--> 163 self.fs.pipe_file(key, maybe_convert(value))

File ~/.conda/envs/slkspecenv/lib/python3.10/site-packages/fsspec/spec.py:737, in AbstractFileSystem.pipe_file(self, path, value, **kwargs)
    735 """Set the bytes of given file"""
    736 with self.open(path, "wb", **kwargs) as f:
--> 737     f.write(value)

File ~/.conda/envs/slkspecenv/lib/python3.10/site-packages/fsspec/implementations/local.py:340, in LocalFileOpener.write(self, *args, **kwargs)
    339 def write(self, *args, **kwargs):
--> 340     return self.f.write(*args, **kwargs)

TypeError: a bytes-like object is required, not '_io.BufferedReader'

Originally posted by @wachsylon in #3 (comment)

@observingClouds
Owner Author

Hi @wachsylon,
I just pulled your issue over here to declutter the PR a bit.

@observingClouds
Owner Author

This was actually working in the initial release. As the error message suggests, the mapper needs to return a bytes object, not a file object (here an `_io.BufferedReader`).
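
A minimal sketch of the failure mode and one possible fix, using only the standard library. It assumes the remote mapper handed back an open reader instead of raw bytes; the helper name `as_bytes` is hypothetical and is not part of fsspec or slkspec.

```python
import io


def as_bytes(value):
    """Return raw bytes from a bytes-like object or a readable file object."""
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)
    if hasattr(value, "read"):
        # File-like object: read its full content (this loads it into memory).
        return value.read()
    raise TypeError(f"cannot convert {type(value).__name__} to bytes")


# Stand-in for what the mapper returned: an open reader, not bytes.
reader = io.BufferedReader(io.BytesIO(b"tar payload"))

sink = io.BytesIO()
try:
    # Writing a reader object to a binary file fails, as in the traceback.
    sink.write(reader)
    raised = False
except TypeError:
    raised = True

# Converting to bytes first makes the write succeed.
sink.write(as_bytes(reader))
result = sink.getvalue()
```

Note that reading the whole object into bytes is only reasonable for small files.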

@observingClouds observingClouds added the bug Something isn't working label Dec 8, 2022
@wachsylon

Using the mapper turned out to be a poor fit for 100 GB tar archives because the entire content is loaded into memory. With

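import shutil
import fsspec  # assumption: `fs` in the original snippet referred to the fsspec module

# Stream the archive to local disk in chunks instead of loading it into memory.
with fsspec.open("slk:///arch/ik1017/cmip6/CMIP6/AerChemMIP_002.tar", "rb") as s:
    with fsspec.open("/scratch/k/k204210/INTAKE/AerChemMIP_002.tar", "wb") as d:
        shutil.copyfileobj(s, d)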

I have now had success with the recent updates from the forked repo.
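
The reason this approach avoids the memory problem is that `shutil.copyfileobj` copies in fixed-size chunks. A small local sketch of the same pattern, with temporary files standing in for the archive and the cache directory (the paths and `CHUNK_SIZE` are illustrative assumptions):

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 1024 * 1024  # copy 1 MiB at a time (illustrative value)

with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "source.bin")
    dst_path = os.path.join(workdir, "copy.bin")

    # Create a source file slightly larger than three chunks.
    with open(src_path, "wb") as f:
        f.write(os.urandom(3 * CHUNK_SIZE + 17))

    # Only one chunk is held in memory at any time during the copy.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, length=CHUNK_SIZE)

    # Verify the copy; reading both files fully is fine at this demo size.
    with open(src_path, "rb") as a, open(dst_path, "rb") as b:
        identical = a.read() == b.read()
```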

@observingClouds
Owner Author

observingClouds commented Dec 13, 2022

slkspec does not understand the content of the files. Combining it with other protocols, however, makes it possible to load only parts of a file.

For example, the following should work:

from intake import Catalog
from intake.catalog.local import LocalCatalogEntry

# Chain the tar protocol on top of slk to read a single member of the archive.
mycat = Catalog.from_dict({
    'testcat': LocalCatalogEntry(
        'testfile',
        'some showcase testfile',
        driver='netcdf',
        args={
            'urlpath': 'tar://download_compressed.nc::slk:///arch/mh0010/m300408/showcase/download_compressed.tar',
            'xarray_kwargs': {'engine': 'h5netcdf'},
        },
    )
})
ds = mycat.testcat.to_dask()
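
To illustrate the idea behind a chained URL such as `tar://<member>::slk://<archive>`, here is a rough standard-library sketch: the outer layer supplies a file object for the archive, and the tar layer reads a single member from it. This is not how fsspec implements chaining; the helper names `build_demo_tar` and `read_member` and the in-memory archive are assumptions made for the example.

```python
import io
import tarfile


def build_demo_tar():
    """Build a small in-memory tar archive standing in for the remote file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [("a.txt", b"alpha"), ("b.txt", b"bravo")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def read_member(archive_fileobj, member_name):
    """Read one member from an uncompressed tar given as a file object."""
    with tarfile.open(fileobj=archive_fileobj, mode="r:") as tar:
        member = tar.extractfile(member_name)
        return member.read()


# Only the requested member is read; the other member is never extracted.
content = read_member(build_demo_tar(), "b.txt")
```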

2 participants