change compression to compressor in netCDF3.translate zarr.create_dataset calls #535

Open · wants to merge 1 commit into main
Conversation

wrongkindofdoctor

Replaces `compression` with the correct `compressor` argument in netCDF3.translate `zarr.create_dataset` calls.
Fixes #534
Tested with Python 3.12 and kerchunk v0.2.7 on RHEL 8.
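
For context, the fix is a one-keyword rename in the `zarr.create_dataset` calls inside `netCDF3.translate`. Below is a minimal sketch of the before/after call shape (illustrative rather than the literal diff; it assumes Zarr 3, whose `create_dataset` forwards its keywords to `create_array`, which rejects `compression` but, as this PR relies on, still accepts `compressor`):

```python
# Sketch of the rename (illustrative, not the literal kerchunk diff).
import zarr

z = zarr.group()  # kerchunk builds a group like this over its reference store

# Before: under Zarr 3 this fails with
#   TypeError: AsyncGroup.create_array() got an unexpected keyword argument 'compression'
# arr = z.create_dataset("lat", shape=(10,), chunks=(10,),
#                        dtype="f4", compression=None, fill_value=None)

# After: pass `compressor` instead (the spelling this PR assumes Zarr still accepts).
arr = z.create_dataset(
    "lat",
    shape=(10,),
    chunks=(10,),
    dtype="f4",
    compressor=None,
    fill_value=None,
)
```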

@wrongkindofdoctor wrongkindofdoctor marked this pull request as ready for review January 13, 2025 16:51
@martindurant
Member

This is right, but I don't understand why no one hit it before! Is it possible to add a test in https://github.com/fsspec/kerchunk/blob/main/tests/test_netcdf.py which would have failed, but with this change passes?
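
For reference, one possible shape for such a regression test (a sketch only: the `unlimited_dataset` fixture and `NetCDF3ToZarr` already exist in tests/test_netcdf.py, while the test name, skip condition, and final assertion here are illustrative):

```python
import pytest
import zarr
from packaging.version import Version

from kerchunk import netCDF3


@pytest.mark.skipif(
    Version(zarr.__version__) < Version("3.0.0"),
    reason="only Zarr 3 rejects the old 'compression' keyword",
)
def test_translate_under_zarr3(unlimited_dataset):
    # Before this fix, translate() raised:
    #   TypeError: AsyncGroup.create_array() got an unexpected keyword argument 'compression'
    h = netCDF3.NetCDF3ToZarr(unlimited_dataset)
    out = h.translate()
    assert out["refs"]  # a reference set was produced without error
```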

@wrongkindofdoctor
Author

@martindurant I'll see if I can update the netcdf unit tests to capture this behavior.

@wrongkindofdoctor
Author

@martindurant It looks like test_netcdf.test_unlimited should catch the error if the zarr version is 3.0.0 or later. If I run it independently with Zarr v3 (I added a print statement to show the zarr version), I get the following output:

/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/bin/python3.12 /net/jml/pycharm-2024.3/plugins/python-ce/helpers/pycharm/_jb_pytest_runner.py --target test_netcdf.py::test_unlimited 
Testing started at 10:38 AM ...
Launching pytest with arguments test_netcdf.py::test_unlimited --no-header --no-summary -q in /net/jml/kerchunk/tests

============================= test session starts ==============================
collecting ... collected 1 item

test_netcdf.py::test_unlimited 

======================== 1 failed, 2 warnings in 1.12s =========================
FAILED                                    [100%]
Running with Zarr 3.0.0
tests/test_netcdf.py:79 (test_unlimited)
unlimited_dataset = '/tmp/pytest-of-Jessica.Liptak/pytest-3/test_unlimited0/test.nc'

    def test_unlimited(unlimited_dataset):
        fn = unlimited_dataset
        expected = xr.open_dataset(fn, engine="scipy")
        h = netCDF3.NetCDF3ToZarr(fn)
>       out = h.translate()

test_netcdf.py:84: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../kerchunk/netCDF3.py:194: in translate
    arr = z.create_dataset(
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/typing_extensions.py:2853: in wrapper
    return arg(*args, **kwargs)
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/group.py:2395: in create_dataset
    return Array(self._sync(self._async_group.create_dataset(name, **kwargs)))
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/sync.py:187: in _sync
    return sync(
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/sync.py:142: in sync
    raise return_result
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/sync.py:98: in _runner
    return await coro
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <AsyncGroup memory://22386488263744>, name = 'lat', shape = (10,)
kwargs = {'chunks': (10,), 'compression': None, 'dtype': dtype('>f4'), 'fill_value': None}
data = None

    @deprecated("Use AsyncGroup.create_array instead.")
    async def create_dataset(
        self, name: str, *, shape: ShapeLike, **kwargs: Any
    ) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
        """Create an array.
    
        .. deprecated:: 3.0.0
            The h5py compatibility methods will be removed in 3.1.0. Use `AsyncGroup.create_array` instead.
    
        Arrays are known as "datasets" in HDF5 terminology. For compatibility
        with h5py, Zarr groups also implement the :func:`zarr.AsyncGroup.require_dataset` method.
    
        Parameters
        ----------
        name : str
            Array name.
        **kwargs : dict
            Additional arguments passed to :func:`zarr.AsyncGroup.create_array`.
    
        Returns
        -------
        a : AsyncArray
        """
        data = kwargs.pop("data", None)
        # create_dataset in zarr 2.x requires shape but not dtype if data is
        # provided. Allow this configuration by inferring dtype from data if
        # necessary and passing it to create_array
        if "dtype" not in kwargs and data is not None:
            kwargs["dtype"] = data.dtype
>       array = await self.create_array(name, shape=shape, **kwargs)
E       TypeError: AsyncGroup.create_array() got an unexpected keyword argument 'compression'
/net/Jessica.Liptak/miniconda3/envs/_MDTF_dev/lib/python3.12/site-packages/zarr/core/group.py:1169: TypeError

Process finished with exit code 1

You'll see that I am using the kerchunk conda package. I have not pinned a Zarr version in this test environment, so Zarr 3.0.0 was installed by default.
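
For completeness, the failure is easy to reproduce outside the test suite with only Zarr 3 installed; a minimal sketch (the array name, shape, and dtype here are placeholders):

```python
import zarr

z = zarr.group()
try:
    # Mirrors the call kerchunk's netCDF3.translate was making: Zarr 3's
    # create_dataset forwards unknown keywords straight to create_array.
    z.create_dataset(
        "lat", shape=(10,), chunks=(10,), dtype="f4",
        compression=None, fill_value=None,
    )
except TypeError as err:
    print(err)  # ... got an unexpected keyword argument 'compression'
```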

@martindurant
Member

I think this should be fixed in the latest release, which now only supports Zarr 3.
