In Memory netcdf subdatasets do not persist order when buffer is closed #1388

lagamura · 2024-11-17T17:37:41Z

Version of the software: netcdf4 1.7.1 nompi_py311hae66bec_102 conda-forge
Environment: OS: Almalinux-9.3, python: 3.11
Description of the issue and steps to reproduce:
When subdatasets are written to an in-memory netCDF dataset, their index order changes once the buffer is written out as a netCDF file. A minimal example follows:

import numpy as np
from netCDF4 import Dataset
from osgeo import gdal

list_of_subds = ["first_subdataset", "c_subdataset", "b_subdataset"]

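# Passing memory=<int> with mode="w" creates the dataset in memory;
# close() then returns its contents as a buffer.
ds = Dataset(
    "dump_ds.nc", mode="w", memory=1028, format="NETCDF4"
)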

ds.createDimension("lon", 100)
ds.createDimension("lat", 100)
ds.createDimension("time", None)

for subds in list_of_subds:

    data = ds.createVariable(
        subds,
        "f8",
        ("time", "lat", "lon"),
        zlib=True,
        fill_value=-1,
    )
    data[0, :, :] = np.arange(100)

print(ds)
# Write the returned buffer to disk directly, without going through netcdf-c.
nc_buf = ds.close()
with open("dump_ds.nc", "wb") as f:
    f.write(nc_buf)

print(gdal.Info("dump_ds.nc"))

print(ds) still shows the subdatasets in creation order:

root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): lon(100), lat(100), time(1)
variables(dimensions): float64 first_subdataset(time, lat, lon), float64 c_subdataset(time, lat, lon), float64 b_subdataset(time, lat, lon)
groups:

Output of gdal.Info after writing the buffer to the .nc file:

Subdatasets:
SUBDATASET_1_NAME=NETCDF:"dump_ds.nc":b_subdataset
SUBDATASET_1_DESC=[1x100x100] b_subdataset (64-bit floating-point)
SUBDATASET_2_NAME=NETCDF:"dump_ds.nc":c_subdataset
SUBDATASET_2_DESC=[1x100x100] c_subdataset (64-bit floating-point)
SUBDATASET_3_NAME=NETCDF:"dump_ds.nc":first_subdataset
SUBDATASET_3_DESC=[1x100x100] first_subdataset (64-bit floating-point)

jswhit · 2024-11-17T23:18:00Z

I don't know how gdal chooses the order of the variables - maybe alphabetical? I don't believe this is a bug in netcdf4-python.

jswhit · 2024-11-17T23:28:42Z

Looks like the order of the variables does change when the memory buffer is written out and re-read (ncdump shows the same thing as gdal). I don't know if the order should be preserved - perhaps @DennisHeimbigner would know.

lagamura · 2024-11-18T07:51:22Z

> Looks like the order of the variables does change when the memory buffer is written out and re-read (ncdump shows the same thing as gdal). I don't know if the order should be preserved - perhaps @DennisHeimbigner would know.

Thanks for the quick look.
In my opinion the order should be preserved for consistency, as it is when the Dataset class is used in the usual way to store a netCDF file on disk. Currently, when in-memory mode is used, the subdatasets are written in alphabetical order, as you pointed out.

jswhit · 2024-11-18T12:44:53Z

Just curious: why does the order matter for your use case?

lagamura · 2024-11-18T12:50:54Z

To stay compliant with the previous version of the product we are working on. A more specific case would be someone opening two netCDF files of the same product and trying to compare the subdatasets by index.
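
For readers hitting the same problem, one way to make such a comparison independent of index order is to key the subdatasets by variable name. The following is only a sketch: `subdatasets_by_variable` and `ordered_subdatasets` are hypothetical helper names, and the input is assumed to be a dict shaped like the gdal.Info output above (for example, what `gdal.Open(path).GetMetadata("SUBDATASETS")` would be expected to return).

```python
def subdatasets_by_variable(metadata):
    """Map variable name -> GDAL subdataset name, ignoring the numeric index.

    `metadata` is assumed to look like
    {"SUBDATASET_1_NAME": 'NETCDF:"dump_ds.nc":b_subdataset', ...}.
    """
    by_var = {}
    for key, value in metadata.items():
        if not key.endswith("_NAME"):
            continue  # skip the SUBDATASET_n_DESC entries
        # The variable name is the part after the last colon.
        variable = value.rsplit(":", 1)[-1]
        by_var[variable] = value
    return by_var


def ordered_subdatasets(metadata, variable_order):
    """Return subdataset names in the caller's preferred variable order."""
    by_var = subdatasets_by_variable(metadata)
    return [by_var[name] for name in variable_order]


if __name__ == "__main__":
    # Metadata copied from the gdal.Info output in this issue.
    meta = {
        "SUBDATASET_1_NAME": 'NETCDF:"dump_ds.nc":b_subdataset',
        "SUBDATASET_1_DESC": "[1x100x100] b_subdataset (64-bit floating-point)",
        "SUBDATASET_2_NAME": 'NETCDF:"dump_ds.nc":c_subdataset',
        "SUBDATASET_2_DESC": "[1x100x100] c_subdataset (64-bit floating-point)",
        "SUBDATASET_3_NAME": 'NETCDF:"dump_ds.nc":first_subdataset',
        "SUBDATASET_3_DESC": "[1x100x100] first_subdataset (64-bit floating-point)",
    }
    creation_order = ["first_subdataset", "c_subdataset", "b_subdataset"]
    for name in ordered_subdatasets(meta, creation_order):
        print(name)
```

With this approach two files of the same product can be compared variable by variable, whatever order each file reports its subdatasets in.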

jswhit · 2024-11-18T15:52:27Z

netcdf-c keeps track of creation order, and preserves that order when a dataset is written to disk. Since you are bypassing the c library when writing the memory buffer to disk directly, my guess is that the logic that preserves creation order is also bypassed. Unfortunately, I don't see any way to tell the C library to write the memory buffer to disk preserving the creation order.
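
If creation order really is preserved when netcdf-c writes the file itself, one possible workaround is to let the library write to a temporary file and then read the bytes back, instead of using the memory buffer. This is an untested sketch of that idea, not something confirmed in this thread: `bytes_via_tempfile` is a hypothetical helper, and the commented usage assumes the same Dataset calls as in the example at the top of the issue.

```python
import os
import tempfile


def bytes_via_tempfile(write_to_path, suffix=".nc"):
    """Call write_to_path(path) on a temporary file and return the file's bytes."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the callback reopens the path itself
    try:
        write_to_path(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)  # always clean up the temporary file


# Hypothetical usage with netCDF4 (requires the netCDF4 package):
#
# from netCDF4 import Dataset
#
# def write_dataset(path):
#     with Dataset(path, mode="w", format="NETCDF4") as ds:
#         ds.createDimension("time", None)
#         ...  # create dimensions and variables as in the example above
#
# nc_bytes = bytes_via_tempfile(write_dataset)
```

The returned bytes can then be handed to whatever uploads the object, at the cost of one round trip through local disk.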

lagamura · 2024-11-19T09:28:12Z

Just to clarify the use case: we want to combine the in-memory feature with writing the resulting IO buffer directly to S3.
It should be possible to write directly to S3 storage using the netCDF driver in gdal, so I will make a minimal example and check the subdataset order.

lagamura · 2024-11-19T10:57:39Z

Apparently the gdal netCDF driver does not support writing a file directly to S3 (/vsis3).
