
NetCDF: HDF error when creating a lot of variables and attributes #2251

Closed
wkliao opened this issue Mar 17, 2022 · 3 comments

Comments

@wkliao
Contributor

wkliao commented Mar 17, 2022

NetCDF 4.8.1
HDF5 1.13.0
MPICH 3.4.3
gcc 8.5.0

I encountered "NetCDF: HDF error" when running a parallel program that creates
an HDF5-based NetCDF-4 file. A test program that reproduces the error is
available in icase_def.c.

The test program follows the E3SM I/O pattern by creating a large number of
variables and attributes:
* 27 global attributes
* 21 dimensions
* 560 variables, each with a few attributes
The test program does not call any nc_put_var* APIs.

When using one MPI process, the test program ran fine.
But when running 4 MPI processes, it printed the following errors and hung
at nc_enddef.

mpiexec -n 4 ./icase_def
Error at icase_def.c:35 : NetCDF: HDF error
Error at icase_def.c:35 : NetCDF: HDF error
Error at icase_def.c:35 : NetCDF: HDF error
@wkliao
Contributor Author

wkliao commented Mar 25, 2022

Further investigation reveals that the error is returned at

netcdf-c/libhdf5/nc4hdf.c, lines 1412 to 1413 in cd0f169:

if (H5DSattach_scale(hdf5_var->hdf_datasetid, dsid, d) < 0)
    return NC_EHDFERR;

In issue #1822, @brtnfld mentioned that "HDF5 does not test any of the HL APIs in a parallel setting". Given that, I wonder whether NetCDF plans to resolve this bug by adopting @brtnfld's suggestion.

@DennisHeimbigner
Collaborator

I wonder if this PR #2161 would fix the problem?

@wkliao
Contributor Author

wkliao commented Mar 25, 2022

Yes. Thanks.
I assume this PR will go into 4.9.0.

wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Mar 25, 2022
* NetCDF-4 uses the dimension scale feature, which is part of the HDF5
  high-level APIs, but the HDF5 high-level APIs are not well tested for
  parallel I/O. See Unidata/netcdf-c#2251 and
  Unidata/netcdf-c#1822.
* NetCDF PR #2161 adds a new flag NC_NODIMSCALE_ATTACH that allows users to
  disable dimension scale attachment, which resolves the problem for e3sm-io.
  See Unidata/netcdf-c#2161.
* The NetCDF team indicates that PR #2161 will appear in version 4.9.0.
wkliao closed this as completed Apr 22, 2022
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Apr 22, 2022
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Apr 28, 2022