occasional bad data files when writing with parallel zlib #1710

Closed
edwardhartnett opened this issue May 5, 2020 · 3 comments

@edwardhartnett
Contributor

On the NOAA GFS system, we have a problem with files written with parallel zlib compression: occasionally a file is unreadable. Recreating the file fixes the problem.

The problem occurs on read, with this error:

HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
  #000: H5Dio.c line 199 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 603 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: H5Dchunk.c line 2293 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: H5Dchunk.c line 3658 in H5D__chunk_lock(): data pipeline read failed
    major: Dataset
    minor: Filter operation failed
  #004: H5Z.c line 1326 in H5Z_pipeline(): filter returned failure during read
    major: Data filters
    minor: Read failed
  #005: H5Zdeflate.c line 123 in H5Z_filter_deflate(): inflate() failed
    major: Data filters
    minor: Unable to initialize object

I am investigating further...

@WardF
Member

WardF commented May 5, 2020

Thanks Ed, 'bad data' always grabs my attention. Watching this issue closely.

@edwardhartnett
Contributor Author

OK, the good news is this is only happening on one machine. So there's a good chance this is the result of a build issue. We're going to rebuild the I/O stack and see if we can reproduce the problem...

@edwardhartnett
Contributor Author

This turned out to be caused by mixing shared libraries from netcdf-c 4.7.4 and 4.7.2. Once they resolved their build issues, the problem went away. ;-)
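
For anyone hitting something similar, one way to spot this kind of mix-up is to compare what `ldd` reports for each binary and shared library in the I/O stack and flag any library that resolves to more than one path. The following is only a rough sketch, not what was actually done on the affected machine: the function names `ldd_output` and `find_mixed_libraries` are made up for illustration, and it assumes a Linux system where `ldd` prints lines of the form `libname.so.N => /path (0x...)`.

```python
import re
import subprocess
from collections import defaultdict

# Matches ldd lines such as "libnetcdf.so.N => /some/path/libnetcdf.so.N (0x...)".
# Group 1 is the base library name, group 2 is the resolved path.
_LDD_LINE = re.compile(r"^\s*(lib[\w+\-]+?)\.so[\d.]*\s*=>\s*(\S+)")


def ldd_output(path):
    """Return the text printed by `ldd path` (hypothetical helper, Linux only)."""
    return subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    ).stdout


def find_mixed_libraries(ldd_by_binary):
    """Given {binary name: ldd text}, return {library: sorted paths} for every
    library that resolves to more than one distinct path across the binaries."""
    seen = defaultdict(set)
    for text in ldd_by_binary.values():
        for line in text.splitlines():
            match = _LDD_LINE.match(line)
            # "libfoo.so => not found" yields the token "not"; skip those lines.
            if match and match.group(2) != "not":
                seen[match.group(1)].add(match.group(2))
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```

Calling `find_mixed_libraries({"model": ldd_output("./model"), "libnetcdff": ldd_output("/path/to/libnetcdff.so")})` (both paths are placeholders) returns an empty dict when every library resolves consistently. A non-empty result for `libnetcdf` or `libhdf5` would be a reason to look at the build before suspecting the data.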
