Usage with pyarrow parquet #10
Hi Tanguy, the dremel example was created with Parquet's C++ API [1]. The last time I checked (about two years ago), pyarrow's Parquet writer/reader did not properly support structured data, but this may have changed. Do you have the full stack trace? The errors you listed are not fatal.
Hello, thanks for the answer! It's actually a segfault with a core dump:
On the Python side there is nothing except the log right before the crash. I remember some discussions about pyarrow's ability to store such data, but I thought that had been resolved. parquet-cpp, however, now seems to live in the Arrow repo. I'll try to understand the difference between the two formats!
Hello, I'm very interested in using the library, but I'm struggling to apply it to a Parquet file other than the dremel example.
Reading it segfaults with the error:
2021-04-15 15:30:40.254237: E struct2tensor/kernels/parquet/parquet_reader.cc:198]
The repetition type of the root node was 0, but should be 2. There may be something wrong with your supplied parquet schema. We will treat it as a repeated field.
2021-04-15 15:31:46.428109: W tensorflow/core/framework/dataset.cc:477]
Input of ParquetDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
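For readers decoding the first message: a minimal sketch, assuming the integers in the log are the Parquet format's field repetition types (REQUIRED = 0, OPTIONAL = 1, REPEATED = 2). The helper `explain_root_repetition` is hypothetical and not part of struct2tensor.

```python
import re
from enum import IntEnum


class Repetition(IntEnum):
    """Field repetition types as numbered in the Parquet format."""
    REQUIRED = 0
    OPTIONAL = 1
    REPEATED = 2


def explain_root_repetition(log_line: str) -> str:
    """Hypothetical helper: translate the numeric codes in the log message."""
    match = re.search(r"root node was (\d+), but should be (\d+)", log_line)
    if match is None:
        raise ValueError("not a root-repetition message")
    actual, expected = (Repetition(int(g)) for g in match.groups())
    return f"root node is {actual.name}, reader expects {expected.name}"


msg = "The repetition type of the root node was 0, but should be 2."
print(explain_root_repetition(msg))
```

Under that assumption, the reader found a REQUIRED root node where it expected a REPEATED one.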
I also tried loading the dremel file with pyarrow and writing it straight back out, and I can reproduce the error with the rewritten file.
How would you advise saving the Parquet file?
Thanks for your help!