Incremental or streaming decoding #10
Row-at-a-time or chunk-of-rows-at-a-time decoding would be good. Streaming individual rows is going to be inefficient in many cases (such as the time-double value examples I've shown before), so having something like

stream :: Monad m => Int -> Parser a -> ByteString m r -> Stream (Of (Vector a)) m (Either (Message, ByteString m r) r)

would be quite useful. Also important is streaming serialisation.
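For illustration, here is a minimal, self-contained sketch of the chunk-of-rows regrouping, using only the streaming and vector packages (not sv); the plain Int stream below merely stands in for a stream of already-decoded records:

```haskell
import qualified Data.Vector as V
import           Streaming (Stream, Of, mapped)
import qualified Streaming.Prelude as S

-- Regroup any stream of rows into Vectors of at most n rows,
-- without ever holding the whole stream in memory.
chunkRows :: Monad m => Int -> Stream (Of a) m r -> Stream (Of (V.Vector a)) m r
chunkRows n =
    S.map V.fromList   -- pack each group into a Vector
  . mapped S.toList    -- collect each sub-stream of rows into a list
  . S.chunksOf n       -- split the row stream every n rows

-- Example: 10000 stand-in "rows", consumed 1024 at a time.
main :: IO ()
main = S.mapM_ (print . V.length) (chunkRows 1024 (S.each [1 .. 10000 :: Int]))
```

A stream function like the one proposed above would presumably produce the per-row or per-chunk stream that this kind of regrouping then consumes.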
Hello, I am currently working on something comparable to a data-frame library and just stumbled upon this package. Looks great! 🙂 I would love to use this package for parsing CSVs etc., but my library is fundamentally streaming-based, so this feature is important to me. I would also like a more low-level hook, since I am not yet sure which streaming package I want to integrate with.
A quick experiment: https://github.com/tonyday567/streaming-sv/. I got a fair way towards streaming with the existing library. The main blocker seemed to be the list in Records.
Hi Tony. That's quite interesting. Thanks for linking it.
Do you mean the vector?
Perhaps we could change that structure to better support streaming, or create a separate, more stream-oriented structure as an alternative?
Yes, I meant the Vector in Records. A streaming version would be something like:
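(A minimal sketch of one possible shape, assuming the streaming package; Record here is a stand-in for sv's per-row type, and RecordStream is a hypothetical name:)

```haskell
import Streaming (Stream, Of)

-- Stand-in for sv's per-row type, just to keep the sketch self-contained.
newtype Record s = Record [s]

-- Hypothetical stream-oriented counterpart to Records: rows are yielded
-- one at a time as they are parsed, instead of being held in a Vector.
newtype RecordStream m s = RecordStream
  { getRecords :: Stream (Of (Record s)) m () }
```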
Not sure what to do about the […]; I had to hardcode an […]. But it's impressive that streaming can occur out of the box without any prior engineering. It shows you're on the right track with these types.
Currently sv will parse and load an entire document into memory before starting any decoding. On a 5GB CSV file, this would likely end in disaster.
It would be worth looking into whether we could add a "row-at-a-time" approach and what the trade-offs would be.
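For concreteness, one possible shape for such an entry point is sketched below, assuming the streaming and streaming-bytestring packages; Decode, DecodeError and decodeRows are placeholders here, not part of sv's current API:

```haskell
import qualified Data.ByteString.Streaming as B
import           Streaming (Stream, Of)

-- Placeholders standing in for sv's decoder and error types (hypothetical).
data Decode a
data DecodeError

-- Hypothetical row-at-a-time entry point: each decoded row is yielded as soon
-- as it is parsed, and the stream ends with either the first error or the
-- leftover return value of the input byte stream.
decodeRows
  :: Monad m
  => Decode a
  -> B.ByteString m r
  -> Stream (Of a) m (Either DecodeError r)
decodeRows _ _ = error "type sketch only; no implementation"
```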