Incremental or streaming decoding #10
Row-at-a-time or chunk-of-rows-at-a-time would be good. Streaming individual rows is going to be inefficient in many cases (such as the time-double value examples I've shown before), so having something like:

```haskell
stream :: Monad m => Int -> Parser a -> ByteString m a -> Stream (Of (Vector a)) m (Either (Message, ByteString m r) r)
```

would be quite useful. Streaming serialisation is also important.
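For illustration, here is a rough sketch of how a consumer might use such an API. Everything below is hypothetical: `Message` and `Parser` are placeholders for sv's types, `stream` is left undefined, and the input stream's return type is taken to be `r` (the signature above writes `ByteString m a`, which looks like a typo) so the leftover bytes can be handed back on failure.

```haskell
import           Data.ByteString.Streaming (ByteString)  -- streaming-bytestring
import           Data.Vector (Vector)
import qualified Data.Vector as V
import           Streaming (Stream, Of)
import qualified Streaming.Prelude as S

-- Placeholders standing in for sv's types (illustration only).
newtype Message = Message String deriving Show
data Parser a

-- The proposed operation, with the input's return type taken to be 'r'
-- so that leftover input can be returned alongside the error message.
stream
  :: Monad m
  => Int                        -- rows per emitted Vector (chunk size)
  -> Parser a
  -> ByteString m r
  -> Stream (Of (Vector a)) m (Either (Message, ByteString m r) r)
stream = undefined              -- sketch only; no such function exists in sv today

-- A consumer: print each chunk of decoded rows as it arrives, so at most
-- one chunk needs to be resident in memory at a time.
printAll :: Show a => Parser a -> ByteString IO () -> IO ()
printAll p input = do
  outcome <- S.mapM_ (mapM_ print . V.toList) (stream 1000 p input)
  case outcome of
    Left (Message e, _leftover) -> putStrLn ("decoding failed: " ++ e)
    Right ()                    -> putStrLn "all rows decoded"
```

Emitting a `Vector` of rows per step, rather than one row per step, amortises per-row streaming overhead while still keeping memory bounded by the chunk size.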
Hello, I am currently working on something comparable to a data-frame library and just stumbled upon this package. Looks great! 🙂 I would love to use this package for parsing CSVs etc., but my library is fundamentally streaming-based, so this feature is important to me. I would also like a more low-level hook, since I am not sure yet which streaming package I want to integrate with.
A quick experiment: https://github.com/tonyday567/streaming-sv/
I got a fair way towards streaming with the existing library. The main blocker seemed to be the list in Records.
Hi Tony. That's quite interesting. Thanks for linking it.
Do you mean the vector?
Perhaps we could change that structure to better support streaming, or create a separate, more stream-oriented structure as an alternative?
Yes, I meant the Vector in Records. A streaming version would be something like:
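(The snippet that originally followed here appears to have been lost. The sketch below is a hypothetical reconstruction of the general idea, not the commenter's code; `Record` is a placeholder standing in for sv's per-row type.)

```haskell
import Streaming (Stream, Of)

-- Placeholder standing in for sv's per-row type.
data Record s

-- sv's Records currently wraps all rows in a single Vector, roughly:
--   newtype Records s = Records (Vector (Record s))
--
-- A stream-oriented alternative yields rows one at a time in some monad m,
-- with the stream's return value r available for leftover input or a result.
newtype RecordStream m s r = RecordStream
  { getRecordStream :: Stream (Of (Record s)) m r }
```

Decoding could then produce rows incrementally instead of materialising every row up front.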
Not sure what to do about the …. I had to hardcode an …. But it's impressive that streaming can occur out of the box without any prior engineering. Shows you're on the right track with these types.
Currently sv will parse and load an entire document into memory before starting any decoding. On a 5GB CSV file, this would likely end in disaster.
It would be worth looking into whether we could add a "row-at-a-time" approach, and what the trade-offs would be.
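As a purely illustrative sketch of what a row-at-a-time loop could look like (this is not sv code: `RowDecoder` is a hypothetical stand-in for sv's per-row decoding, and the naive newline splitting does not handle quoted fields containing newlines, which is exactly the sort of trade-off that would need working out):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import           System.IO (IOMode (ReadMode), withFile)

-- Hypothetical per-row decoder standing in for sv's decoding machinery.
type RowDecoder a = B.ByteString -> Either String a

-- Fold over a CSV file one row at a time. Only the current 64 KB chunk
-- (plus any trailing partial row) is held in memory, rather than the
-- whole document.
foldRows :: RowDecoder a -> (b -> a -> b) -> b -> FilePath -> IO (Either String b)
foldRows decodeRow step z path = withFile path ReadMode (go B.empty z)
  where
    go carry acc h = do
      chunk <- B.hGetSome h 65536
      if B.null chunk
        then pure (finish carry acc)          -- EOF: decode a final unterminated row, if any
        else
          let (rows, rest) = splitRows (carry <> chunk)
          in case foldRowList acc rows of
               Left err   -> pure (Left err)
               Right acc' -> go rest acc' h

    -- Split a buffer into complete (newline-terminated) rows and a leftover
    -- partial row to carry into the next chunk. Blank lines are dropped.
    splitRows buf =
      let pieces = B8.split '\n' buf
      in (filter (not . B.null) (init pieces), last pieces)

    foldRowList acc []           = Right acc
    foldRowList acc (row : rows) =
      case decodeRow row of
        Left err -> Left err
        Right a  -> foldRowList (step acc a) rows

    finish carry acc
      | B.null carry = Right acc
      | otherwise    = step acc <$> decodeRow carry
```

A chunked variant that emits a `Vector` of rows per step, as in the signature at the top of this thread, would keep the same bounded-memory behaviour while reducing per-row overhead.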