Today I released the next major version of cassava, my CSV parsing and encoding library.
New in this version are streaming and incremental parsing, exposed through Data.Csv.Streaming and Data.Csv.Incremental respectively. Both approaches allow for O(1)-space parsing and more flexible error handling. The latter also allows for interleaving parsing and I/O.
The API now exposes three ways to parse CSV files, ordered from most convenient and least flexible to least convenient and most flexible: Data.Csv, Data.Csv.Streaming, and Data.Csv.Incremental.
Data.Csv causes the whole parse to fail if there are any errors, either in parsing or type conversion. This is convenient if you want to parse a small to medium-sized CSV file that you know is correctly formatted.
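To make the all-or-nothing behaviour concrete, here is a minimal sketch. It assumes a recent cassava release in which the extra argument to decode is the HasHeader type (the release announced here may spell that argument differently). The decodePeople helper and the sample data are made up for illustration.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (HasHeader (NoHeader), decode)
import qualified Data.Vector as V

-- Hypothetical helper: decode every row as a (name, age) pair,
-- or fail as a whole if any row cannot be parsed or converted.
decodePeople :: BL.ByteString -> Either String (V.Vector (String, Int))
decodePeople = decode NoHeader

main :: IO ()
main = do
  -- Well-formed input: every row converts, so the result is a Right.
  case decodePeople (BL.pack "John,27\r\nJane,28\r\n") of
    Left err -> putStrLn ("parse failed: " ++ err)
    Right rows -> V.forM_ rows $ \(name, age) ->
      putStrLn (name ++ " is " ++ show age)
  -- One bad field ("oops" is not an Int) turns the whole result into a Left,
  -- even though the first row was fine.
  case decodePeople (BL.pack "John,27\r\nJane,oops\r\n") of
    Left err -> putStrLn ("parse failed: " ++ err)
    Right rows -> print (V.length rows)
```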
At the other extreme, if you're parsing a 1GB CSV file that's being uploaded by some user of your webapp, you probably want to use the Data.Csv.Incremental module, to avoid high memory usage and to be able to deal more gracefully with formatting errors in the user's CSV file.
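As a rough sketch of what incremental use can look like, the code below feeds input to the parser one chunk at a time and keeps bad rows separate from good ones instead of aborting. It is written against recent cassava releases, where the Parser type has the constructors Fail, Many and Done and an empty chunk signals end of input. Constructor names in the release announced here may differ. The collect function and the hard-coded chunks are hypothetical stand-ins for data arriving from a socket or file handle.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Csv (HasHeader (NoHeader))
import Data.Csv.Incremental (Parser (..), decode)
import Data.Either (partitionEithers)

-- Hypothetical driver: feed chunks to the parser as they "arrive" and
-- return (conversion or parse errors, successfully converted rows).
collect :: [B.ByteString] -> ([String], [(String, Int)])
collect = go (decode NoHeader) . filter (not . B.null)
  where
    -- The input itself is malformed: stop and report the error.
    go (Fail _ err) _ = ([err], [])
    -- The parser has consumed all input.
    go (Done rs) _ = partitionEithers rs
    -- Some rows are ready and the parser wants more input.
    -- An empty chunk tells it that the input has ended.
    go (Many rs k) chunks =
      let (chunk, rest) = case chunks of
            [] -> (B.empty, [])
            (c : cs) -> (c, cs)
       in partitionEithers rs <> go (k chunk) rest

main :: IO ()
main = do
  -- Chunk boundaries deliberately fall in the middle of fields and rows.
  let chunks = map B.pack ["John,2", "7\r\nJane,oops\r\nBob,", "31\r\n"]
      (errs, rows) = collect chunks
  mapM_ (putStrLn . ("bad row: " ++)) errs
  mapM_ print rows
```

In a real application the chunk list would be replaced by reads from the upload, which is what lets parsing and I/O be interleaved.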
Other notable changes:
The various index-based decode functions now take an extra argument that allows you to skip the header line, if the file has one. Previously you had to use the name-based decode functions to work with files that contained headers.
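A small sketch of index-based decoding with a skipped header line follows. It assumes a recent cassava release, where the extra argument is a value of the HasHeader type (HasHeader or NoHeader). The exact type of that argument in the release announced here may be different. The decodeSkippingHeader helper and the sample input are illustrative only.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (HasHeader (HasHeader), decode)
import qualified Data.Vector as V

-- Hypothetical helper: decode rows by position, skipping the first line.
decodeSkippingHeader :: BL.ByteString -> Either String (V.Vector (String, Int))
decodeSkippingHeader = decode HasHeader

main :: IO ()
main = do
  -- The first line is a header and is not converted to a (String, Int).
  let input = BL.pack "name,age\r\nJohn,27\r\nJane,28\r\n"
  case decodeSkippingHeader input of
    Left err -> putStrLn ("parse failed: " ++ err)
    Right rows -> V.mapM_ print rows
```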
Space usage in Data.Csv.decode and friends has been reduced significantly. However, these decode functions still have somewhat high space usage, so if you're parsing 100MB or more of CSV data, you want to use the Streaming or Incremental modules. I plan to improve space usage by a large amount in the future.
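For larger inputs, a streaming decode might look roughly like the sketch below. It assumes a recent cassava release in which Data.Csv.Streaming exposes a Records type with the constructors Cons and Nil. The report function is hypothetical, and the in-memory input stands in for a lazily read file.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (HasHeader (NoHeader))
import Data.Csv.Streaming (Records (..), decode)

-- Hypothetical consumer: walk the lazily produced records one at a time,
-- skipping rows that fail type conversion instead of aborting.
report :: Records (String, Int) -> IO ()
report (Cons (Right (name, age)) rest) =
  putStrLn (name ++ " is " ++ show age) >> report rest
report (Cons (Left err) rest) =
  putStrLn ("skipping row: " ++ err) >> report rest
-- End of input without a parse error.
report (Nil Nothing _) = pure ()
-- The input itself was malformed; parsing stopped here.
report (Nil (Just err) _) = putStrLn ("parse error: " ++ err)

main :: IO ()
main = do
  -- With a real file, a lazy read such as BL.readFile would go here.
  let input = BL.pack "John,27\r\nJane,oops\r\nBob,31\r\n"
  report (decode NoHeader input)
```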