At hackathons I often end up chatting with people about changes I'd like to see in some of Haskell's core libraries. As always, there are many more changes I'd like to make than I have time to make. I'm posting some of my "to-dos" here in hope that someone with some spare time will pick them up.
Improvements to the binary package
In the binary package, add incremental input support to Data.Binary.Get. This would allow users to parse large inputs, read from e.g. a file, without having to resort to lazy I/O. The API would be quite simple. First, add a new data type:
data Result r = Fail !ByteString !Int64 | Partial (ByteString -> Result r) | Done r !ByteString !Int64
The Fail and Done constructors contain the current parse state. This helps debugging and error reporting in the case of Fail and makes it possible to hand the remaining input to some other function (or parser) in the case of Done. In addition to this data type, we need a function to run a parser:
runGetPartial :: Get a -> Result a
That's it! The hard part is to implement this API while keeping the great performance of the current implementation. I believe Lennart Kolmodin had a working implementation of this design, but I can't find the code.
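To make the proposed API a bit more concrete, here is a minimal sketch of how a caller might drive a Result by feeding it chunks. It does not use the real Get monad: takeBytes is a hypothetical stand-in for what runGetPartial would return for a "read n bytes" parser, feedAll is a hypothetical driver, and the reading of the two strict fields (unconsumed input, and byte offset reached) is an assumption on my part.

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)
import Data.Int (Int64)

-- The result type proposed above.
data Result r
  = Fail !ByteString !Int64             -- assumed: unconsumed input, offset of failure
  | Partial (ByteString -> Result r)    -- needs more input to continue
  | Done r !ByteString !Int64           -- assumed: value, unconsumed input, bytes consumed

-- Hypothetical stand-in for `runGetPartial someParser`:
-- collect exactly n bytes, asking for more input whenever we run short.
takeBytes :: Int -> Result ByteString
takeBytes n = go B.empty
  where
    go acc
      | B.length acc >= n =
          let (out, rest) = B.splitAt n acc
          in Done out rest (fromIntegral n)
      | otherwise = Partial $ \chunk ->
          if B.null chunk
            -- In this sketch an empty chunk signals end of input.
            then Fail B.empty (fromIntegral (B.length acc))
            else go (B.append acc chunk)

-- Hypothetical driver: feed chunks until the parser stops asking for input.
-- Chunks left over after Done or Fail are simply dropped here.
feedAll :: Result r -> [ByteString] -> Result r
feedAll (Partial k) (c:cs) = feedAll (k c) cs
feedAll (Partial k) []     = k B.empty
feedAll r           _      = r
```

In a real program the chunk list would be replaced by a loop that reads from a Handle with strict I/O and passes each chunk to the Partial continuation, which is what removes the need for lazy I/O.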
I'd also like to see the implementation techniques used in the blaze-builder package ported to Data.Binary.Builder to improve the performance of builders. Data.Binary.Builder has a nice, simple API and a lot of users (via Data.Binary.Put). Giving those users some free performance would be a good thing.
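For readers who haven't used it, here is a small sketch of the existing Data.Binary.Builder API. The frame format (a tag byte, a big-endian 16-bit length, then the payload) and the name frame are made up for illustration. The point is that code like this would not need to change if the builder internals were swapped out.

```haskell
import Data.Binary.Builder
  (Builder, fromByteString, putWord16be, singleton, toLazyByteString)
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as L

-- Hypothetical wire format: tag byte 0x01, 16-bit big-endian length, payload.
frame :: BC.ByteString -> Builder
frame payload = mconcat
  [ singleton 0x01
  , putWord16be (fromIntegral (BC.length payload))
  , fromByteString payload
  ]

-- Run the builder to get a lazy ByteString.
encodeFrame :: BC.ByteString -> L.ByteString
encodeFrame = toLazyByteString . frame
```

Builders compose through the Monoid instance, so larger messages are built by concatenating smaller ones and only run once at the end with toLazyByteString.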
If I were to undertake this project myself, I'd start by writing some Criterion benchmarks for the parser, inspired by the current set of benchmarks, and by porting all of the blaze-builder benchmarks to the binary package.
Improvements to the text package
In the text package, improve the performance of the lazy text builder in Data.Text.Lazy.Builder, using the same blaze-builder implementation techniques mentioned above.
I'd also add a rewrite rule for unpackCString# that would transcode a GHC string literal directly from UTF-8 to UTF-16 (which is what the text package uses internally) instead of going via a String, which is what happens now.
All of the above changes will likely require you to read Core. If you're unfamiliar with Core you can take a peek at my slide deck from last year's CUFP, which has a few slides about reading Core.