Friday, August 24, 2012

You can soon play in the cabal sandbox

I just merged the last large set of patches needed to implement the new cabal sandbox feature (a generalization of cabal-dev, cab, etc.) into the cabal master branch. The patches were written by Mikhail Glushenkov.

This work will alleviate the dependency problems that crop up too often when working with cabal. It doesn't solve all problems, but it should prevent package breakages due to reinstalls (just like cabal-dev does) by avoiding the use of a shared package database.

There's still some work left to be done, mainly coming up with a UI that supports all the use cases we have and is easy to use at the same time. I hope we can get that done by the end of ICFP.

The mechanism used to implement this feature, package environments, will also allow us to do other useful things, like letting you specify a list of exact package versions you want to build your package against. This is useful e.g. if you're building and shipping an executable, as you don't want each release to be built against a different set of library versions, depending on what the dependency solver picked on a given day.

Thursday, August 23, 2012

A new fast and easy to use CSV library

I'm proud to present the cassava library, an efficient, easy to use CSV library for Haskell. The library is designed in the style of aeson, Bryan O'Sullivan's excellent JSON library.

The library implements RFC 4180 with a few extensions, such as Unicode support. It is also fast. I compared it to the Python csv module, which is written in C, and cassava outperformed it in all my benchmarks. I've spent almost no time optimizing the library -- it owes its speed to attoparsec -- so there should still be room for speed improvements.

Here's the two-second crash course in using the library. Given a CSV file with this content:

John Doe,50000
Jane Doe,60000

here's how you'd process it record-by-record:

{-# LANGUAGE ScopedTypeVariables #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv
import qualified Data.Vector as V

main :: IO ()
main = do
    csvData <- BL.readFile "salaries.csv"
    case decode csvData of
        Left err -> putStrLn err
        Right v -> V.forM_ v $ \ (name, salary :: Int) ->
            putStrLn $ name ++ " earns " ++ show salary ++ " dollars"

(In this example it's not strictly necessary to parse the salary field as an Int; a String would do, but we do so for demonstration purposes.)

cassava is quite different from most CSV libraries. Most CSV libraries will let you parse CSV files into something equivalent to [[ByteString]], but after that you're on your own. cassava instead lets you declare what you expect the type of each record to be (i.e. (String, Int) in the example above) and the library will then both parse the CSV file and convert each column to the requested type, doing error checking as it goes.
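
To make the idea of type-directed conversion concrete, here is a minimal sketch that uses only the base library. It is not cassava's implementation or API: the names FromFieldSketch, FromRecordSketch and decodeSketch are hypothetical, and the input is assumed to be already split into rows of String fields rather than parsed from a file.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical sketch, NOT cassava's API: the requested result type
-- decides how each field of a row is converted.
import Text.Read (readMaybe)

type Field  = String    -- one cell of a row
type Record = [Field]   -- one row

-- Convert a single field, reporting failures as Left.
class FromFieldSketch a where
    parseFieldSketch :: Field -> Either String a

instance FromFieldSketch Int where
    parseFieldSketch s = case readMaybe s of
        Just n  -> Right n
        Nothing -> Left ("expected an Int, got " ++ show s)

-- A String field needs no conversion.
instance FromFieldSketch String where
    parseFieldSketch = Right

-- Convert a whole row, checking the number of fields.
class FromRecordSketch a where
    parseRecordSketch :: Record -> Either String a

instance (FromFieldSketch a, FromFieldSketch b)
        => FromRecordSketch (a, b) where
    parseRecordSketch [x, y] =
        (,) <$> parseFieldSketch x <*> parseFieldSketch y
    parseRecordSketch r =
        Left ("expected 2 fields, got " ++ show (length r))

-- Convert every row; the first failing row aborts the whole decode.
decodeSketch :: FromRecordSketch a => [Record] -> Either String [a]
decodeSketch = mapM parseRecordSketch

main :: IO ()
main = do
    let rows = [["John Doe", "50000"], ["Jane Doe", "60000"]]
    case decodeSketch rows :: Either String [(String, Int)] of
        Left err -> putStrLn err
        Right rs -> mapM_ (\(name, salary) ->
            putStrLn (name ++ " earns " ++ show salary ++ " dollars")) rs
```

In this sketch, asking for [(String, Int)] is what selects the Int parser for the second column, so a row such as ["John Doe", "abc"] yields a Left with an error message instead of a result.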

Download the package from Hackage: http://hackage.haskell.org/package/cassava

Get the code from GitHub: https://github.com/tibbe/cassava