Thursday, February 2, 2012

Forcing values returned from monadic computations

This is the start of my series on reliable techniques for writing fast Haskell code. In each installment we'll look at a specific situation and explore some simple guidelines you can use in your everyday programming.

Today's topic is forcing values returned from monadic computations. After reading this article you'll hopefully be able to tell when it's appropriate to force values passed to return and when it's not.

Consider this function:

f :: Int -> Int -> Int
f x y = x + y

f is strict in both arguments and when called with two arguments it will return a fully evaluated Int value. No laziness anywhere to be seen.

Now lets look at a function that looks very similar to f, but involves return:

g :: Monad m => Int -> Int -> m Int
g x y = return $ x + y

g looks very similar to f so it's easy to think that it, like f, is strict in both arguments and that it will return a fully evaluated Int value inside the given monad.

However, that's not the case. Lets look what happens if m is e.g. a state monad (such as IO) [1]. Here's a possible definition of a state monad:

newtype State s a = S (s -> (a, s))

instance Monad (State s) where
    return x = S $ \ s -> (x, s)
    m >>= f = ...

Lets expand >>=, return, and $ in g:

g :: Int -> Int -> State s Int
g x y = S $ \ s -> (x + y, s)

Since our monad wraps the return value in a (lazy) pair, the evaluation of x + y is delayed until the first component of that pair is evaluated, some time later in the program. If we don't want to delay the evaluation, we must first force the value passed to return (e.g. by using $!):

g :: Int -> Int -> State s Int
g x y = return $! x + y

Here's what we get if we expand the stricter version:

g :: Int -> Int -> State s Int
g x y = let z = x + y
        in z `seq` S $ \ s -> (z, s)

Now the expression x + y is evaluated before the pair is created and the result (rather than a thunk representing the expression) is stored in the pair.

We can also see the difference between the two different versions of g in the core generated by GHC. First the lazier version:

g = \ (x :: Int) (y :: Int) (s :: s) ->
    (case x of _
         I# x# -> case y of _
             I# y# -> I# (+# x# y#),

and then the stricter one:

g = \ (x :: Int) (y :: Int) (s :: s) ->
    case x of _
        I# x# -> case y of _
            I# y# -> (I# (+# x# y#), s)

If you look carefully you'll see that in the lazier version the evaluation of x and y has been pushed inside the first component of the pair.

The stricter version still allocates an I# box, because pairs are polymorphic in their components and unboxed values (like Int#) cannot be stored in polymorphic fields and thus need to be wrapped.

The above example used a simple arithmetic expression but the same reasoning applies to data types with strict fields. For example, given this strict-pair data type:

data StrictPair a b = SP !a !b

you need to use $! if you want its fields to be evaluated before a monadic computation returning a StrictPair returns, like so:

h :: Int -> Int -> State s (StrictPair Int Int)
h x y = return $! SP (x+1) (y+1)

Aside: Int (and Double, Float, etc.) arithmetic is really just a special case of a data type with strict fields as Int is defined as:

data Int = I# Int#

where Int# is an unboxed type. Fields of unboxed types are always strict.

Forcing values that are already in WHNF has no effect e.g.

-- Has no effect and thus potentially confuses readers:
h x y = return $! (x, y)

Guideline: force data types with strict fields (including e.g. Int) to WHNF using $! before passing them to return.

  1. This behavior is not specific to the state monad. Any monad where return is lazy in its argument would behave the same.


  1. I was recently bitten in the arse by the strictness of >>= in the IO monad. I was trying to recursively generate infinite list, and ended up with an infinite loop.

    I think allowing a polymorphic function to be strict for some types and lazy for others, without informing the user, is a major violation of the principle of least surprise. It would be nice if inquiring about the type of a function in ghci would return some strictness annotations. For example, if

    f x y = (x, y)
    g x y = x `seq` (x, y)

    then :t f would return

    f :: Int -> Int -> (Int, Int)

    and :t g would return

    g :: !Int -> Int -> (Int, Int)

    but they would remain compatible typewise.

    What do you think? Is there some deep type theory reason why that could not be implemented?

  2. Alex,

    See Max's paper on Strict Core:

    If I remember correctly one issue is how to handle higher-order functions e.g. what do you do with map :: (!a -> b) -> [a] -> [b]

    1. That is more than I was asking for. I just want an annotation that differentiates between (a -> b) -> [a] -> [b] and !(a -> b) -> [a] -> [b]. I know this information can be obtained with -ddump-stranal, but that assumes you have the source code to everything.

  3. Actually the strictness information about function arguments is already in the interface files (*.hi). What we would need is a tool to extract that information and print it in a nice form.

    How dificult would be to write this tool? Is the .hi format documented somewhere?

    1. ghc --show-iface prog.hi?

    2. Indeed the strictness information is in the interface file. Unfortunately it looks like this: U(AASAAAAAA)LL and I have not yet managed to figure out how to decode it. The interface format is not documented beyond a statement that it is the serialization of some types from a certain GHC source file.

    3. I've often wished for better document of this format. IIRC "U" is for unboxed and "L" is for lazy/don't know. "A" might be absent (or unused or some such.)