This is the start of my series on reliable techniques for writing fast Haskell code. In each installment we'll look at a specific situation and explore some simple guidelines you can use in your everyday programming.

Today's topic is forcing values returned from monadic computations. After reading this article you'll hopefully be able to tell when it's appropriate to force values passed to `return`

and when it's not.

Consider this function:

```
f :: Int -> Int -> Int
f x y = x + y
```

`f`

is strict in both arguments and when called with two arguments it will return a fully evaluated `Int`

value. No laziness anywhere to be seen.

Now lets look at a function that looks very similar to `f`

, but involves `return`

:

```
g :: Monad m => Int -> Int -> m Int
g x y = return $ x + y
```

`g`

looks very similar to `f`

so it's easy to think that it, like `f`

, is strict in both arguments and that it will return a fully evaluated `Int`

value inside the given monad.

However, that's not the case. Lets look what happens if `m`

is e.g. a state monad (such as `IO`

) [1]. Here's a possible definition of a state monad:

```
newtype State s a = S (s -> (a, s))
instance Monad (State s) where
return x = S $ \ s -> (x, s)
m >>= f = ...
```

Lets expand `>>=`

, `return`

, and `$`

in `g`

:

```
g :: Int -> Int -> State s Int
g x y = S $ \ s -> (x + y, s)
```

Since our monad wraps the return value in a (lazy) pair, the evaluation of `x + y`

is delayed until the first component of that pair is evaluated, some time later in the program. If we don't want to delay the evaluation, we must first force the value passed to `return`

(e.g. by using `$!`

):

```
g :: Int -> Int -> State s Int
g x y = return $! x + y
```

Here's what we get if we expand the stricter version:

```
g :: Int -> Int -> State s Int
g x y = let z = x + y
in z `seq` S $ \ s -> (z, s)
```

Now the expression `x + y`

is evaluated before the pair is created and the result (rather than a thunk representing the expression) is stored in the pair.

We can also see the difference between the two different versions of `g`

in the core generated by GHC. First the lazier version:

```
g = \ (x :: Int) (y :: Int) (s :: s) ->
(case x of _
I# x# -> case y of _
I# y# -> I# (+# x# y#),
s)
```

and then the stricter one:

```
g = \ (x :: Int) (y :: Int) (s :: s) ->
case x of _
I# x# -> case y of _
I# y# -> (I# (+# x# y#), s)
```

If you look carefully you'll see that in the lazier version the evaluation of `x`

and `y`

has been pushed inside the first component of the pair.

The stricter version still allocates an `I#`

box, because pairs are polymorphic in their components and unboxed values (like `Int#`

) cannot be stored in polymorphic fields and thus need to be wrapped.

The above example used a simple arithmetic expression but the same reasoning applies to data types with strict fields. For example, given this strict-pair data type:

```
data StrictPair a b = SP !a !b
```

you need to use `$!`

if you want its fields to be evaluated before a monadic computation returning a `StrictPair`

returns, like so:

```
h :: Int -> Int -> State s (StrictPair Int Int)
h x y = return $! SP (x+1) (y+1)
```

Aside: `Int`

(and `Double`

, `Float`

, etc.) arithmetic is really just a special case of a data type with strict fields as `Int`

is defined as:

```
data Int = I# Int#
```

where `Int#`

is an unboxed type. Fields of unboxed types are always strict.

Forcing values that are already in WHNF has no effect e.g.

```
-- Has no effect and thus potentially confuses readers:
h x y = return $! (x, y)
```

**Guideline**: force data types with strict fields (including e.g. `Int`

) to WHNF using `$!`

before passing them to `return`

.

- This behavior is not specific to the state monad. Any monad where
`return`

is lazy in its argument would behave the same.

I was recently bitten in the arse by the strictness of >>= in the IO monad. I was trying to recursively generate infinite list, and ended up with an infinite loop.

ReplyDeleteI think allowing a polymorphic function to be strict for some types and lazy for others, without informing the user, is a major violation of the principle of least surprise. It would be nice if inquiring about the type of a function in ghci would return some strictness annotations. For example, if

f x y = (x, y)

g x y = x `seq` (x, y)

then :t f would return

f :: Int -> Int -> (Int, Int)

and :t g would return

g :: !Int -> Int -> (Int, Int)

but they would remain compatible typewise.

What do you think? Is there some deep type theory reason why that could not be implemented?

Alex,

ReplyDeleteSee Max's paper on Strict Core: http://www.cl.cam.ac.uk/~mb566/papers/tacc-hs09.pdf?

If I remember correctly one issue is how to handle higher-order functions e.g. what do you do with map :: (!a -> b) -> [a] -> [b]

That is more than I was asking for. I just want an annotation that differentiates between (a -> b) -> [a] -> [b] and !(a -> b) -> [a] -> [b]. I know this information can be obtained with -ddump-stranal, but that assumes you have the source code to everything.

DeleteActually the strictness information about function arguments is already in the interface files (*.hi). What we would need is a tool to extract that information and print it in a nice form.

ReplyDeleteHow dificult would be to write this tool? Is the .hi format documented somewhere?

ghc --show-iface prog.hi? http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/IfaceFiles

DeleteIndeed the strictness information is in the interface file. Unfortunately it looks like this: U(AASAAAAAA)LL and I have not yet managed to figure out how to decode it. The interface format is not documented beyond a statement that it is the serialization of some types from a certain GHC source file.

DeleteI've often wished for better document of this format. IIRC "U" is for unboxed and "L" is for lazy/don't know. "A" might be absent (or unused or some such.)

Delete