This is the start of my series on reliable techniques for writing fast Haskell code. In each installment we'll look at a specific situation and explore some simple guidelines you can use in your everyday programming.
Today's topic is forcing values returned from monadic computations. After reading this article you'll hopefully be able to tell when it's appropriate to force values passed to
return and when it's not.
Consider this function:
f :: Int -> Int -> Int f x y = x + y
f is strict in both arguments and when called with two arguments it will return a fully evaluated
Int value. No laziness anywhere to be seen.
Now lets look at a function that looks very similar to
f, but involves
g :: Monad m => Int -> Int -> m Int g x y = return $ x + y
g looks very similar to
f so it's easy to think that it, like
f, is strict in both arguments and that it will return a fully evaluated
Int value inside the given monad.
However, that's not the case. Lets look what happens if
m is e.g. a state monad (such as
IO) . Here's a possible definition of a state monad:
newtype State s a = S (s -> (a, s)) instance Monad (State s) where return x = S $ \ s -> (x, s) m >>= f = ...
g :: Int -> Int -> State s Int g x y = S $ \ s -> (x + y, s)
Since our monad wraps the return value in a (lazy) pair, the evaluation of
x + y is delayed until the first component of that pair is evaluated, some time later in the program. If we don't want to delay the evaluation, we must first force the value passed to
return (e.g. by using
g :: Int -> Int -> State s Int g x y = return $! x + y
Here's what we get if we expand the stricter version:
g :: Int -> Int -> State s Int g x y = let z = x + y in z `seq` S $ \ s -> (z, s)
Now the expression
x + y is evaluated before the pair is created and the result (rather than a thunk representing the expression) is stored in the pair.
We can also see the difference between the two different versions of
g in the core generated by GHC. First the lazier version:
g = \ (x :: Int) (y :: Int) (s :: s) -> (case x of _ I# x# -> case y of _ I# y# -> I# (+# x# y#), s)
and then the stricter one:
g = \ (x :: Int) (y :: Int) (s :: s) -> case x of _ I# x# -> case y of _ I# y# -> (I# (+# x# y#), s)
If you look carefully you'll see that in the lazier version the evaluation of
y has been pushed inside the first component of the pair.
The stricter version still allocates an
I# box, because pairs are polymorphic in their components and unboxed values (like
Int#) cannot be stored in polymorphic fields and thus need to be wrapped.
The above example used a simple arithmetic expression but the same reasoning applies to data types with strict fields. For example, given this strict-pair data type:
data StrictPair a b = SP !a !b
you need to use
$! if you want its fields to be evaluated before a monadic computation returning a
StrictPair returns, like so:
h :: Int -> Int -> State s (StrictPair Int Int) h x y = return $! SP (x+1) (y+1)
Float, etc.) arithmetic is really just a special case of a data type with strict fields as
Int is defined as:
data Int = I# Int#
Int# is an unboxed type. Fields of unboxed types are always strict.
Forcing values that are already in WHNF has no effect e.g.
-- Has no effect and thus potentially confuses readers: h x y = return $! (x, y)
Guideline: force data types with strict fields (including e.g.
Int) to WHNF using
$! before passing them to
- This behavior is not specific to the state monad. Any monad where
returnis lazy in its argument would behave the same.