Thursday, September 30, 2010

Slides from my High-Performance Haskell talk at CUFP

Here are my slides for tomorrow morning's High-Performance Haskell tutorial:

(There's a downloadable PDF version, too, if you find that an easier format to deal with.)

I've also posted a Git repository of the slide source code, in case anyone would like to use the slides for their own purposes.

Monday, September 27, 2010

Final version of our GHC I/O manager paper

Bryan O'Sullivan has done a nice job cleaning up our paper about the motivation, design, and internals of the new GHC I/O manager. The paper will be presented by Bryan at the Haskell Symposium on Thursday, 30th September, 2010.

The new GHC I/O manager is included in GHC 7.0, which currently in the release candidate stage.

Tuesday, September 7, 2010

Builder monoid for Data.Text

In case you missed it, the 0.8 release of the text package includes a builder monoid for creating Text values, similar to the one in the binary package. The builder monoid can efficiently glue together small chunks of Text or characters into a lazy Text value.

The monoid is useful for implementing e.g. templating languages that work with Unicode text and fulfills the same roles as e.g. StringBuffer in Java.

Read the API documentation.

The static argument transformation

After a weekend of hacking in Utrecht, Don Stewart and I have improved the performance of the containers package, especially when the key or element type can be unboxed (e.g. Int). The performance gains come from repeatedly applying a few code transformations. Today, I will describe one of these transformations: the static argument transformation.

The transformation works most of the time, but make sure you have some benchmarks ready so you can catch the cases where it makes things worse. I recommend using the excellent Criterion benchmarking package, as it takes care of many subtle issues in benchmarking.

The static argument transformation

The static argument transformation transforms a recursive function with one or more constant arguments into a non-recursive wrapper with a locally defined recursive worker. For example,

is transformed to

This transformation tends to improve performance for some functions, especially higher-order functions.

Higher-order functions

To understand why this transformation can improve the performance of higher-order functions, consider this definition of filter:

The actual predicate is unknown to GHC when it compiles the module where filter is defined. At run time, a pointer to the actual predicate is passed as an argument to filter. This hurts performance as calling a function indirectly through a pointer is less efficient than calling a known function.

If filter could be inlined at the call site, where the actual predicate is known, we could avoid the cost of this indirect call. However, GHC doesn’t inline recursive functions (as it doesn’t know when to stop inlining) and therefore cannot inline filter at call sites.

The static argument transformation can help here. By applying the transformation we get

Note how the constant argument p is no longer passed in each recursive call.

Since this definition is not recursive, GHC can inline it. Since GHC can inline it, GHC can also inline the predicate into the body. For example, given this call site

the code GHC generates looks something like this, after simplification:

The predicate is now known and the extra indirection can be avoided. In addition, GHC can inline the definition of even and avoid the overhead of that function application:

In order of decreasing importance, here are the reasons the transformation improves performance:

  • We no longer have to make an indirect function call via a pointer.
  • We don’t have to make a function call to even, as its definition has been inlined.
  • The closure that’s created by the expression go xs is slightly smalled than the closure created by filter p xs, which save a little bit of allocation.

Since inlining is so important for this transformation to work, you may sometimes need to add an explicit INLINE pragma:

In this particular case it’s not needed as GHC inlines filter without any hints from us.

When applying the transformation to a function, try both with and without the inline pragma and measure the difference using benchmarks.

Summary

The static argument transformation can remove performance penalties introduced by using abstractions such as higher-order functions.

GHC is getting better and better over time so expect some of these manual transformations to be applied automatically by the compiler some day. Until then you can use them to improve the performance of your code.

Next time we'll take a closer look at how to reason about evaluation order an how making function arguments strict can improve the performance of non-polymorphic functions as well.