Thursday, April 25, 2013

More GSoC ideas

Here are another two GSoC ideas.

Faster ghc -c building

The main obstacle to implementing completely parallel builds in Cabal is that calling ghc --make on a bunch of modules is much faster than calling ghc -c on each module.

Today, Cabal builds modules by passing them to ghc --make, but we'd like to create the module dependency graph ourselves and then call ghc -c on each module, in parallel. However, since ghc -c is so much slower than ghc --make, the user needs 2 (and sometimes even more) cores for parallel builds to pay off. If we could make ghc -c faster, we could write a parallel build system that actually gives good speed-ups on typical developer hardware.
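To make the "build the dependency graph ourselves" idea concrete, here is a minimal sketch in Python (not Cabal's actual implementation). It groups modules into waves, where every module in a wave could be handed to its own ghc -c process at the same time. The module names and the compile_waves helper are made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def compile_waves(imports):
    """Group modules into waves that could be compiled in parallel.

    `imports` maps each module to the set of modules it imports.
    All modules in one wave depend only on modules in earlier waves.
    """
    sorter = TopologicalSorter(imports)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output stable
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished compiling
    return waves


# Hypothetical module graph, purely for illustration.
imports = {
    "Main": {"Parser", "Pretty"},
    "Parser": {"Types"},
    "Pretty": {"Types"},
    "Types": set(),
}

for wave in compile_waves(imports):
    print(wave)
```

A real scheduler would start each module as soon as its own imports are done rather than waiting for a whole wave, but the waves show how much parallelism the graph allows at best.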

The project would involve profiling ghc -c to figure out where the time is spent and then trying to improve performance. One possible source of inefficiency is reparsing all .hi files every time ghc -c is run.

Preferably the student should be at least vaguely familiar with the GHC source code.

Cabal dependency solver improvements

There's one shortcoming in the Cabal package dependency solver that is starting to bite more and more often, namely not treating the components in a .cabal file (i.e. libraries, executables, tests, and benchmarks) as separate entities for the purpose of dependency resolution. In practice this means that for many core libraries this fails:

cabal install --only-dependencies --enable-tests

but this succeeds:

cabal install <manually compiled list of test dependencies>
cabal install --only-dependencies

The reason is that the library defined in the package is a dependency of the test framework (i.e. the test-framework package), creating a dependency cycle involving the library itself and the test framework. However, if the test dependency is expressed as:

library foo

test-suite my-tests
  -- No direct dependency on the library:
  hs-source-dirs: . tests

the dependency solver could find a solution, as the test suite no longer depends on the library, but it doesn't today.

The project would involve fixing the solver to treat each component separately (i.e. as if it were a separate package) for the purpose of dependency resolution.

For an example of this problem see the hashable package's .cabal file. In this case the dependency cycle involves hashable and criterion.
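The difference between package-level and component-level resolution can be sketched with two small graphs. This is a simplified illustration in Python, not the solver's real data model: the "package:component" node names are invented here, and the edges assume the benchmark is the only part of hashable that needs criterion.

```python
def has_cycle(graph):
    """Return True if the directed graph {node: [successors]} has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: we returned to the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in graph)


# Package-level view: every component's dependencies are lumped together.
package_graph = {
    "hashable": ["criterion"],
    "criterion": ["hashable"],
}

# Component-level view (assumed edges, for illustration only).
component_graph = {
    "hashable:lib": [],
    "hashable:bench": ["hashable:lib", "criterion:lib"],
    "criterion:lib": ["hashable:lib"],
}

print(has_cycle(package_graph))    # True
print(has_cycle(component_graph))  # False
```

Once the benchmark is its own node, the library no longer appears to depend on criterion, so a valid install order exists.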

Monday, April 15, 2013

GSoC ideas

This year's Google Summer of Code is upon us. Every year I try to come up with a list of projects that I'd like to see done. Here's this year's list, in no particular order:

Better formatting support for Haddock

While adequate for basic API docs, Haddock leaves something to be desired when you want to write more than a paragraph or two.

For example, you can't use bold text for emphasis or have links with anchor text. Support for images or inline HTML (e.g. for creating tables) is similarly missing. All headings need to be defined in the export list, which is both inconvenient and mixes up API organization with the use of headings to structure longer segments of text.

This project would try to improve the state of Haskell documentation by extending the Haddock markup language, either by

  • adding features from Markdown to the Haddock markup language, or
  • adding a new markup language that is a superset of Markdown to Haddock.

Why Markdown? Markdown is what most of the programming-related part of the web (e.g. GitHub, StackOverflow) has standardized on as a human-friendly markup language. The reason Markdown works so well is that it codifies the current practice, already used and improved over time in e.g. mailing list discussions, instead of inventing a brand new language.

Option 1: Add Markdown as an alternative markup language

This option would let users opt in to using (a superset of) Markdown instead of the current Haddock markup by putting a

{-# HADDOCK Markdown #-}

pragma on top of the source file. The language would be a superset as we'd still want to support single-quoted strings to hyperlink identifiers, etc.

This option might run into difficulties with the C preprocessor, which also uses # for its markup. Part of the project would involve thinking about that problem and more generally the implications of using Markdown in Haddock.

Option 2: Add features from Markdown to Haddock

This option is slightly less ambitious in that we'd be adding a few select features from Markdown to Haddock, for example support for bold text and anchor text. Since we're not trying to support all of Markdown, the issue with the C preprocessor could be solved by not supporting #-style headings while still supporting *bold* for bold text.
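As a rough illustration of how small such an extension could be, here is a toy inline renderer in Python. It is only a sketch of the idea, not Haddock's parser: the render_inline name, the [text](url) link syntax borrowed from Markdown, and the HTML output are all assumptions, and it does no HTML escaping.

```python
import re


def render_inline(text):
    """Render *bold* and [anchor text](url); leave everything else alone."""
    # Links first, so the anchor text can still be scanned for *bold* afterwards.
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r'<a href="\2">\1</a>', text)
    # *bold* as proposed above; '#' is deliberately not treated as markup.
    text = re.sub(r"\*([^*\n]+)\*", r"<b>\1</b>", text)
    return text


print(render_inline("See *this* [guide](http://example.com)"))
print(render_inline("#define FOO 1"))
```

Because # is never special, a line such as #define FOO 1 passes through untouched, which is the point of leaving out #-style headings.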

More parallelism in Cabal builds

Builds could always be faster. There are a few more things that Cabal could build in parallel:

  • Each component (e.g. tests) could be built in parallel, while taking dependencies into account (i.e. from executables to the library).
  • Profiling and non-profiling versions could be built in parallel, making it much cheaper to always enable profiling by default in ~/.cabal/config.
  • Individual modules could be built in parallel.

The last option gives the most parallelism but is also the hardest to implement. It requires that we have correct dependency information (which we could get from ghc -M), and even then compiling individual modules using ghc -c is up to 2x slower compared to compiling the same modules with ghc --make. Still, it could be a win for anyone with more than 2 CPU cores and it would support building e.g. profiling libraries in parallel without much (or any) extra work.
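The break-even point follows from simple arithmetic, sketched here in Python under idealized assumptions: every module can be compiled independently, there is no scheduling overhead, and ghc -c costs a fixed factor more per module than ghc --make. The function names are made up for this example.

```python
import math


def parallel_speedup(slowdown, cores):
    """Ideal speed-up of parallel ghc -c over a sequential ghc --make.

    `slowdown` is how many times slower ghc -c is per module (e.g. 2.0).
    """
    return cores / slowdown


def cores_to_break_even(slowdown):
    """Smallest whole number of cores that is strictly faster than --make."""
    return math.floor(slowdown) + 1


print(parallel_speedup(2.0, 2))   # 1.0, no gain on two cores
print(parallel_speedup(2.0, 4))   # 2.0
print(cores_to_break_even(2.0))   # 3
```

Real builds will do worse than this, since module dependencies limit how many compilations can run at once.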