Thursday, April 25, 2013

More haskell.org GSoC ideas

Here are two more haskell.org GSoC ideas.

Faster ghc -c building

The main obstacle to implementing completely parallel builds in Cabal is that calling ghc --make on a bunch of modules is much faster than calling ghc -c on each module.

Today, Cabal builds modules by passing them to ghc --make, but we'd like to create the module dependency graph ourselves and then call ghc -c on each module, in parallel. However, since ghc -c is so much slower than ghc --make, the user needs 2 (and sometimes more) cores just for a parallel build to break even. If we could make ghc -c faster, we could write a parallel build system that actually gives good speed-ups on typical developer hardware.
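To make the plan concrete, here is a minimal Haskell sketch of the scheduling half of such a build system. Given each module's package-local imports, it groups modules into "waves" where every module imports only modules from earlier waves, so the modules in one wave could be handed to separate ghc -c processes at once. The DepGraph type, buildWaves, and the module names are hypothetical illustrations, not Cabal's API, and actually invoking ghc -c is left out.

```haskell
-- Hypothetical sketch of scheduling a parallel build; not Cabal's API.
import Data.List (partition)

-- | Each module paired with the package-local modules it imports.
type DepGraph = [(String, [String])]

-- | Group modules into waves: every module in a wave imports only modules
-- from earlier waves, so a wave's modules could be compiled by separate
-- `ghc -c` processes at the same time. Returns Left with the modules that
-- could never be scheduled (an import cycle, or an import of a module
-- missing from the graph). .hs-boot files are ignored in this sketch.
buildWaves :: DepGraph -> Either [String] [[String]]
buildWaves = go []
  where
    -- `done` holds modules already placed in an earlier wave.
    go _ [] = Right []
    go done pending =
      let (ready, blocked) = partition (all (`elem` done) . snd) pending
      in if null ready
           then Left (map fst blocked)
           else fmap (map fst ready :) (go (done ++ map fst ready) blocked)

-- Example: Main imports Data.Foo and Data.Bar, which both import Data.Util.
exampleGraph :: DepGraph
exampleGraph =
  [ ("Main",      ["Data.Foo", "Data.Bar"])
  , ("Data.Foo",  ["Data.Util"])
  , ("Data.Bar",  ["Data.Util"])
  , ("Data.Util", [])
  ]
```

Here buildWaves exampleGraph evaluates to Right [["Data.Util"], ["Data.Foo", "Data.Bar"], ["Main"]], so Data.Foo and Data.Bar could be compiled in parallel. Waves are the simplest scheme; a real scheduler would rather start each module as soon as its own imports finish, which keeps more cores busy when the graph is unbalanced.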

The project would involve profiling ghc -c to figure out where the time is spent and then trying to improve performance. One possible source of inefficiency is reparsing all .hi files every time ghc -c is run.

Preferably the student should be at least vaguely familiar with the GHC source code.

Cabal dependency solver improvements

There's one shortcoming in the Cabal package dependency solver that is starting to bite more and more often: it doesn't treat the components in a .cabal file (i.e. libraries, executables, tests, and benchmarks) as separate entities for the purpose of dependency resolution. In practice this means that for many core libraries this fails:

cabal install --only-dependencies --enable-tests

but this succeeds:

cabal install <manually compiled list of test dependencies>
cabal install --only-dependencies

The reason is that the library defined in the package is a dependency of the test framework (e.g. the test-framework package), creating a dependency cycle involving the library itself and the test framework. However, if the test suite is instead declared as:

library foo
  ...

test-suite my-tests
  -- No direct dependency on the library:
  hs-source-dirs: . tests

the dependency solver could find a solution, as the test suite no longer depends on the library, but it doesn't today.

The project would involve fixing the solver to treat each component separately (i.e. as if it were a separate package) for the purpose of dependency resolution.

For an example of this problem, see the hashable package's .cabal file. In this case the dependency cycle involves hashable and criterion.
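The difference between package-level and component-level resolution can be illustrated with a toy cycle check. This is a sketch under simplifying assumptions, not how cabal-install's solver works: it assumes criterion's library depends (directly or transitively) on hashable's library, and the node names such as hashable:bench and the hasCycle function are hypothetical. Merging all of hashable's components into one node creates a cycle; splitting them, with the benchmark compiling hashable's sources itself as in the .cabal snippet above, removes it.

```haskell
-- Hypothetical toy model: nodes are packages or individual components,
-- and each node lists what it depends on. This is NOT cabal-install's solver.
import Data.Maybe (fromMaybe)

type Graph = [(String, [String])]

-- | True if some node can reach itself by following dependency edges.
-- A naive search per node, which is fine for a handful of nodes.
hasCycle :: Graph -> Bool
hasCycle g = any (\n -> n `elem` reachable n) (map fst g)
  where
    succs n = fromMaybe [] (lookup n g)
    -- Every node reachable from n in one or more steps.
    reachable n = go [] (succs n)
    go seen [] = seen
    go seen (x:xs)
      | x `elem` seen = go seen xs
      | otherwise     = go (x : seen) (succs x ++ xs)

-- Package granularity: hashable (library and benchmark merged into one
-- node) depends on criterion, which is assumed to depend on hashable.
packageLevel :: Graph
packageLevel =
  [ ("hashable",  ["base", "criterion"])
  , ("criterion", ["hashable"])
  , ("base",      [])
  ]

-- Component granularity: only the benchmark needs criterion, and it
-- compiles hashable's sources itself instead of depending on hashable:lib.
componentLevel :: Graph
componentLevel =
  [ ("hashable:lib",   ["base"])
  , ("hashable:bench", ["base", "criterion:lib"])
  , ("criterion:lib",  ["hashable:lib"])
  , ("base",           [])
  ]
```

With these definitions, hasCycle packageLevel is True while hasCycle componentLevel is False: the same dependencies stop forming a cycle once each component is its own node.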

2 comments:

  1. A lightweight idea -- not a SoC project on its own, but very useful: `cabal init --scaffold`. It would set up not only your .cabal file but also a basic directory structure, including test and benchmarking stubs, etc.

  2. (A bit of a tangent.) Having 'ghc --make +RTS -N${k}' parallelize properly would be nice as well.
