Every year I put together a list of Google Summer of Code projects I'd like to see students work on. Here's my list for this year.
As usual, the focus is on existing infrastructure. I believe, and I think our past experience bears this out, that such projects tend to be more successful.
Improved Hackage login
The Hackage login/user system could use several improvements.
From a security perspective, we need to switch to a current best-practice implementation of password storage, such as bcrypt, scrypt, or PBKDF2. MD5, which is what HTTP Digest Auth uses, has known attacks.
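For illustration, here is a minimal sketch of salted PBKDF2 password storage using only Python's standard library. It is not Hackage's code. The record format (`pbkdf2_sha256$iterations$salt$hash`), the iteration count, and the helper names are all assumptions. The parts that matter are a random per-password salt, a deliberately slow key-derivation function, and a constant-time comparison.

```python
import hashlib
import hmac
import secrets

# Illustrative parameters (assumptions, not recommendations for Hackage).
ITERATIONS = 200_000
SALT_BYTES = 16


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Derive a salted PBKDF2-HMAC-SHA256 hash and encode it as one string."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Storing the algorithm and iteration count lets the cost be raised later.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and cost, then compare."""
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison so timing doesn't reveal how many bytes matched.
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Hashing the same password twice gives two different strings because of the salt, yet `verify_password` accepts the correct password against either one.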
From a usability perspective, we need to move to a cookie-based login system. While HTTP auth is convenient from an implementation perspective, it doesn't work well from a usability perspective (which is why sites that otherwise try to follow the REST approach don't use HTTP auth). A cookie-based approach allows us to, among other things,
- display the current login status of the user,
- allow users to conveniently access a user preference page,
- allow users to log out, and
- adapt the UI to the current user.
An example of the latter would be to show a link to the maintainer section only for packages you maintain, or to show additional actions to site admins. HTTP auth also introduces an extra page transition when you want to move from a list of items to editing that list (e.g. to edit uploaders on /packages/uploaders/ you need to click on a link that takes you to /packages/uploaders/edit). This is because HTTP auth does authentication on a per-request basis, so a page that doesn't require a login can't tell who is viewing it.
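To make the cookie-based alternative more concrete, here is a minimal sketch of an HMAC-signed session token. Everything in it is hypothetical: the token layout (`username|expiry|signature`), the cookie name `session`, and the function names are assumptions, and a real server would more likely use an existing session library. It also assumes usernames contain no characters that are invalid in a cookie value. Once every request carries a verifiable cookie, any page can ask who the current user is. That is what makes showing login status, a preferences link, logout, and a per-user UI possible.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server secret; a real server would load a persistent key.
SECRET_KEY = secrets.token_bytes(32)


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_session_token(username: str, ttl_seconds: int = 3600) -> str:
    """Token layout (an assumption of this sketch): username|expiry|signature."""
    payload = f"{username}|{int(time.time()) + ttl_seconds}"
    return f"{payload}|{_sign(payload)}"


def set_cookie_header(token: str) -> str:
    # HttpOnly keeps scripts from reading the cookie; Secure limits it to HTTPS.
    return f"Set-Cookie: session={token}; Path=/; HttpOnly; Secure; SameSite=Lax"


def logout_header() -> str:
    # Logging out means telling the browser to drop the cookie immediately.
    return "Set-Cookie: session=; Path=/; Max-Age=0"


def current_user(token: Optional[str]) -> Optional[str]:
    """Return the logged-in username, or None if the token is missing or invalid."""
    if not token:
        return None
    try:
        username, expiry, signature = token.rsplit("|", 2)
    except ValueError:
        return None
    expected = _sign(f"{username}|{expiry}")
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None  # tampered or forged
    if int(expiry) < time.time():
        return None  # expired
    return username


def nav_bar(token: Optional[str]) -> str:
    """Every request carries the cookie, so any page can adapt to the user."""
    user = current_user(token)
    if user is None:
        return "Log in"
    return f"Logged in as {user} | Preferences | Log out"
```

One design caveat: a purely signed token like this can't be revoked on the server before it expires. Keeping a random session ID in a server-side table instead would make logout enforceable.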
Other Hackage usability improvements
There are several other Hackage usability improvements I'd like to see.
The homepage is currently a write-up about the new Hackage server. While that made sense when the server was brand new, a more useful homepage would include a list of recently updated packages, the most popular packages, the packages you maintain, and links to "getting started" material and other documentation. Looking at other languages' package repository homepages for inspiration would be a good start.
The search result page should include download counts and a more easily scannable result list. The current list is hard to read because the package descriptions don't line up. For example, compare the search result pages for "xml" on Hackage and on Ruby Gems.
Faster Cabal/GHC parallel builds
Mikhail Glushenkov and others have done a great job making our compiles faster. Cabal already builds packages in parallel, and with GHC 7.8 it will build modules in parallel as well.
There are still more opportunities for parallelism. Cabal doesn't build individual components or different versions of the same component (e.g. vanilla and profiling) in parallel.
Building all the test suites in parallel would save time if you have many test suites, and building vanilla and profiling versions at the same time would allow users to turn on profiling by default (in ~/.cabal/config) without paying (much of) a compile-time penalty.
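The scheduling idea is the same whether the units are packages, components, or build ways: anything whose dependencies have finished can start. Below is a minimal sketch using Python's `graphlib.TopologicalSorter` with a thread pool. The component names, the vanilla/profiling pairs, and the `build` stand-in are hypothetical and do not reflect how Cabal is structured internally.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def build(target):
    """Stand-in for compiling one component in one way (hypothetical)."""
    component, way = target
    time.sleep(0.01)  # pretend to do work
    return f"built {component} ({way})"


def parallel_build(graph, jobs=4):
    """Build every target, starting each one as soon as its dependencies finish.

    `graph` maps a target to the targets it depends on.
    Returns the targets in completion order.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on cyclic dependencies
    finished = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        running = {}
        while sorter.is_active():
            # Submit everything whose dependencies are already built.
            for target in sorter.get_ready():
                running[pool.submit(build, target)] = target
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                target = running.pop(future)
                future.result()  # propagate a failed build
                sorter.done(target)  # may unblock dependents
                finished.append(target)
    return finished


# Hypothetical package: one library built two ways, plus two test suites.
GRAPH = {
    ("lib", "vanilla"): [],
    ("lib", "profiling"): [],
    ("test-a", "vanilla"): [("lib", "vanilla")],
    ("test-b", "vanilla"): [("lib", "vanilla")],
}
```

With `jobs=4`, both library ways start immediately. Both test suites start as soon as the vanilla library is done, instead of waiting for each other.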
There's already some work underway here, so there might not be enough Cabal work to last a student through the summer. The remaining time could be spent increasing the amount of parallelism offered by ghc -j.
Today the parallel speed-up offered by ghc -j is quite modest, and I believe we ought to be able to increase it. Excluding link times, if we had N independent modules of the same size we should get close to an N-times speed-up, which I don't think we get today. While real packages don't have this much available parallelism, improvements in the embarrassingly parallel case should help the average case too.
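One standard way to quantify the gap between that ideal and what real packages allow is Amdahl's law. This is a general result, not something measured on GHC. If a fraction $p$ of the build work can be spread over $N$ cores, and the remainder (linking, or a long chain of dependent modules) runs serially, the best possible speed-up is:

$$S(N) = \frac{1}{(1 - p) + \frac{p}{N}}$$

With $p = 1$ (the case of N equal-sized independent modules, excluding linking), this gives $S(N) = N$. With an illustrative $p = 0.9$ on 8 cores, $S(8) = 1/(0.1 + 0.1125) \approx 4.7$. So the serial part quickly dominates. These numbers are examples, not benchmarks.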
Cabal file pretty-printer
If we had a Cabal file pretty-printer, in the spirit of gofmt for Go, we could more easily apply automatic rewrites to Cabal files. A formatter that applies a standard (i.e. normalized) format to all files would make rewrite tools much simpler, as they wouldn't have to worry about preserving user formatting. Some tools that would benefit:
- cabal freeze, which will be included in Cabal-1.20
- cabal init
- A cabal version number bumper/PVP helper
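A version bumper is one of the simpler tools on that list. Here is a sketch of the core PVP rule, where A.B is the major version and C the minor version. The change-kind names and the choice to reset trailing components to zero are assumptions of this sketch. Deciding which kind of change was made is the hard part, and it isn't attempted here.

```python
def bump(version: str, change: str) -> str:
    """Suggest the next version under the PVP's A.B.C rules.

    change is one of (names are this sketch's own):
      "breaking": something removed or changed, so A.B must increase
      "addition": only new exports added, so C must increase
      "other":    e.g. bug fixes or docs, so a later component may change
    """
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (4 - len(parts))  # pad to at least A.B.C.D
    a, b, c, d = parts[:4]
    if change == "breaking":
        new = [a, b + 1, 0]
    elif change == "addition":
        new = [a, b, c + 1]
    elif change == "other":
        new = [a, b, c, d + 1]
    else:
        raise ValueError(f"unknown change kind: {change!r}")
    return ".".join(str(n) for n in new)
```

For example, `bump("1.19.2", "breaking")` suggests `1.20.0`. A formatter would let such a tool write the new version back without disturbing the rest of the file.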
I don't think such a pretty-printer should be terribly clever. Since Cabal files, unlike Haskell, don't have pattern matching, aligning things doesn't really help readability much. Something simple, like a 2- (or 4-) space indent and starting each list of items on a new line below the item "header", ought to be enough. Here's an example:
name: Cabal
version: 1.19.2
copyright:
  2003-2006, Isaac Jones
  2005-2011, Duncan Coutts
license: BSD3
license-file: LICENSE
author:
  Isaac Jones <email@example.com>
  Duncan Coutts <firstname.lastname@example.org>
maintainer: email@example.com
homepage: http://www.haskell.org/cabal/
bug-reports: https://github.com/haskell/cabal/issues
synopsis: A framework for packaging Haskell software
description:
  The Haskell Common Architecture for Building Applications and
  Libraries: a framework defining a common interface for authors to more
  easily build their Haskell applications in a portable way.
  .
  The Haskell Cabal is part of a larger infrastructure for distributing,
  organizing, and cataloging Haskell libraries and tools.
category: Distribution
cabal-version: >=1.10
build-type: Custom

extra-source-files:
  README
  tests/README
  changelog

source-repository head
  type: git
  location: https://github.com/haskell/cabal/
  subdir: Cabal

library
  build-depends:
    base >= 4 && < 5,
    deepseq >= 1.3 && < 1.4,
    filepath >= 1 && < 1.4,
    directory >= 1 && < 1.3,
    process >= 22.214.171.124 && < 1.3,
    time >= 1.1 && < 1.5,
    containers >= 0.1 && < 0.6,
    array >= 0.1 && < 0.6,
    pretty >= 1 && < 1.2,
    bytestring >= 0.9

  if !os(windows)
    build-depends:
      unix >= 2.0 && < 2.8

  ghc-options:
    -Wall
    -fno-ignore-asserts
    -fwarn-tabs
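To make the suggested style concrete, here is a minimal sketch of the printing half of such a tool. The `Field`/`Section` model, the `COMMA_SEPARATED` set, and the function names are assumptions for illustration. A real formatter would presumably parse files with Cabal's own parser, and it would also need to handle comments, which this model ignores.

```python
from dataclasses import dataclass, field
from typing import List, Union

INDENT = "  "  # 2-space indent, as suggested in the text

# Fields whose items are comma-separated in .cabal syntax (partial, assumed list).
COMMA_SEPARATED = {"build-depends"}


@dataclass
class Field:
    name: str
    value: Union[str, List[str]]  # a list is printed one item per line


@dataclass
class Section:
    header: str  # e.g. "library" or "if !os(windows)"
    body: List[Union[Field, "Section"]] = field(default_factory=list)


def render(entries, depth=0):
    """Return the lines for `entries`, indented `depth` levels."""
    pad = INDENT * depth
    lines = []
    for entry in entries:
        if isinstance(entry, Section):
            lines.append(pad + entry.header)
            lines.extend(render(entry.body, depth + 1))
        elif isinstance(entry.value, list):
            items = list(entry.value)
            if entry.name in COMMA_SEPARATED:
                # Every item except the last gets a trailing comma.
                items = [item + "," for item in items[:-1]] + items[-1:]
            lines.append(f"{pad}{entry.name}:")  # list "header" on its own line
            lines.extend(pad + INDENT + item for item in items)
        else:
            lines.append(f"{pad}{entry.name}: {entry.value}")
    return lines


def pretty(entries):
    return "\n".join(render(entries)) + "\n"
```

Because the output depends only on the parsed structure, a rewrite tool could change the structure (say, add a dependency) and print it again, without trying to preserve whatever layout the author used.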