Monday, April 15, 2013

Haskell.org GSoC ideas

This year's Google Summer of Code is upon us. Every year I try to come up with a list of projects that I'd like to see done. Here's this year's list, in no particular order:

Better formatting support for Haddock

While adequate for basic API docs, Haddock leaves something to be desired when you want to write more than a paragraph or two.

For example, you can't use bold text for emphasis or give links anchor text. Support for images and inline HTML (e.g. for creating tables) is similarly missing. All headings must be defined in the export list, which is inconvenient and conflates API organization with the headings used to structure longer passages of text.
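To make these limitations concrete, here is a small sketch of a module documented with the existing markup. The module and its functions (`Stack`, `empty`, `push`, `pop`) are hypothetical names made up for illustration. Note that the section headings live in the export list, and that the link is a bare URL with no anchor text.

```haskell
-- | A tiny stack module, documented with the existing Haddock markup.
-- All names here are hypothetical and only serve as an illustration.
module Stack
  ( -- * Types
    Stack
    -- * Construction
  , empty
    -- * Operations
  , push
  , pop
  ) where

-- | A last-in, first-out stack. /Emphasis/ is available, bold text is not.
newtype Stack a = Stack [a]

-- | The empty stack. Links are bare URLs such as
-- <http://www.haskell.org/haddock/>, without anchor text.
empty :: Stack a
empty = Stack []

-- | Push an element. Identifiers are linked with single quotes, e.g. 'pop'.
push :: a -> Stack a -> Stack a
push x (Stack xs) = Stack (x : xs)

-- | Pop the top element, or return 'Nothing' if the stack is 'empty'.
pop :: Stack a -> Maybe (a, Stack a)
pop (Stack [])       = Nothing
pop (Stack (x : xs)) = Just (x, Stack xs)
```

The headings "Types", "Construction" and "Operations" can only be attached to groups of exports, so there is no natural place for a heading inside a longer piece of prose.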

This project would improve the state of Haskell documentation by extending the Haddock markup language, either by

  • adding features from Markdown to the Haddock markup language, or
  • adding a new markup language that is a superset of Markdown to Haddock.

Why Markdown? Markdown is what most of the programming-related part of the web (e.g. GitHub, StackOverflow) has standardized on as a human-friendly markup language. The reason Markdown works so well is that it codifies the current practice, already used and improved over time in e.g. mailing list discussions, instead of inventing a brand new language.

Option 1: Add Markdown as an alternative markup language

This option would let users opt in to (a superset of) Markdown instead of the current Haddock markup by putting a

{-# HADDOCK Markdown #-}

pragma on top of the source file. The language would be a superset as we'd still want to support single-quoted strings to hyperlink identifiers, etc.

This option might run into difficulties with the C preprocessor, which also uses # to introduce its directives. Part of the project would involve thinking about that problem and, more generally, about the implications of using Markdown in Haddock.
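As a rough sketch of what a source file might look like under this option: the pragma is the one proposed above and does not exist today, the Markdown syntax shown is an assumption about what would be supported, and `scale` and `factor` are hypothetical names.

```haskell
-- Hypothetical pragma from this proposal. GHC does not know it and
-- would at most warn about an unrecognised pragma.
{-# HADDOCK Markdown #-}

{- |
# Scaling

Multiplies by a **fixed** factor. See [the Haskell site](http://www.haskell.org/)
for more. Single quotes would still link identifiers, e.g. 'factor'.
-}
scale :: Int -> Int
scale x = x * factor

-- | The factor used by 'scale'.
factor :: Int
factor = 3
```

The heading line is where the preprocessor problem shows up: if the file were also compiled with the CPP extension, a line that starts with # inside a block comment would most likely be handed to the preprocessor as a directive.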

Option 2: Add features from Markdown to Haddock

This option is slightly less ambitious in that we'd be adding a few select features from Markdown to Haddock, for example support for bold text and anchor text. Since we're not trying to support all of Markdown, the issue with the C preprocessor could be solved by not supporting #-style headings while still supporting *bold* for bold text.
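To give a feel for the size of such a feature, here is a minimal sketch of recognizing `*bold*` spans in a line of documentation text. It is not how Haddock's parser works; the `Inline` type and the `parseInline` and `plain` functions are made up for this illustration, and a lone `*` is simply kept as ordinary text.

```haskell
-- | A fragment of documentation text: either plain or bold.
data Inline = Plain String | Bold String
  deriving (Eq, Show)

-- | Prepend plain text, merging it with a following plain fragment.
plain :: String -> [Inline] -> [Inline]
plain s (Plain t : rest) = Plain (s ++ t) : rest
plain s rest             = Plain s : rest

-- | Split a string into plain and bold fragments.
-- A star only opens a bold span if a closing star follows
-- and the text in between is not empty.
parseInline :: String -> [Inline]
parseInline "" = []
parseInline ('*' : rest)
  | (body, '*' : after) <- break (== '*') rest
  , not (null body)
  = Bold body : parseInline after
  | otherwise = plain "*" (parseInline rest)
parseInline s =
  let (txt, rest) = break (== '*') s
  in plain txt (parseInline rest)
```

With these definitions, `parseInline "a *b* c"` yields `[Plain "a ", Bold "b", Plain " c"]`, while `parseInline "2 * 3"` stays a single plain fragment.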

More parallelism in Cabal builds

Builds could always be faster. There are a few more things that Cabal could build in parallel:

  • Each component (e.g. tests) could be built in parallel, while respecting the dependencies between components (e.g. executables that depend on the library).
  • Profiling and non-profiling versions could be built in parallel, making it much cheaper to always enable profiling by default in ~/.cabal/config.
  • Individual modules could be built in parallel.

The last option gives the most parallelism but is also the hardest to implement. It requires that we have correct dependency information (which we could get from ghc -M) and even then compiling individual modules using ghc -c is up to 2x slower compared to compiling the same modules with ghc --make. Still, it could be a win for anyone with >2 CPU cores and it would support building e.g. profiling libraries in parallel without much (or any) extra work.
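The scheduling side of module-level parallelism can be sketched as follows, assuming the dependency information has already been obtained somehow. The `Deps` type and `buildWaves` function are hypothetical, not part of Cabal; the sketch only groups modules into "waves" such that every module in a wave depends solely on modules from earlier waves, so all modules in one wave could be compiled at the same time.

```haskell
import qualified Data.Map as Map
import Data.List (partition)

-- | Maps each module name to the modules it imports (hypothetical type).
type Deps = Map.Map String [String]

-- | Group modules into waves that could be built in parallel.
-- Every module in a wave depends only on modules from earlier waves.
-- Fails if there is a cycle or a dependency that is not in the map.
buildWaves :: Deps -> [[String]]
buildWaves deps = go [] (Map.toList deps)
  where
    go _ [] = []
    go done pending =
      case partition (all (`elem` done) . snd) pending of
        ([], _) -> error "dependency cycle or unknown module"
        (ready, rest) ->
          let names = map fst ready
          in names : go (done ++ names) rest
```

For a module graph where Main imports A and B, and both A and B import C, this gives three waves: C first, then A and B together, then Main. A real implementation would more likely start each module as soon as its own dependencies finish rather than waiting for a whole wave.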

6 comments:

  1. If anyone does decide to do individual modules in parallel I would suggest building on top of Shake, which already compiles individual modules in parallel. However, I would warn you that I've seen > 2x slower. If you are building one project you need about 4 CPUs to outperform (since you don't generally get a consistent parallelism level of 4), or if you are building multiple projects (where you can get better parallelism) then 3 CPUs are enough.

  2. why is `ghc -c` slower?

    Replies
    1. ghc --make can cache the parsed .hi files to avoid reparsing for each .hs file.

  3. how about giving the final polish to the key xml and exception libraries?
    xml was left just 1inch away from being done and xml is v popular in large companies

    imo if we want haskell to become popular in large companies we need to give them the right tools to play with..

    hxt for example
    - needs to have a full schema implementation (has a large % of it now)
    - needs to have some clear examples on how to use it
    etc
    ---
    exception handling:
    - some good examples on how to use them for large projects, not just the hello world with just and nothing
    - mixing different exc handling paradigms (io exc with either, with custom with...)

    ---

    providing good quality, robust tools is the first sign of a mature language, that will be inviting to newcomers and large companies alike by addressing their standard risk-aversion upfront and helping them make the first steps towards haskell enlightenment ;)

    thanks for organizing this

    Replies
    1. I believe this to be endemic in the Haskell community. Very few haskell projects are released with full documentation let alone usage examples or tutorials. There are notable exceptions to this rule, but on the whole, documentation is not haskell's strong point.

      -----------------------------------------------

      Case Study: Yesod/Persistent

      I'll give snoyberg a partial pass here because the library is still being reworked heavily, but the "Yesod Book" is woefully inadequate where it is not downright outdated. Reading the source code for persistent is hit-and-miss.

      * Some functions are heavily documented, but a lot of the important ones are not documented at all. (can anyone tell me why a "CREATe TABLE" statement is *required* to have a lowercase e? http://hackage.haskell.org/packages/archive/persistent-postgresql/latest/doc/html/src/Database-Persist-Postgresql.html#line-293 )

      * The Book focusses on usage patterns, and not on underlying mechanics or documenting of public functions. As soon as you stray from the beaten path there is no way to find your way back, other than the IRC channel. Compiler Error Driven Programming is not my cup of tea.

      * The example project comes with lots of docs on what is being set up where, but not why. Is it even possible to serve static files under 2 different urls? So much magic is thrown in the mix, it feels like Django pre-0.9.
      (It may just be me but this code is incomprehensible to me http://hackage.haskell.org/packages/archive/persistent-template/latest/doc/html/src/Database-Persist-TH.html#mkMigrate)

      All of this is so frustrating to a new user like myself, but I know that as soon as I post this people will exclaim "Just read the source code! It's easy to understand! "

      ------------------------------------------------------------------

      The "Gold Standard" for any software package or language is that both the Standard Libraries and the core set of packages (Haskell Platform) should be brain-dead obvious enough from the docs/tutorial that a moderately skilled programmer can pick up and run with the core feature set without having to consult anything but the docs/tutes. At the moment, I believe that the Haskell community is not living up to that standard.

    2. fully agree with your point @anon

      that's exactly the reason i've asked Johan to help add some pro touches to a few of the key libraries in haskell, that can be more useful for a large/key audience

      the issue is at a larger scale indeed, but let's start that 10000 mile trip with one step in the right direction

      ps. haskell community is great, most of the time not snobby, but i agree with you - for basic to intermediate things a newbie should be able to find his way quickly in order for the language to get traction.

      for that plenty of good examples will help
      (from that perspective i like the lyah book - the real world one is outdated - also requires some brushing :)

      --
      imo if we have a rock solid xml, exception handling (enterprise stuff)
      followed by web and database (web stuff) in a similar way we'll be in great shape
