Monday, August 22, 2011

Results from the State of Haskell, 2011 Survey

This is the second year I ran the State of Haskell survey. Like last year, the goal of the survey is to figure out

  • where Haskell programmers "come from",
  • in which domains Haskell is being used,
  • to what extent Haskell is used commercially, and
  • what the major weaknesses in the Haskell ecosystem are.

Like last year I ran the survey for one week. I got 798 responses (compared to 804 last year). The survey was announced on reddit, the major Haskell mailing lists, my Twitter account, and on this blog.

Here are the results, with some commentary by me. You can find links to the raw data at the very end of this post.

You might want to open the results from last year's survey to compare with the results below.

How long have you been using Haskell?

The proportion of people who have used Haskell for less than a year is a bit smaller than last year, suggesting that the adoption rate has slowed down somewhat.

How would you characterize your use of Haskell today?

Note that respondents could choose more than one option here, so results add up to more than 100%.

The number of people who use Haskell at work has gone up from 25% to 32%, which I'm very happy to see.

What is the status of Haskell in your workplace?

The number of people who work in places where Haskell is endorsed increased, which matches what we saw in the last question. 17% use Haskell in production applications.

In which domain(s) are you using Haskell?

Note that respondents could choose more than one option here, so results add up to more than 100%.

The proportion that use Haskell for web development rose from 23% to 32%, most likely because we now have two quality web frameworks: Snap and Yesod.

Areas where Haskell has traditionally been strong, like compilers and math, are still strong.

Last year I wished for more libraries for Big Data processing (e.g. MapReduce). I still haven't seen much in this area, except for a paper on implementing the Erlang programming model as a Haskell library. We ought to be able to write a library with a Par monad for distributed parallel algorithms.

Which environment(s) do you use to work with Haskell?

Note that respondents could choose more than one option here, so results add up to more than 100%.

Not much changed here. Emacs and vi are still the most commonly used environments by far. The Haskell specific environments still haven't gained much traction.

What language did you use just prior to adopting Haskell – or, if Haskell is not your primary language now, what is that primary language?

Note that respondents could choose more than one option here, so results add up to more than 100%.

Unlike last year, you could select multiple languages. I now realize that I should have reworded the question to better reflect that. We cannot directly compare the results to last year's as secondary languages (e.g. if you primarily use Java but also sometimes JavaScript) will show up in this year's results.

Unsurprisingly, big languages like C, C++, and Java show up at the top. Like last year, Python is also very popular, perhaps more so than you'd expect given the number of Python users (compared to e.g. the number of Java users). I take this to mean that users of more modern languages (like Python) are more likely to adopt other modern languages (like Haskell).

If Haskell disappeared tomorrow, what language(s) might you use as a "replacement"?

Note that respondents could choose more than one option here, so results add up to more than 100%.

Clojure lost a bit of ground compared to last year, from 25% to 17%. Scala gained some ground (4%).

Hackage

The open-ended section last year was dominated by comments on Hackage, libraries, and performance, so this year I added a number of questions on these topics to get some more quantitative results.

Asking users to rank something on a scale (e.g. from 1 to 5) is tricky. For example, is 3.7 a high or low score? It's hard to say without having something to compare to. We should be able to better analyze the answers next year when we have two years' worth of data points. Then we'd at least be able to say if we're improving or not.

Scale: 1 - poor, 5 - excellent

Number of libraries

Mean: 3.95

Users seem happy with the number of libraries on Hackage (there are over two thousand).

Overall quality of libraries

Mean: 3.45

Users are also quite happy with the overall quality of libraries, but see the separate library section for a breakdown.

Ease of finding a library for a given task

Mean: 3.24

It's somewhat difficult to find the right library for a task. We could use a better package search engine and some kind of recommendation system on Hackage.

Ease of judging the quality of a library

Mean: 2.54

While they do exist, it's hard to find high quality libraries on Hackage. There's no ranking whatsoever. A recommendation system that uses social signals such as number of downloads, number of libraries depending on a given library, test coverage, documentation completeness, etc., would help here.
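
To make the idea a bit more concrete, here is a minimal sketch of how such signals could be combined into a single ranking score. Everything in it is an assumption made for illustration: the PackageStats record, its field names, and the weights are hypothetical, and nothing here reflects data that Hackage actually exposes.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical per-package signals; the field names are made up
-- for this sketch and are not part of any Hackage API.
data PackageStats = PackageStats
  { pkgName      :: String
  , downloads    :: Int     -- total number of downloads
  , reverseDeps  :: Int     -- packages that depend on this one
  , testCoverage :: Double  -- fraction in [0, 1]
  , docCoverage  :: Double  -- fraction in [0, 1]
  }

-- Dampen raw counts so that very popular packages do not
-- dominate the score completely.
logScale :: Int -> Double
logScale n = log (1 + fromIntegral (max 0 n))

-- Weighted sum of the signals. The weights are arbitrary
-- placeholders, not tuned values.
qualityScore :: PackageStats -> Double
qualityScore p =
    1.0 * logScale (downloads p)
  + 2.0 * logScale (reverseDeps p)
  + 3.0 * testCoverage p
  + 3.0 * docCoverage p

-- Package names ordered from highest to lowest score.
rankPackages :: [PackageStats] -> [String]
rankPackages = map pkgName . sortBy (comparing (Down . qualityScore))
```

Loading this into GHCi and calling rankPackages on a list of PackageStats values returns the package names with the highest scoring package first. A real system would need to pick the weights much more carefully and guard against gaming of the signals.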

Likelihood that a library will build on your machine

Mean: 3.48

I suspect that this number is pulled down by Windows users, who have a harder time building packages as the package developers more often use some Unix variant. Having Windows build bots might help here.

Personally I still have problems building Gtk2Hs. I wish there was a Wiki describing all the steps (with cut-n-paste instructions) for installing Gtk2Hs on each platform.

Libraries

This section provides a deep-dive into library quality issues.

Documentation

Mean: 2.97

Many libraries on Hackage have no documentation at all. Personally I tend to just ignore such libraries. If a library lacks documentation I start to wonder if it also lacks tests, if someone gave performance any thought, and so on. It might be a great library, but I will never find out because the lack of documentation makes me look elsewhere.

Haddock has recently started outputting documentation coverage reports when building packages. Perhaps this will encourage people to write more documentation.

Perhaps we could introduce a badge system on Hackage where packages that have 100% Haddock documentation coverage would sport a "documentation badge" on the package's Hackage page.

Test coverage

Mean: 2.94

With a few prominent exceptions, test coverage is poor to non-existent in most libraries.

Thomas Tuegel recently added testing support to Cabal. Making it easier to run tests should hopefully encourage people to write more of them. Test integration in Cabal also means that Hackage will eventually be able to run test suites automatically and publish test results.

Having Cabal build plugins for e.g. Jenkins would make it easier to run continuous builds and thus get more out of your test suites. I use Jenkins a lot; among other things, it helps me make sure that my packages don't break on older versions of GHC.

Performance

Mean: 3.50

Users are mostly content with the performance of Hackage libraries. I think this is a testament to how good GHC is: you can get good performance without paying any attention to performance.

I still think we need to work on the performance of our libraries, especially core libraries for e.g. data structures, talking to databases, running web servers, etc. If you get performance right at the lower level, you don't have to think too much about it when writing your applications.

Integration with other libraries

Mean: 3.14

We could make libraries fit together better:

  • APIs are still a bit inconsistent,
  • we still don't program against interfaces enough (e.g. there are no type classes unifying different container implementations),
  • we have two Unicode string types (String and Text),
  • we don't use qualified naming everywhere (but instead use ad-hoc identifier prefixes/suffixes),

and so on.
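
As a rough illustration of two of the points above (programming against an interface, and qualified naming), here is a small sketch. The KeyedStore class, the AssocList and StringMap wrappers, and fromPairs are all hypothetical names invented for this example. Fixing the key type to String keeps the class simple; a realistic design would have to deal with per-container constraints on keys (e.g. Ord for Data.Map).

```haskell
-- Qualified import: call sites read Map.insert, Map.lookup, etc.,
-- instead of relying on ad-hoc prefixes such as mapInsert.
import qualified Data.Map as Map

-- Hypothetical interface shared by several container implementations.
class KeyedStore s where
  emptyStore :: s v
  insertKey  :: String -> v -> s v -> s v
  lookupKey  :: String -> s v -> Maybe v

-- Two implementations: a plain association list and a Data.Map wrapper.
newtype AssocList v = AssocList [(String, v)]
newtype StringMap v = StringMap (Map.Map String v)

instance KeyedStore AssocList where
  emptyStore = AssocList []
  -- Drop any existing binding for the key before adding the new one.
  insertKey k v (AssocList kvs) =
    AssocList ((k, v) : filter ((/= k) . fst) kvs)
  lookupKey k (AssocList kvs) = lookup k kvs

instance KeyedStore StringMap where
  emptyStore = StringMap Map.empty
  insertKey k v (StringMap m) = StringMap (Map.insert k v m)
  lookupKey k (StringMap m) = Map.lookup k m

-- Written once against the interface; works for either implementation.
fromPairs :: KeyedStore s => [(String, v)] -> s v
fromPairs = foldr (\(k, v) -> insertKey k v) emptyStore
```

Code such as fromPairs never mentions a concrete container, so callers can switch implementations by changing a type annotation, for example (fromPairs pairs :: StringMap Int) instead of (fromPairs pairs :: AssocList Int).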

The Haskell Platform is one attempt to address this, by giving us a vehicle for making coordinated changes, but progress has been slow.

Personally I've felt that the (now) old libraries process slowed us down; it's hard to do anything by consensus in a large, diverse community. Even if you eventually reach consensus you have spent more time than it's worth making whatever (simple) change you intended to make.

Cross platform compatibility

Mean: 3.41

Many libraries still don't build on Windows. We need people who use Windows to help out to make sure they do. As I mentioned earlier, having build bots would help here.

API stability

Mean: 3.26

As a language community I think we're still figuring stuff out. We're still experimenting with different programming models (e.g. iteratee I/O) and it will take a while until we settle on some best practices for writing APIs.

That being said, there are some good libraries that show the way. To name a few: bytestring, text, mysql-simple, and binary. For example, the latter two show how to create APIs that marshal Haskell values to/from byte strings, in different circumstances.

Ease of use

Mean: 3.32

Not a great score, probably related to the lack of documentation. I recommend that anyone who designs APIs for others to use watch Simplicity Ain't Easy.

Reasoning about performance

You often hear that it's hard to reason about performance in Haskell so I asked two questions related to that.

Reasoning about the performance of Haskell programs is...

Scale: 1 - easy, 5 - hard

Mean: 3.47

So people do find it difficult to reason about the performance of their programs. Curiously, they do find the performance of the packages they used to be good (see the earlier question about libraries). Perhaps this means that people only rarely run into performance problems, but when they do, they're not sure how to tackle them.

It isn't terribly difficult to reason about performance in Haskell (there are quite a few people who know how to) once you're taught a few basic concepts and techniques, but we do a poor job of teaching people. In fact, we typically don't educate people in how to reason about performance at all!
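
One example of the kind of basic concept I have in mind is the difference between a lazy and a strict accumulator. The sketch below is only an illustration: meanLazy, meanStrict, and Acc are names made up for this example. The lazy version may build up a long chain of unevaluated additions before anything is computed, while the strict version forces the running totals at every step.

```haskell
import Data.List (foldl')

-- Lazy accumulator: the tuple components are not forced during the
-- fold, so they can grow into chains of suspended (+) applications.
meanLazy :: [Double] -> Double
meanLazy xs = total / fromIntegral count
  where
    (total, count) = foldl step (0, 0 :: Int) xs
    step (s, n) x = (s + x, n + 1)

-- Accumulator with strict fields: constructing an Acc evaluates
-- both components.
data Acc = Acc !Double !Int

-- Strict accumulator: foldl' forces each intermediate Acc, and the
-- strict fields force the running sum and count along with it.
meanStrict :: [Double] -> Double
meanStrict xs = total / fromIntegral count
  where
    Acc total count = foldl' step (Acc 0 0) xs
    step (Acc s n) x = Acc (s + x) (n + 1)
```

Both functions return the same result; they differ only in how much memory they may need along the way. Whether the lazy version actually causes trouble depends on the input size and on what the optimizer manages to figure out, so profiling is still the way to confirm it.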

I've started thinking about writing a medium sized tutorial, perhaps 60 pages or so, covering everything you need to know to be able to write production quality Haskell code. Perhaps I can find some time after my move.

What would help you most when reasoning about the performance of your Haskell programs?

Note that respondents could choose more than one option here, so results add up to more than 100%.

I wasn't sure if I would get anything useful out of this question. It's a bit like asking people what kind of free stuff they'd like. However, there are some relative differences between the different options. For example, the results show that people prefer profilers to lint tools.

We need to better document the strictness properties of our APIs and document performance considerations and gotchas in general. The Haddock documentation for some packages already documents such things in the introduction section of the module documentation.
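
As a sketch of what such documentation could look like at the level of a single function, here is a Haddock comment that states both the complexity and the strictness behavior. The function countIf is a made-up example, not taken from any existing package.

```haskell
-- | /O(n)/. Count the elements of a list that satisfy a predicate.
--
-- Strictness: the running count is forced at every step, so no chain
-- of suspended additions builds up. The whole spine of the list is
-- traversed, but each element is evaluated only as far as the
-- predicate demands.
countIf :: (a -> Bool) -> [a] -> Int
countIf p = go 0
  where
    go n []       = n
    go n (x : xs) =
      let n' = if p x then n + 1 else n
      in  n' `seq` go n' xs
```

A reader of this comment can tell, for instance, that countIf (const True) works on a list whose elements are undefined, and that the function needs a finite list to terminate, without having to read the implementation.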

We could also use some teaching material on the issue that we could point to.

What do you think is Haskell's most glaring weakness / blind spot / problem?

I didn't include a "general comments" section this year. A few people felt that only focusing on weaknesses was a bit negative. I'll reintroduce the section next year. Feel free to share any other thoughts you might have in the comments section of the blog.

The list of weaknesses was a bit more diverse than last year, perhaps due to breaking out library and performance issues into separate questions. This is a good thing. It means that we don't have any huge blind spots in our ecosystem.

Here's a sample of topics that came up:

  • Lack of GUI libraries.
  • Frustration with lack of (visible) progress on Hackage.
  • The learning curve.
  • Lack of more comprehensive documentation for libraries (i.e. beyond simple reference documentation).
  • Difficulties in reasoning about laziness.

Closing thoughts

I'd like to thank everyone who took the time to take the survey. Hopefully we can use the results to guide future infrastructure work in the community.

Raw data

All the source data is available in a spreadsheet in Google Docs or as an HTML table export from that spreadsheet.

24 comments:

  1. Please use more colors.

  2. Hmm. As a newcomer to Haskell I was indeed surprised by the paucity of GUI libraries, and more importantly how nobody seems to care about this. Obviously I don't know enough to suggest this seriously, but is anyone looking at wrapping the new Qt Quick on a high level? It's advertised as being declarative and so should be easier to construct high-level bindings to Haskell for than other toolkits. Qt Quick is mainly used for mobile platforms right now but soon there is going to be a reimplementation of the standard desktop-application widgets in QML (see this page).

  3. Thank you for running the survey again, Johan. It was very enlightening.

  4. Do you have any /concrete/ suggestions on how Windows users can "do more" to improve the situation with Windows?

    As far as I can tell, the present situation is that the vast majority of the Haskell community just /assumes/ that everybody uses Unix. The Cabal Wiki used to suggest using Makefiles, for example.

    The other big, big thing is that on Windows, Cabal is seemingly incapable of figuring out where external libraries are installed. If you go to Linux, install the zlib developer headers and then try to build the Haskell zlib library, it just works. On Windows, Cabal has no idea where the heck to look for the headers.

  5. Yes, help maintain libraries. For example, Windows users are responsible for the majority of time I spend fixing bugs in the network package, but I've yet to find one who's willing to help me test/maintain it from a Windows perspective.

    The other thing you can do is to install Jenkins and set up continuous builds for important packages. This will help the rest of us avoid breaking these packages by mistake.

  6. Could you post another graph for the question:

    "What language did you use just prior to adopting Haskell – or, if Haskell is not your primary language now, what is that primary language?"

    which corrects for relative sizes of user bases? I would be very interested in this.

    Also it may be interesting to do the same for the following question:

    "If Haskell disappeared tomorrow, what language(s) might you use as a "replacement"?"

    If you (or anyone else) can suggest alternate ways of correcting for this I would like to hear it but I imagine that tiobe (http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html) would be the best way to go. Thanks for your great work!

  7. Regarding Windows builds, I've recently spent some time configuring an EC2-based Windows slave for wxHaskell (detailed instructions are here). I'll be happy to help set up similar slaves for other Haskell projects. Please get in touch (maciek dot makowski at gmail) if you need a hand.

  8. I'm pretty new to Haskell but I'd be keen to sign up for something like the Linux kernel Janitor's project for some of the Hackage libraries as a learning exercise - writing docs and tests etc..., up to writing some benchmarks and having a crack at some optimizations as I get better at that.

    It'd be nice to have some hints about which libraries need the help and some feedback on whether I'm doing the right thing - other than that I can probably look after myself for the most part.

  9. xander,

    Unfortunately I don't have time at the moment. Feel free to grab the raw data, create the graphs and post them online somewhere!

  10. Maciek,

    EC2 sounds like an interesting option. Do you keep an instance up at all times? Perhaps it would be a worthwhile investment for haskell.org.

  11. Anonymous,

    Take a library from the Haskell platform that lacks tests/docs and drop the maintainer an email saying that you're interested in helping out. I'm sure he/she will appreciate the help and give you some pointers.

  12. Johan,

    No, only the central server (master) needs to be up all the time, the bot is only started when there is something to build and is terminated once complete. This makes sense if the server is dedicated to building just one project. For a single server that builds many different projects other options might be more suitable.

  13. Windows support is always painful because it lacks standard locations for external libraries and doesn't come with the tools (make, autoconf, etc) required to build many cross platform projects.

    The best solution I've found is to embed C libraries inside the Haskell bindings so that 'cabal install' handles everything. This can work well for simple projects that don't need much configuration, as GHC comes with a healthy subset of the MSYS/MinGW environment. You come unstuck, however, for complex projects that really need to run 'configure' and then 'make' from a Makefile.

    Ruby handles this pretty well. On Windows they have RubyDevKit (http://rubyinstaller.org/add-ons/devkit/) for building ruby gems which require C/C++. We're half way there, as GHC already includes a lot of what is available in RubyDevKit.

  14. "Personally I've felt that the (now) old libraries process slowed us down; it's hard to do anything by consensus in a large, diverse community. Even if you eventually reach consensus you have spent more time than it's worth making whatever (simple) change you intended to make."

    I agree with you. Maybe a workaround would be for the community to reach consensus only on large (time-consuming) changes, to even out the community and development costs?

  15. Johan Tibell, can you provide steps to setup jenkins? I have virtual machine with win7 with jenkins.

  16. Sergey,

    I've never set up Jenkins on Windows so I don't know how. On Ubuntu you basically just run sudo apt-get install jenkins and then go to the web interface to configure it (e.g. set up user accounts, SMTP server). What I'd really like to get working is for my Linux Jenkins server to talk to a Windows Jenkins build slave so it can start builds on Windows.

    The problem I had last time I tried was that the shell script that describes how to build a package (e.g. a few lines of cabal invocations) didn't run properly on the Windows machine due to environment problems. Since I didn't have root access on the Windows machine I couldn't fix the issues.

  17. please, for the love of all things edward tufte or junkcharts, stay the hell away from the damned pie charts!!!!!!!!!!!!

  18. Johan, no-no-no. I already have Jenkins (for building our product), but I don't know what I should set up to properly test GHC (Haskell Platform? GHC from source? etc.). If you describe your setup on Ubuntu I will try to reproduce it on a Windows machine.

  19. Sergey,

    Here's my Jenkins setup (which lives at http://ci.johantibell.com/):

    I installed several versions of GHC using the binary installers for Linux. I made sure they are on the path of my jenkins user. I build using a shell script. Here's the shell script for the network package:

    cabal install -w $compiler --only-dependencies --enable-tests
    cabal clean
    autoreconf
    cabal configure -w $compiler --enable-tests
    cabal build
    cabal test
    cabal sdist

    The $compiler will be replaced by e.g. ghc-7.0.3, as network is built as a matrix project.

    Setting up a Windows build slave shouldn't require that the Jenkins job itself is defined on the Windows machine, just that the Jenkins master can e.g. ssh to the machine and find the different versions of GHC on the path.

  20. Biggest weaknesses:

    1) difficulty of debugging

    2) no record system

    For laziness reasoning it would be nice to have "snapshot :: a -> IO (Snapshot a)" where Snapshot a is something that captures the actual state of its arg (i.e. so you can observe whether it is evaluated), something like deepseq except it doesn't overwrite any thunks, it just figures out where all of them are.

  21. Also helpful and simple would be supporting multiple modules in a single source file. That allows having something like private member variables for a type by encapsulating them in modules, without having to spew 100's of separate files (one for each module) a la Java.

  22. For me, the biggest problem with Hackage libraries is that many of them are contaminated with GPL. Please, use BSD or MIT for libraries.

    Haskore is one of the most annoying examples. What a waste of great code!
    The non-adoption of Yi could probably be explained by GPL as well.

  23. Will you be doing this for 2012? I'd love to see the series continue!

    Replies
    1. It's a lot of work so I wasn't planning to. If someone would take it over that would be great!
