Monday, August 22, 2011

Results from the State of Haskell, 2011 Survey

This is the second year I ran the State of Haskell survey. Like last year, the goal of the survey is to figure out

  • where Haskell programmers "come from",
  • in which domains Haskell is being used,
  • to what extent Haskell is used commercially, and
  • what the major weaknesses in the Haskell ecosystem are.

Like last year I ran the survey for one week. I got 798 responses (compared to 804 last year). The survey was announced on reddit, the major Haskell mailing lists, my Twitter account, and on this blog.

Here are the results, with some commentary by me. You can find links to the raw data at the very end of this post.

You might want to open the results from last year's survey to compare with the results below.

How long have you been using Haskell?

The proportion of people who have used Haskell for less than a year is a bit smaller than last year, suggesting that the adoption rate has slowed down somewhat.

How would you characterize your use of Haskell today?

Note that respondents could choose more than one option here, so results add up to more than 100%.

The number of people who use Haskell at work has gone up from 25% to 32%, which I'm very happy to see.

What is the status of Haskell in your workplace?

The number of people who work in places where Haskell is endorsed increased, which matches what we saw in the last question. 17% use Haskell in production applications.

In which domain(s) are you using Haskell?

Note that respondents could choose more than one option here, so results add up to more than 100%.

The proportion that use Haskell for web development rose from 23% to 32%, most likely because we now have two quality web frameworks: Snap and Yesod.

Areas where Haskell has traditionally been strong, like compilers and math, are still strong.

Last year I wished for more libraries for Big Data processing (e.g. MapReduce). I still haven't seen much in this area, except for a paper on implementing the Erlang programming model as a Haskell library. We ought to be able to write a library with a Par monad for distributed parallel algorithms.

Which environment(s) do you use to work with Haskell?

Note that respondents could choose more than one option here, so results add up to more than 100%.

Not much changed here. Emacs and vi are still the most commonly used environments by far. The Haskell-specific environments still haven't gained much traction.

What language did you use just prior to adopting Haskell – or, if Haskell is not your primary language now, what is that primary language?

Note that respondents could choose more than one option here, so results add up to more than 100%.

Unlike last year, you could select multiple languages. I now realize that I should have reworded the question to better reflect that. We cannot directly compare the results to last year's as secondary languages (e.g. if you primarily use Java but also sometimes JavaScript) will show up in this year's results.

Unsurprisingly, big languages like C, C++, and Java show up at the top. Like last year, Python is also very popular, but perhaps more so than you'd expect given the number of Python users (compared to e.g. the number of Java users). I interpret this to mean that users of more modern languages (like Python) are more likely to adopt other modern languages (like Haskell).

If Haskell disappeared tomorrow, what language(s) might you use as a "replacement"?

Note that respondents could choose more than one option here, so results add up to more than 100%.

Clojure lost a bit of ground compared to last year, from 25% to 17%. Scala gained some ground (4%).

Hackage

The open-ended section last year was dominated by comments on Hackage, libraries, and performance, so this year I added a number of questions on these topics to get some more quantitative results.

Asking users to rank something on a scale (e.g. from 1 to 5) is tricky. For example, is 3.7 a high or low score? It's hard to say without having something to compare to. We should be able to better analyze the answers next year when we have two years' worth of data points. Then we'd at least be able to say if we're improving or not.

Scale: 1 - poor, 5 - excellent

Number of libraries

Mean: 3.95

Users seem happy with the number of libraries on Hackage (there are over two thousand).

Overall quality of libraries

Mean: 3.45

Users are also quite happy with the overall quality of libraries, but see the separate library section for a breakdown.

Ease of finding a library for a given task

Mean: 3.24

It's somewhat difficult to find the right library for a task. We could use a better package search engine and some kind of recommendation system on Hackage.

Ease of judging the quality of a library

Mean: 2.54

While they do exist, it's hard to find high quality libraries on Hackage. There's no ranking whatsoever. A recommendation system that uses signals such as number of downloads, number of libraries depending on a given library, test coverage, documentation completeness, etc., would help here.
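To make the idea a bit more concrete, here is a minimal sketch of how such signals could be combined into a single ranking score. Everything in it is an assumption made for illustration: the Signals record, the weights, and the example package names are made up, and Hackage does not expose data in this form.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Hypothetical per-package signals; Hackage does not provide this record.
data Signals = Signals
  { pkgName      :: String
  , downloads    :: Int     -- total number of downloads
  , reverseDeps  :: Int     -- number of packages depending on this one
  , testCoverage :: Double  -- fraction in [0, 1]
  , docCoverage  :: Double  -- fraction in [0, 1]
  }

-- Illustrative weights only. The popularity counts are log-damped so
-- that a handful of very popular packages don't drown out the rest.
score :: Signals -> Double
score s =
    1.0 * logBase 10 (1 + fromIntegral (downloads s))
  + 2.0 * logBase 10 (1 + fromIntegral (reverseDeps s))
  + 3.0 * testCoverage s
  + 3.0 * docCoverage s

-- Order packages from highest to lowest score.
rank :: [Signals] -> [Signals]
rank = sortBy (comparing (Down . score))

main :: IO ()
main = mapM_ (putStrLn . pkgName) (rank examples)
  where
    -- Made-up packages, not real Hackage entries.
    examples =
      [ Signals "abandoned" 99 0 0.0 0.1
      , Signals "well-kept" 9999 99 0.9 1.0
      ]
```

The hard part is of course picking weights that people agree with, which is one more reason to treat this as a sketch and not a proposal.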

Likelihood that a library will build on your machine

Mean: 3.48

I suspect that this number is pulled down by Windows users, who have a harder time building packages as the package developers more often use some Unix variant. Having Windows build bots might help here.

Personally I still have problems building Gtk2Hs. I wish there were a wiki page describing all the steps (with cut-n-paste instructions) for installing Gtk2Hs on each platform.

Libraries

This section provides a deep-dive into library quality issues.

Documentation

Mean: 2.97

Many libraries on Hackage have no documentation at all. Personally I tend to just ignore such libraries. If a library lacks documentation I start to wonder if it also lacks tests, if someone gave performance any thought, and so on. It might be a great library, but I will never find out because the lack of documentation makes me look elsewhere.

Haddock has recently started outputting documentation coverage reports when building packages. Perhaps this will encourage people to write more documentation.

Perhaps we could introduce a badge system on Hackage where packages that have 100% Haddock documentation coverage would sport a "documentation badge" on the package's Hackage page.

Test coverage

Mean: 2.94

With a few prominent exceptions, test coverage is poor to non-existent in most libraries.

Thomas Tuegel recently added testing support to Cabal. Making it easier to run tests should hopefully encourage people to write more of them. Test integration in Cabal also means that Hackage will eventually be able to run test suites automatically and publish test results.
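For those who haven't tried it yet, the simplest kind of test suite Cabal understands (the exitcode-stdio-1.0 type) is just an executable that reports failure through a non-zero exit code. Below is a minimal sketch of such an executable using only base. The function under test and the checks are hypothetical; a real suite would more likely use a testing library such as QuickCheck or HUnit.

```haskell
import Control.Monad (unless)
import System.Exit (exitFailure)

-- Hypothetical function under test.
double :: Int -> Int
double x = x + x

-- Each check pairs a description with a boolean result.
checks :: [(String, Bool)]
checks =
  [ ("double 0 == 0",   double 0 == 0)
  , ("double 21 == 42", double 21 == 42)
  ]

main :: IO ()
main = do
  let failures = [name | (name, ok) <- checks, not ok]
  mapM_ (putStrLn . ("FAILED: " ++)) failures
  -- A non-zero exit code is what tells the caller that the suite failed.
  unless (null failures) exitFailure
```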

Having Cabal build bot plugins for e.g. Jenkins would make it easier to run continuous builds and thus get more out of your test suites. I use Jenkins a lot and it helps me make sure that my packages don't break on, for example, older versions of GHC.

Performance

Mean: 3.50

Users are mostly content with the performance of Hackage libraries. I think this is a testament to how good GHC is: you can get good performance without paying any attention to performance.

I still think we need to work on the performance of our libraries, especially core libraries for e.g. data structures, talking to databases, running web servers, etc. If you get performance right at the lower level, you don't have to think too much about it when writing your applications.

Integration with other libraries

Mean: 3.14

We could make libraries fit together better:

  • APIs are still a bit inconsistent,
  • we still don't program against interfaces enough (e.g. there are no type classes unifying different container implementations),
  • we have two Unicode string types (String and Text),
  • we don't use qualified naming everywhere (but instead use ad-hoc identifier prefixes/suffixes),

and so on.
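As an illustration of the interface point, here is a minimal sketch of a type class that abstracts over two container types, together with a function written once against that class. The Collection class and its method names are hypothetical (no such class exists in the standard libraries), and I'm assuming the TypeFamilies extension and the containers package. The imports also show the qualified naming style.

```haskell
{-# LANGUAGE TypeFamilies #-}

import qualified Data.List as List
import qualified Data.Set as Set

-- Hypothetical interface, for illustration only.
class Collection c where
  type Elem c
  empty  :: c
  insert :: Elem c -> c -> c
  toList :: c -> [Elem c]

instance Collection [a] where
  type Elem [a] = a
  empty  = []
  insert = (:)
  toList = id

instance Ord a => Collection (Set.Set a) where
  type Elem (Set.Set a) = a
  empty  = Set.empty
  insert = Set.insert
  toList = Set.toList

-- Written once against the interface, usable with any instance.
fromList :: Collection c => [Elem c] -> c
fromList = List.foldl' (flip insert) empty

main :: IO ()
main = do
  print (toList (fromList [3, 1, 3] :: [Int]))
  print (toList (fromList [3, 1, 3] :: Set.Set Int))
```

A real design would have to deal with maps, lookups, performance guarantees, and so on, which is a large part of why this is harder than it looks.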

The Haskell Platform is one attempt to address this, by giving us a vehicle for making coordinated changes, but progress has been slow.

Personally I've felt that the (now) old libraries process slowed us down; it's hard to do anything by consensus in a large, diverse community. Even if you eventually reach consensus you have spent more time than it's worth making whatever (simple) change you intended to make.

Cross platform compatibility

Mean: 3.41

Many libraries still don't build on Windows. We need people who use Windows to help out to make sure they do. As I mentioned earlier, having build bots would help here.

API stability

Mean: 3.26

As a language community I think we're still figuring stuff out. We're still experimenting with different programming models (e.g. iteratee I/O) and it will take a while until we settle on some best practices for writing APIs.

That being said, there are some good libraries that show the way. To name a few: bytestring, text, mysql-simple, and binary. For example, the latter two show how to create APIs that marshal Haskell values to/from byte strings, in different circumstances.

Ease of use

Mean: 3.32

Not a great score, probably related to the lack of documentation. I recommend that anyone who designs APIs for others to use watch Simplicity Ain't Easy.

Reasoning about performance

You often hear that it's hard to reason about performance in Haskell so I asked two questions related to that.

Reasoning about the performance of Haskell programs is...

Scale: 1 - easy, 5 - hard

Mean: 3.47

So people do find it difficult to reason about the performance of their programs. Curiously, they do find the performance of the packages they use to be good (see the earlier question about libraries). Perhaps this means that people only rarely run into performance problems, but when they do they're not sure how to tackle them.

It isn't terribly difficult to reason about performance in Haskell (there are quite a few people who know how to) once you're taught a few basic concepts and techniques, but we do a poor job of teaching people. In fact, we typically don't educate people in how to reason about performance at all!

I've started thinking about writing a medium-sized tutorial, perhaps 60 pages or so, covering everything you need to know to be able to write production-quality Haskell code. Perhaps I can find some time after my move.

What would help you most when reasoning about the performance of your Haskell programs?

Note that respondents could choose more than one option here, so results add up to more than 100%.

I wasn't sure if I would get anything useful out of this question. It's a bit like asking people what kind of free stuff they'd like. However, there are some relative differences between the different options. For example, the results show that people prefer profilers to lint tools.

We need to better document the strictness properties of our APIs and document performance considerations and gotchas in general. The Haddock documentation for some packages already documents such things in the introduction section of the module documentation.
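Here is a minimal sketch of what I mean by documenting strictness, written as a Haddock comment on a small function. The function itself is a made-up example and isn't taken from any package.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- | /O(n)/. Arithmetic mean of a list, or 0 for the empty list.
--
-- Strictness: the running sum and count are forced at every step, so
-- no chain of thunks builds up while the list is traversed.
mean :: [Double] -> Double
mean xs
  | n == 0    = 0
  | otherwise = total / fromIntegral n
  where
    (total, n) = foldl' step (0, 0 :: Int) xs
    -- The bang patterns force both components of the accumulator.
    -- foldl' alone only evaluates the pair constructor and would
    -- leave unevaluated sums inside it.
    step (!s, !c) x = (s + x, c + 1)

main :: IO ()
main = print (mean [1, 2, 3, 4])
```

With a comment like this a user can tell how the function behaves on a large input without having to read the implementation.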

We could also use some teaching material on the issue that we could point to.

What do you think is Haskell's most glaring weakness / blind spot / problem?

I didn't include a "general comments" section this year. A few people felt that only focusing on weaknesses was a bit negative. I'll reintroduce the section next year. Feel free to share any other thoughts you might have in the comments section of the blog.

The list of weaknesses was a bit more diverse than last year, perhaps due to breaking out library and performance issues into separate questions. This is a good thing. It means that we don't have any huge blind spots in our ecosystem.

Here's a sample of topics that came up:

  • Lack of GUI libraries.
  • Frustration with lack of (visible) progress on Hackage.
  • The learning curve.
  • Lack of more comprehensive documentation for libraries (i.e. beyond simple reference documentation).
  • Difficulties in reasoning about laziness.

Closing thoughts

I'd like to thank everyone who took the time to take the survey. Hopefully we can use the results to guide future infrastructure work in the community.

Raw data

All the source data is available in a spreadsheet in Google Docs or as an HTML table export from that spreadsheet.