Wednesday, March 14, 2012

The Cabal of my dreams

This post outlines changes I think we should make to the Cabal build infrastructure, in order for it to stay relevant to Haskell developers. This post is intentionally high level and uses strong statements; I'm trying to establish a direction to go in, not discuss implementation details. I have a huge amount of respect and appreciation for the Cabal developers; this post is in no way intended as a criticism of them or their work.

Here's what I expect of a build system:

Hermetic builds

Builds should be completely independent. The build system must behave as if all artifacts were rebuilt every time. If any artifacts are shared, they must be both immutable and uniquely determined by the inputs used to create them (e.g. compiler flags, source files, and dependencies used).

If artifacts can't be safely shared, you're better off rebuilding every time than trying to share them anyway.
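One way to make "uniquely determined by the inputs" concrete is to derive a key for each artifact from a hash of everything that went into it, and to share an artifact only when the keys match. The following is a minimal sketch of that idea, not a description of how Cabal works; the function name artifact_key and the choice of inputs are assumptions made for illustration.

```python
import hashlib
import json


def artifact_key(compiler, flags, sources, dependency_keys):
    """Return a hex digest that changes whenever any build input changes.

    sources maps a file name to its contents (bytes); dependency_keys are
    the keys already computed for the packages this one depends on.
    (Hypothetical helper, for illustration only.)
    """
    digest = hashlib.sha256()
    # Serialise the non-file inputs deterministically. Flag order is kept,
    # because reordering flags can change what a compiler does.
    header = json.dumps(
        {
            "compiler": compiler,
            "flags": list(flags),
            "deps": sorted(dependency_keys),
        },
        sort_keys=True,
    )
    digest.update(header.encode("utf-8"))
    # Hash the files in a fixed order so the key does not depend on the
    # order in which the file system happened to list them.
    for name in sorted(sources):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(sources[name]).digest())
    return digest.hexdigest()
```

Two builds with identical inputs get the same key and could share the artifact; change a flag, a source file, or a dependency and the key changes, so nothing stale is ever reused.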

Parallel builds

Real world programs are too large to be built sequentially. Parallel builds are a necessary feature; soon distributed builds will be as well.

Changes needed to Cabal

cabal build means "build my package"

If I'm in a source directory and type cabal build, I want Cabal to build my package, not tell me about steps I could take to build my package. cabal build should

  • configure the package, even if I haven't configured the package before,
  • build any dependencies and store them locally in the source tree (optionally sharing them with other builds if that can be done safely), and
  • build the package.

Aside: configure very much feels like an unnecessary step. It would be better if build took all the flags configure takes today, so we never have to run configure again. This is after all what cabal install does today.

cabal test means "test my package"

If I'm in a source directory and type cabal test, I want Cabal to run my test suites. It is understood that to run the test suites they first need to be built, so go build and run them already!
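The rule behind both of these sections is that a command should imply the steps it depends on. A minimal sketch of that rule, using a hypothetical table of prerequisites (a real tool would also skip steps that are already up to date):

```python
# Each command names the command it implicitly depends on (assumed layout).
PREREQUISITE = {"configure": None, "build": "configure", "test": "build"}


def steps_for(command):
    """Return the steps to run, earliest first, ending with the command."""
    steps = []
    while command is not None:
        steps.append(command)
        command = PREREQUISITE[command]
    return list(reversed(steps))
```

With this table, asking for test yields configure, build, test, so the user never has to run the earlier steps by hand.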

I don't know what cabal install means

At the end of the day we build code to produce executables. When I type cabal install <library> I'm doing busy work that the build system should do for me, manually telling it about dependencies it might need to install later. cabal install should, except perhaps in the case of executables, be replaced by cabal build as described above.

The only reason I can see for running cabal install <library> manually is to save some compilation time in the future or to make sure the packages exist locally, e.g. before getting on a plane.

No package is an island

We frequently have to work on multiple dependent packages at once and thus we also need a way to build them together, without registering the by-products in some shared package database. For example, given this directory structure

src
  |-- foo
  |    |-- foo.cabal
  |-- bar
       |-- bar.cabal

where foo is a package that depends on bar, I should be able to do

cd src
cabal build foo

and that had better traverse the directory structure, find and build the dependencies, and use them when compiling foo.

Aside: This could be implemented in a few different, more or less explicit, ways e.g. by specifying the directories used to search for dependencies explicitly, as in

cd src/foo
cabal build --root-dir=~/src
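As a rough sketch of the traversal, the code below finds every .cabal file under a root directory and then orders the local packages so that dependencies are built first. It is an illustration under stated assumptions: the function names are hypothetical, and the dependency map is passed in by the caller rather than parsed from the .cabal files, which a real implementation would have to do.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from pathlib import Path


def find_packages(root):
    """Map package name -> directory for every *.cabal file under root."""
    return {path.stem: path.parent for path in Path(root).rglob("*.cabal")}


def build_order(target, depends_on, local_packages):
    """List the local packages to build for target, dependencies first.

    depends_on maps a package name to the names it depends on. Dependencies
    that are not in local_packages are assumed to come from elsewhere (for
    example Hackage) and are left out.
    """
    graph = {}
    pending = [target]
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        local_deps = [d for d in depends_on.get(name, []) if d in local_packages]
        graph[name] = local_deps
        pending.extend(local_deps)
    # TopologicalSorter treats the mapped values as predecessors, so
    # dependencies come out before the packages that need them.
    return list(TopologicalSorter(graph).static_order())
```

For the directory structure above, with foo depending on bar, the order would be bar followed by foo.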

Parallel builds

We're almost there. Parallel builds were developed by a GSoC student last year. We need to get the patches merged into Cabal.
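To illustrate what a parallel build has to guarantee, here is a small sketch that builds packages concurrently while never starting a package before its dependencies have finished. This is not the GSoC implementation; build_in_parallel and the build_one callback are hypothetical names, and threads stand in for whatever worker mechanism a real tool would use.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def build_in_parallel(depends_on, build_one, jobs=4):
    """Call build_one(name) for every package, never before its dependencies.

    depends_on maps a package name to the names it depends on.
    Returns the package names in the order their builds finished.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError on cyclic dependencies
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while sorter.is_active():
            # Everything returned here has all of its dependencies built.
            for name in sorter.get_ready():
                running[pool.submit(build_one, name)] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise the error of a failed build
                name = running.pop(future)
                sorter.done(name)
                finished.append(name)
    return finished
```

Packages with no dependency relationship between them run side by side, and a package only becomes ready once everything it depends on has been marked done.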

Building specific components within a package

Not as important as the above changes, but having cabal build build all components in a single package starts getting annoying when you have several of them. For example, if you want to run a single test suite, but the package defines several, you end up unnecessarily linking binaries you won't run.

Proposal: take the component to build as an extra, optional argument to build. For example,

cd src/foo
cabal build some-executable

would only build some-executable. Combine that with multi-package builds and we can use

cd src
cabal build foo:some-component

to build a specific component in a specific package in a source tree of multiple packages.
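The argument forms used above (a package, a component, or package:component) could be told apart roughly as follows. This is only a sketch: parse_target is a hypothetical name, and the rule that a bare name matching a known package wins over a component name is an assumption, not something the proposal specifies.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    package: Optional[str]    # None means "the package in the current directory"
    component: Optional[str]  # None means "every component"


def parse_target(argument, known_packages):
    """Interpret one argument to a hypothetical multi-package `cabal build`."""
    if ":" in argument:
        package, component = argument.split(":", 1)
        return Target(package, component)
    if argument in known_packages:
        # Assumed tie-break: a bare name that is a package means that package.
        return Target(argument, None)
    return Target(None, argument)
```

Under these assumptions, foo:some-component selects one component of foo, foo on its own selects the whole package when run from src, and some-executable selects a component of the package in the current directory.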

23 comments:

  1. Upvote!
    I'd be up for helping out with this project, though I don't know if I have enough slack in my work stack to actually be able to follow through at the moment. But I absolutely agree that cabal should evolve into more of a full "proper" build tool / package manager.

  2. WRT I don't know what cabal install means:

    Any multi-language program is not going to be buildable with cabal. Also, in the case of mine, the whole point is to build a library *and* an executable. So as far as I can see, I'll always need 'cabal install lib' to pull in the necessary haskell dependencies.

  3. Sounds great... but another problem to address is the need to build some packages in specific ways. For example, for gloss-web, I need to build gloss with --with-ghc-options=-XTrustworthy. Currently, I just do that... but if you eliminate 'cabal install' for libraries and/or do the hermetic build thing, then I need a new mechanism to indicate that Cabal should make sure gloss is built with that option. Doing this right would be better than the status quo, but doing it wrong would mean actually making things worse.

  4. Evan,

    Note that cabal build will build your library just fine, leaving you with a .a file you can link into your bigger project. Are you sure you really want to install that .a somewhere using Cabal?

    Taking a step back: in my mind Cabal will never replace the distro's package manager and thus multi-language projects need to be built using the distro package manager's tools or some other tool suited for such a task.

  5. Chris,

    We'll definitely need a way to specify package flags for dependencies as well, e.g. --flags=gloss-web="--ghc-options=-XTrustworthy". Long term I'd like to move away from per-package flags as much as possible.

  6. Actually cabal won't build my library, because it involves both haskell and C++. The dependencies go across languages, e.g. hsc2hs needs to be rerun when the depended upon headers change. And of course hs files that depend on cc files have to rebuild the cc files when necessary.

    I don't have a .a file, and I don't want to copy it somewhere, the .o files get linked into the binary at compile time, and again dynamically at runtime for plugins.

  7. Evan,

    In that case make or some other language agnostic build system is likely a better fit for you. The Cabal build system is still quite far away from being language agnostic. Perhaps if we merge in some ideas from Shake.

    Replies
    1. cabal with explicit setup.hs dependencies would help a lot here.

      We can already easily incorporate autotools builds into a cabal build; but I don't think that functionality has much business being an internal component to cabal/cabal-install. Extensions like that are *very* handy, but I think they should be extensions to a lean core tool.

      Cabal just doesn't make it easy to create complex new features for a specific build process (or class of build processes...).

  8. I've been really busy during the last two months and wasn't able to work on the parallel patches, but I'm still committed to merging them in.

  9. +1

    In addition, cabal should support multiple remote-repo entries.
    Not only could we then mirror the official Hackage (if hackage.org is down, the whole Haskell community *freezes*), but anyone could also overlay a private remote-repo on the official Hackage.

  10. Johan,

    I think part of the problem we have now has to do with Cabal's design. It tries to be in charge of everything. On top of that we have issues like the ones Evan points out, which boil down to: Haskell's FFI support means that you want to build Haskell plus the transitive closure of whatever the FFI supports.

    Since we've designed Cabal to be in charge of this, we need Cabal to be able to build the transitive closure of whatever the FFI supports!

    One way to change this is to add a mode of operation to cabal so that other build systems can ask cabal for information or to do things. For example, Evan could use a makefile to orchestrate the build, even the build of the haskell bits, if cabal could read in the .cabal file, calculate which packages to use, and then return the ghc invocation lines. Those lines could then be issued from make in the context of the full build process.

    Another issue with cabal is that it only knows about what the cabal team lets it know about. Thinking in terms of preprocessors: I've run into build issues where hsc2hs had a feature to overcome the issue but cabal didn't support that hsc2hs feature simply because no one bothered to add it yet. Unfortunately, even if I added that feature and got it accepted it would be quite some time before I could expect anyone else to have that feature in their cabal.

    There are cases other than preprocessors where I really want to define my own behaviors for cabal and be able to share them with other projects and developers. I really wish I could add support for this or that as a 'cabal plugin' and upload it to hackage.

    Imagine the way git works, 'git foo' invokes 'foo' which is allowed to be a separate command/script/program. We could have the same convention for cabal. Imagine if you could create cabal plugins (and we could teach cabal how to build them and install them as needed) this way. Then all the crazy gtk2hs build tools could be cabal plugins.

    Other examples: cabal-dev and cab could both be plugins to cabal instead of separate tools. The same goes for the other cabal tools that are on hackage now.

    As functional programmers, the control that cabal has should feel inverted (and unnatural?) to us.
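The git-style convention described in this comment could be sketched as follows: an unknown subcommand foo is resolved by looking for an executable named cabal-foo on the search path. The names here (resolve_command, the set of built-in commands, the cabal- prefix) are assumptions for illustration and not an existing Cabal feature.

```python
import shutil

# Commands the tool handles itself (an illustrative, incomplete set).
BUILTIN_COMMANDS = {"build", "test", "install"}


def resolve_command(name, path=None):
    """Return ("builtin", name), ("plugin", executable) or ("unknown", name).

    path is an optional search path; None means the PATH environment variable.
    """
    if name in BUILTIN_COMMANDS:
        return ("builtin", name)
    # Mirror git's convention: `cabal foo` looks for a `cabal-foo` executable.
    executable = shutil.which("cabal-" + name, path=path)
    if executable is not None:
        return ("plugin", executable)
    return ("unknown", name)
```

A plugin found this way would then be run as a separate process with the remaining arguments, which keeps the core tool lean while letting extensions live on Hackage.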

  11. dagitj,

    I think I agree with more or less everything you say and I hope we'll get to those more tricky issues soon. My proposals are a bit more modest, but should build up some of the infrastructure we will need whether we support multi-language projects, different build systems, etc.

  12. For the build-bot project I'm working on, we're currently writing some kind of "local hackage" service, that may at some point be relevant for a few of your ideas.

    Also, we've been discussing with Duncan about, at some point, factoring out many of the features cabal-install has in a separate library to be able to reuse them in the build-bot project and for the new shiny hackage-server. Maybe that would be the best time for making some changes so that cabal-install can support some of the ideas described in the post or in the comments.

  13. I completely agree with what you say about "hermetic builds". That is not a problem with the entire cabal system; it is a problem with the cabal command, which always targets a "global" package database. (Even the user package database is "global" because it affects all projects.)

    So I use `cabal install` exactly once after each new GHC install: `cabal install cabal-dev`. After that, I only use cabal-dev.

    You are suggesting something intermediate between cabal and cabal-dev: conceptually separate package databases per project, like with cabal-dev, but smart enough to share artifacts between projects when it can be proven that it makes sense. That would be really nice.

    `cabal install [lib]` (or `cabal-dev install [lib]`) makes perfect sense; that's what I do all day long. A large system is composed of many libraries, and only a small proportion of them create executables. I don't want to be forced to link my library into some other irrelevant package, just because it happens to create an executable, every time I compile the library I am working on. In order to test my library after I compile it, it usually does need to be "installed" in some package database every time I compile it, so `cabal build [lib]` would hardly be useful.

  14. I have three use cases for `cabal install` which cannot be supported by `cabal build` because they are not in the context of a cabal project.

    (1) Installing an executable.
    (2) Testing whether a library compiles on my machine.
    (3) Installing a library to explore it in ghci.

    These use cases should continue to be supported by cabal.

  15. Yitz, I don't quite follow your example in the last paragraph. If you're developing a library, wouldn't you follow these steps:

    1. Edit
    2. Compile (cabal build)
    3. Run unit/integration/system tests (cabal test, manual tests)
    4. Commit
    5. Goto 1

    ?

    Why does the library need to be installed to test it, unless you're testing the packaging mechanism itself?

    Replies
    1. Well, for step (3), when my project consists of many packages, it can be very complex and fiddly to get all of the cabal files exactly right. During development, that may not be what I want to spend my time on at that stage. I just want to run the tests. And as Tillmann points out, I may want to bring up some specific combination of package versions in ghci.

      It's so simple conceptually just to create a fresh directory and `cabal-dev install` the exact packages I need at exactly the right versions. To me, at least, the meaning of "install" is absolutely clear and is exactly what I need. It means: make these specific package versions available as compiled libraries in the current sandbox. I can group together large groups of unpublished package versions in separate lightweight yackage servers, so that I can load them easily into any sandbox.

      If it's somehow possible to get that kind of clarity, simplicity, and control without sandboxes, that's fine.

      The only problems I see with the current cabal - assuming that everything is done within sandboxes - are that you need to rebuild a lot of artifacts in each sandbox (not so much of a problem for me in practice usually, but nice if it could be improved), and the lack of parallel builds. Your good ideas could definitely help with those.

      All of that is for the case where a team is working on a large number of their own packages as part of a large project. For the simple case where I am working on only one package of my own, but with a large number of dependencies on other people's packages, then the system you are describing sounds great. There I don't want to have to think about package versions any more than absolutely necessary.

    2. cabal-dev install is very close to the cabal build I describe above. Shortcomings include cabal-dev add-source taking a snapshot of another package's source directory instead of creating a live "link" to it.

      Let's try to separate mechanism from goal. We're engineers so we tend to get them mixed up. Package databases are a mechanism, not a goal. The goals are the use cases we want to support (build the package in the current source tree, launch ghci with the current package and its dependencies in scope, etc.).

  16. Tillmann,

    I think (1) is a good use of cabal install. It combines cabal build and 'cp dist/build/exe $PATH'. I tried to say as much in the post.

    For (2), why would installing a library be required to see if it compiles? Unpack the source code and run cabal build. If it builds, it builds! Install only adds a file copy (and a ghc-pkg register call) on top of that.

    (3) is useful, although I expect it to be largely done through cabal repl (built during last GSoC), which would let you use ghci with a project under development, even if invoking ghci requires running some preprocessors first (like hsc2hs).

    Replies
    1. In my use case (2), I actually want to see whether it would install, of course. Usually, if it builds, it also installs, but maybe some packages do extra steps in the install phase, what do I know. On the other hand, since there's no way to undo these extra steps, maybe I don't want to run them when I don't know yet whether I really want the package in the end.

      Maybe I actually just want `cabal test` to accept the name of a package on hackage. The semantics of `cabal test foo` would be something like the following:

      cabal unpack foo
      cd foo-1.2.3
      cabal configure --activate-testsuites-if-possible
      cabal build
      cabal test --do-not-complain-if-there-are-no-testsuites
      cd ..
      rm -r foo-1.2.3

    2. (3) is the major reason I manually cabal(-dev) install things.
      Leiningen (Clojure build tool) works the way you describe: you need to have a configured project and run 'lein repl' in it so that you may download and toy with a new lib in the repl. So you end up creating blank projects with just one dependency _just to bring it into repl scope_.

      Okay, that is not so horrible, but still it's annoying when you want to just try some stuff without needing it immediately for a project.

  17. > Package databases are a mechanism, not a goal. The goals are the use cases we want to support.

    Sure! My point is to bring up a use case which appears to be a bit different from the ones you were thinking of. Namely, a large project consisting of many packages, where there is a need for convenient fine-grained control over which combination of package versions gets built together, not necessarily as an executable, and without the hack of artificially narrowing the version ranges in many cabal files.

    I admit that I am biased, because this is the use case I deal with on a daily basis in my work. But I believe that support for this kind of use case is part of what makes the difference between Haskell being a serious industrial tool and not just a research topic and curious toy.
