Tuesday, February 21, 2012

Wanted: an efficient union algorithm for hash-array mapped tries

There's really only one thing that's holding me back from releasing the new hash-array mapped trie (HAMT) based implementation of unordered-containers, a fast implementation of union.

The current released version is implemented in terms of a patricia trie (also known as a radix tree) and it uses a linear time union algorithm. The new version uses a naive implementation that simply calls insert repeatedly. This makes union 378% slower in the new implementation, on my main benchmark. Since I use HAMTs to implement HashSet in addition to HashMap, this is less than optimal.

My plan is to try to port the algorithm used on patricia tries to the HAMT implementation. I think this should be possible, due to both data structures being bit tries, but I haven't had time to try this myself. Any Haskellers out there who would like to try? If someone could implement a time complexity efficient version, I could then do the last bit of low-level optimization.

You can find the source of the HAMT-based implementation in the unordered-containers hamt branch on GitHub.

You'll also want to look at the union implementation used in IntMap.

The HAMT data structure itself is described in Ideal Hash Trees by Phil Bagwell. Note that we're using a persistent implementation, but most of the ideas (like removing empty nodes by using a bitmask) are still used.

Phil also told me that section 4.5 in A Generic Parallel Collection Framework might be useful in implementing union.

12 comments:

  1. Tempted. Will look into this tonight.

    ReplyDelete
  2. I've merged an implementation by Twan van Laarhoven, so we should be ready for a release soon!

    ReplyDelete
    Replies
    1. Turns out that the implementation (while being a nice improvement) isn't as fast as initially thought so if anyone wants to take a second look their might be room for improvement.

      Delete
    2. I'll have a look see later this week after I've sorted out some project deadlines i'm currently amidst.

      Delete
  3. I'm currently implementing HAMTs in Dart and I was wondering about the same thing which is how I ended up on your blog. Do you have news since your last update?

    ReplyDelete
    Replies
    1. We now have a good (although not beautiful) implementation. Check out the code here:
      https://github.com/tibbe/unordered-containers/blob/master/Data/HashMap/Base.hs#L525

      It's a bit long-winded given that we use quite a few different node types as that gives us better peformance.

      Delete
    2. Yes I've seen that, you have a combinatorial explosion :)

      In the dart version I only have Empty, Bitmask and Collision for now. I've tried to add Leaf but the speedup was negligible. I guess there's a bottleneck somewhere else, but there's no profiling for dart (yet?).

      Thanks for the heads up.

      Delete
    3. Leaf is not so much about performance as it is avoiding downcasts. In the Clojure implementation (which is written in Java) they use downcast to convert Objects into keys, values, or subtrees, depending on if some specific pointer is NULL or not. We could do the same in the Haskell implementation, but I've been hesitant to do that so far. It makes the implementation both more complicated and less type safe.

      Delete
    4. Weird, I didn't need any downcast, neither in my haskell reference implementation or in my dart implementation (I'm using double dispatch to encode pattern-matching on both arguments). I just use Collision with a singleton leaf as a list. You seem to imply that even in the haskell implementation, if you were getting rid of Leaf you'd need to perform an unsafe cast. Is that right? If so I probably missed some point in my implementation.

      Delete
    5. I should have given more context. The Clojure implementation uses (or at least used to use) a trick where the ArrayNode (their equivalent of my BitmapIndexed) stores either a pointer to a subtree, or the key-value pair. Essentially what they have is

      data HashMap k v = BitmapIndexed Bitmap (Array Any) | ...

      and if the value at index k is NULL, the value at index k + 1 points to a subtree. Otherwise, the value at index k points to the key and the value at index k + 1 points to the value. I could do the same, but then I would need a downcast from Any.

      I believe they called this design "leafless."

      If you could post your code somewhere, perhaps I could better understand what the difference between our implementation is.

      Delete
    6. Ah, I see. It's a really low level trick. Right now my implementation doesn't even use a bitmask, just a mutable map from ints to trees (because it's already muuuuch faster than a naive implementation of immutable maps and dart doesn't have native popcount, neither int32/64 nor exposed native arrays; plus I wanted to get something running quickly and replacing the map later is a local change), so I guess I'm not concerned by this optimisation yet. I'll opensource the code soon, I have to get the headers right and that kind of stuff.

      Delete
  4. This comment has been removed by the author.

    ReplyDelete