I’m currently thinking about tidying up some of the data structures in my sound change applier. Basically, there are three major data structures currently:
- A type to contain the results of the initial surface parse
- A type to contain the AST after some simplification and transformations into more efficient forms
- A ‘regex’ type to contain the patterns that are being matched against
Now, what I’d like to do is merge the first two together. since they share a lot of common structure. I’ve been getting my head around GADTs to do this, but I’ve also been debating my decision to use the container package, and especially IntMaps and IntSets. I have a relatively small number of elements, so the question is: are they really that much better than lists from a performance point of view?
The answer appears to be yes. I created a simple benchmark using the Criterion package, and even for one element, membership testing for an IntMap seems to take about half the time that membership testing for an equivalent list. Here’s the numbers:
benchmarking looking up from set
collecting 100 samples, 636326 iterations each, in estimated 1.685638 s
bootstrapping with 100000 resamples
mean: 26.50939 ns, lb 26.50159 ns, ub 26.52257 ns, ci 0.950
std dev: 50.88050 ps, lb 34.82538 ps, ub 82.22622 ps, ci 0.950
found 16 outliers among 100 samples (16.0%)
11 (11.0%) high mild
5 (5.0%) high severe
variance introduced by outliers: 0.990%
variance is unaffected by outliers
benchmarking looking up from list
collecting 100 samples, 416158 iterations each, in estimated 1.685640 s
bootstrapping with 100000 resamples
mean: 40.59912 ns, lb 40.56189 ns, ub 40.66390 ns, ci 0.950
std dev: 244.8702 ps, lb 162.5805 ps, ub 356.1506 ps, ci 0.950
found 12 outliers among 100 samples (12.0%)
6 (6.0%) high mild
6 (6.0%) high severe
variance introduced by outliers: 0.990%
variance is unaffected by outliers
And this is something I really don’t understand, since intuitively it seems like they should be about the same in the single element case. But the results of my little experiments speak for themselves, so I guess I’m going to stick with using the containers package.
EDIT: I think this might be an indirection issue. [a] is a list of boxed values, therefore to test the element of a singleton list for equality the code will need to follow a pointer and unbox the value. In addition, to check that there are no more elements in the list, it will need to follow another pointer to check that the next value is [].
By contrast, intsets are, I think, strict in their elements, so the ints are probably inlined into the data type. That means that there is no pointer following to get the value of the first element for the equality test. Additionally, there might be more size information stored in the intset implementation, although I’m not sure about that.
Posted by chrisdb on 2011-01-27 and edited on 2011-02-03
Tags: haskell
Comments