Lowering switch statements with hashing

Fri Jan 17 13:30:54 PST 2014

On Fri, Jan 17, 2014 at 06:09:43PM +0100, Jasper Neumann wrote:
> > (2) Use the CHM algorithm to produce an order preserving
> > hash function. If the backend supports two relocations on
> > the same address (one positive, one negative) OR the jump
> > table can be expressed as relative expressions, this has the
> > nice advantage of requiring only one additional memory
> > access.
> 
> I had also considered CHM. This is a method which can easily deal
> with really huge label sets since its runtime is O(n), and produces
> a quite small auxiliary table.
> However CHM produces a hash function that needs 2 table accesses
> whereas Jenkins' needs only one.
> CHM: h(x) = (table[h1(x)]+table[h2(x)]) mod size
> Jenkins: h(x) = h1(x) ^ table[h2(x)]
> This means that code generated with Jenkins' method ought to be a
> bit faster.

For out-of-order CPUs, that is unlikely to make a difference: the two
table loads in CHM don't depend on each other and can be issued in
parallel.
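For illustration only, here is a minimal Python sketch of the CHM construction and lookup quoted above. Several pieces are assumptions and do not come from this thread: the seeded mixer `_mix` standing in for h1/h2, the class name `CHMSwitch`, and a vertex count of m = 3n + 1 (CHM only needs m > 2n). Each case value becomes an edge (h1(x), h2(x)) in a random graph. If that graph is acyclic, the table entries can be chosen so that table[h1(x)] + table[h2(x)] mod n equals the case's position, which is what makes the hash order preserving. A cycle just means retrying with fresh seeds.

```python
import random


def _mix(x: int, seed: int) -> int:
    """Hypothetical seeded 32-bit integer mixer standing in for h1/h2."""
    x = (x ^ seed) & 0xFFFFFFFF
    x = (x * 0x9E3779B1) & 0xFFFFFFFF
    x ^= x >> 16
    x = (x * 0x85EBCA6B) & 0xFFFFFFFF
    x ^= x >> 13
    return x


class CHMSwitch:
    """Order-preserving minimal perfect hash over distinct case values (CHM)."""

    def __init__(self, keys, max_attempts=1000, rng_seed=0):
        self.keys = list(keys)
        self.n = len(self.keys)
        if self.n == 0 or len(set(self.keys)) != self.n:
            raise ValueError("need a non-empty set of distinct case values")
        # CHM needs m > 2n graph vertices; 3n + 1 is an assumed, roomier choice.
        self.m = 3 * self.n + 1
        rng = random.Random(rng_seed)
        for _ in range(max_attempts):
            self.seed1 = rng.getrandbits(32)
            self.seed2 = rng.getrandbits(32)
            if self._try_build():
                return
        raise RuntimeError("no CHM construction found")

    def _edge(self, x):
        # The two vertices (h1(x), h2(x)) that case value x connects.
        return _mix(x, self.seed1) % self.m, _mix(x, self.seed2) % self.m

    def _try_build(self):
        parent = list(range(self.m))  # union-find for cycle detection

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        adj = [[] for _ in range(self.m)]
        for i, key in enumerate(self.keys):
            u, v = self._edge(key)
            ru, rv = find(u), find(v)
            if ru == rv:  # self-loop, duplicate edge or cycle: retry with new seeds
                return False
            parent[ru] = rv
            adj[u].append((v, i))
            adj[v].append((u, i))
        # The graph is a forest, so walk each tree from an arbitrary root and
        # pick g so that g[u] + g[v] == i (mod n) on the edge for case i.
        g = [None] * self.m
        for root in range(self.m):
            if g[root] is not None:
                continue
            g[root] = 0
            stack = [root]
            while stack:
                u = stack.pop()
                for v, i in adj[u]:
                    if g[v] is None:
                        g[v] = (i - g[u]) % self.n
                        stack.append(v)
        self.g = g
        return True

    def lookup(self, x):
        """Return the case index i with keys[i] == x, or -1 for the default label."""
        u, v = self._edge(x)
        # Two independent table loads, then one compare to reject non-case values.
        i = (self.g[u] + self.g[v]) % self.n
        return i if self.keys[i] == x else -1


sw = CHMSwitch([3, 17, 42, 100, 256, 1000, 4096, 65535, 123457])
print([sw.lookup(k) for k in sw.keys])  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(sw.lookup(5))                     # -1 (falls through to default)
```

Because the index is order preserving, it can directly select entry i of a jump table laid out in case order. The final compare sends values that are not case labels to the default label.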

> I admit that Jenkins' method will probably take much longer to
> compute the auxiliary table, but since we only have to deal with
> moderately large label sets (usually well below 1000 labels), this
> should be acceptable. Our goal is not to speed up the compiler at
> all costs but to let it produce better code.

The main problem is that the missing degree of freedom keeps the
algorithm from scaling. Consider for a moment what happens when no
construction is found in reasonable time. Now imagine this happens
because someone just added one more label to the switch. The huge
problem with gperf, Jenkins' method and similar implementations is that
they don't scale well beyond a few dozen items: it might work or it
might not. That's a serious limitation for a tool like a compiler,
especially when dealing with a changing code base.

Joerg