[LLVMdev] Lowering switch statements with hashing
Jasper Neumann
jn at sirrida.de
Fri Jan 17 14:52:23 PST 2014
Hello Joerg!
>> I had also considered CHM. This is a method which can easily deal
>> with really huge label sets since its runtime is O(n), and produces
>> a quite small auxiliary table.
>> However CHM produces a hash function that needs 2 table accesses
>> whereas Jenkins' needs only one.
>> CHM: h(x) = (table[h1(x)]+table[h2(x)]) mod size
>> Jenkins: h(x) = h1(x) ^ table[h2(x)]
>> This means that code generated with Jenkins' method ought to be a
>> bit faster.
> For out-of-order CPUs is unlikely to make a difference.
The difference will probably be noticeable; a benchmark will prove it.
At least CHM needs more code. I am not sure about the needed table space.
>> I admit that Jenkins' method will probably take much longer to
>> evaluate the auxiliary table but since we have to deal with only
>> moderately large label sets (usually well below 1000 labels) this
>> should be acceptable. Our goal is not to speed up the compiler at
>> all means but to let it produce better code.
> The main problem is that the missing degree of freedom makes the
> algorithm not scale. Consider for a moment the problem of not finding
> a construction in reasonable time. Now imagine this happens because
> someone just aded one more label to the switch. The huge problem of
> gperf, Jenkin's and similar implementation is that they don't scale
> well beyond a few dozen items -- it might work or it might not.
> That's a serious limitation for a tool like a compiler,
> especially when dealing with a changing code base.
I tweaked Jenkins' algorithm a bit and got it much faster than the
original. My experience is that it can now easily deal with several
thousand labels in reasonable time (please try it).
If a solution is not found in a reasonable time the compiler will do
what it had done before my patch was present: Split the label set in two
halves and repeat the algorithm with these halves. Therefore we should
not expect that the resulting code will become slower just because of
the patch.
We should include a means to create statistics for switch statements in
order to obtain a discussion base; a failure of achieving perfect by
Jenkins' code could even be output as a hint or warning. If we are then
faced with such warnings we can easily enhance the patch by including
alternative algorithms such as CHM. We can also do it just to be sure
and/or for the sake of being able to benchmark real life programs with
different hashing schemes.
As for the size of label sets, in "A Superoptimizer Analysis of Multiway
Branch Code Generation" by Roger Anthony Sayle
(http://ols.fedoraproject.org/GCC/Reprints-2008/sayle-reprint.pdf) we
can find the following:
"The largest difficult switch statement, found in the conversions.c
source file of Debian’s msort-8.42 package, had 1,080 case ranges,
covering 1,202 case values branching to 320 unique destination labels."
I tried exactly these switches and Jenkins' code could easily deal with
them. For this large switch the following code was produced (labels
renamed):
movl %edi, %esi
shll $15, %esi
shrl $21, %esi
movl %edi, %edx
andl $127, %edx
movzwl BTable(%rdx,%rdx), %edx
xorl %esi, %edx
cmpl ValTable(,%rdx,4), %edi
jne DefaultCase
jmpq JumpTable(,%rdx,8)
Best regards
Jasper
More information about the llvm-dev
mailing list