Lowering switch statements with hashing

Jasper Neumann jn at sirrida.de
Sun Jan 19 05:07:20 PST 2014


Hello folks,

By mistake my answer was carbon-copied to the wrong forum 
(llvmdev at cs.uiuc.edu). Sorry for the cross-posting.

Jasper


Hello Joerg!

 >> I had also considered CHM. This is a method which can easily deal
 >> with really huge label sets since its runtime is O(n), and produces
 >> a quite small auxiliary table.
 >> However CHM produces a hash function that needs 2 table accesses
 >> whereas Jenkins' needs only one.
 >> CHM: h(x) = (table[h1(x)]+table[h2(x)]) mod size
 >> Jenkins: h(x) = h1(x) ^ table[h2(x)]
 >> This means that code generated with Jenkins' method ought to be a
 >> bit faster.

 > For out-of-order CPUs it is unlikely to make a difference.

The difference will probably be noticeable; a benchmark would show how much. 
In any case CHM needs more code, and I am not sure about the required table space.
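
To make the difference concrete, here is a rough sketch of the two lookup 
shapes in C; the table name, its size and the entry type are placeholders 
of my own, not taken from any actual implementation:

	#include <stdint.h>

	#define TABLE_SIZE 128                  /* assumed table size / modulus */
	static uint16_t aux_table[TABLE_SIZE];  /* filled by the construction step */

	/* CHM-style lookup: two table accesses plus a modulo reduction.
	   h1 and h2 are assumed to be already reduced into the table range. */
	static inline uint32_t hash_chm(uint32_t h1, uint32_t h2) {
	    return (uint32_t)(aux_table[h1] + aux_table[h2]) % TABLE_SIZE;
	}

	/* Jenkins-style lookup: a single table access and one XOR. */
	static inline uint32_t hash_jenkins(uint32_t h1, uint32_t h2) {
	    return h1 ^ aux_table[h2];
	}

The Jenkins form saves one load and, unless the table size is a power of 
two, a modulo; this single-lookup shape is also what the generated code 
quoted at the end of this mail shows.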


 >> I admit that Jenkins' method will probably take much longer to
 >> evaluate the auxiliary table but since we have to deal with only
 >> moderately large label sets (usually well below 1000 labels) this
 >> should be acceptable. Our goal is not to speed up the compiler at
 >> all means but to let it produce better code.

 > The main problem is that the missing degree of freedom makes the
 > algorithm not scale. Consider for a moment the problem of not finding
 > a construction in reasonable time.  Now imagine this happens because
 > someone just added one more label to the switch. The huge problem of
 > gperf, Jenkins' and similar implementations is that they don't scale
 > well beyond a few dozen items -- it might work or it might not.
 > That's a serious limitation for a tool like a compiler,
 > especially when dealing with a changing code base.

I tweaked Jenkins' algorithm a bit and made it much faster than the 
original. In my experience it can now easily deal with several thousand 
labels in reasonable time (please try it).
If a solution is not found in reasonable time, the compiler simply does 
what it did before my patch: it splits the label set into two halves and 
repeats the algorithm on each half. Therefore the resulting code should 
not become slower merely because the patch is present.
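
As an illustration only, here is a rough C sketch of the kind of 
"hash and displace" construction behind h(x) = h1(x) ^ table[h2(x)]; 
the sub-hashes, table sizes and the brute-force search are simplifications 
of my own, not the actual patch:

	#include <stdint.h>
	#include <string.h>

	#define SLOTS   256   /* power of two, >= number of labels (assumed) */
	#define BUCKETS 128   /* size of the auxiliary displacement table (assumed) */

	static uint32_t h1(uint32_t x) { return (x >> 6) & (SLOTS - 1); }  /* placeholder */
	static uint32_t h2(uint32_t x) { return x & (BUCKETS - 1); }       /* placeholder */

	/* Fill tab[] so that h1(x) ^ tab[h2(x)] is distinct for all keys.
	   Returns 1 on success, 0 if some bucket cannot be placed; the caller
	   then falls back, e.g. by splitting the label set into two halves. */
	static int build_displacements(const uint32_t *keys, int n, uint16_t *tab)
	{
	    uint8_t used[SLOTS] = {0};
	    for (int b = 0; b < BUCKETS; b++) {
	        int found = 0;
	        for (uint32_t d = 0; d < SLOTS && !found; d++) {
	            uint8_t tmp[SLOTS];
	            memcpy(tmp, used, sizeof tmp);      /* tentative slot occupancy */
	            found = 1;
	            for (int i = 0; i < n; i++) {
	                if (h2(keys[i]) != (uint32_t)b)
	                    continue;                   /* key belongs to another bucket */
	                uint32_t slot = h1(keys[i]) ^ d;
	                if (tmp[slot]) { found = 0; break; }   /* collision, try next d */
	                tmp[slot] = 1;
	            }
	            if (found) { tab[b] = (uint16_t)d; memcpy(used, tmp, sizeof used); }
	        }
	        if (!found)
	            return 0;   /* give up; the compiler splits the label set */
	    }
	    return 1;
	}

The real code is of course smarter about bucket ordering and search limits; 
the point is only that the search either succeeds or reports failure, and 
failure leads back to the old splitting strategy.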

We should also include a means to gather statistics about switch statements 
in order to have a basis for discussion; a failure of Jenkins' method to 
find a perfect hash could even be reported as a hint or warning. If such 
warnings actually show up, we can easily extend the patch with alternative 
algorithms such as CHM. We could also add them anyway, just to be safe 
and/or to be able to benchmark real-life programs with different hashing 
schemes.

As for the size of label sets, in "A Superoptimizer Analysis of Multiway 
Branch Code Generation" by Roger Anthony Sayle 
(http://ols.fedoraproject.org/GCC/Reprints-2008/sayle-reprint.pdf) we 
can find the following:
"The largest difficult switch statement, found in the conversions.c 
source file of Debian's msort-8.42 package, had 1,080 case ranges, 
covering 1,202 case values branching to 320 unique destination labels."
I tried exactly these switches and Jenkins' code could easily deal with 
them. For this large switch the following code was produced (labels 
renamed):
	movl	%edi, %esi
	shll	$15, %esi
	shrl	$21, %esi                  # h1(x) = (x << 15) >> 21, i.e. bits 6..16 of x
	movl	%edi, %edx
	andl	$127, %edx                 # h2(x) = x & 127
	movzwl	BTable(%rdx,%rdx), %edx    # load 16-bit displacement BTable[h2(x)]
	xorl	%esi, %edx                 # h(x) = h1(x) ^ BTable[h2(x)]
	cmpl	ValTable(,%rdx,4), %edi    # does ValTable[h(x)] hold the switch value?
	jne	DefaultCase                # no: not a case label, go to the default
	jmpq	JumpTable(,%rdx,8)         # yes: indirect jump to the case block
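
In C-like terms (x being the 32-bit switch value, names as above; the 
computed goto is a GCC extension used only for illustration), this 
corresponds roughly to:

	uint32_t h = ((x << 15) >> 21) ^ BTable[x & 127];  /* h1(x) ^ BTable[h2(x)] */
	if (ValTable[h] != x)                              /* key not present       */
	    goto DefaultCase;
	goto *JumpTable[h];                                /* dispatch to the case  */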

Best regards
Jasper


