[LLVMdev] RFC: Using hashing for switch statements

Fri Jan 24 13:53:55 PST 2014

On Fri, Jan 24, 2014 at 12:41 PM, Anton Korobeynikov <
anton at korobeynikov.info> wrote:

> Hi Jasper,
>
> As it was stated before - all these are pretty fine theoretical
> arguments. However, without real numbers of benchmarking it's hard to
> say anything definite about whether this approach is preferable.
>
> There are plenty of switches in LLVM and clang itself. Can't you
> provide some statistics how the stuff was lowered before and after?

Sounds like it's worth implementing in tree, off by default, and then doing
some benchmarking.

FWIW, I don't think sparse switches are very common in LLVM, so it's not
clear how often this optimization would fire.

> On Fri, Jan 24, 2014 at 11:51 PM, Jasper Neumann <jn at sirrida.de> wrote:
> > Hi folks,
> >
> > here is a short RFC regarding hashing for the implementation of switch
> > statements and my preliminary patch.
> > I posted this patch on 2014-01-16 21:58 at llvm-commits at cs.uiuc.edu.
> You can
> > find a copy e.g. on
> >
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html
> .
> >
> > Best regards
> > Jasper
> >
> > ===
> >
> > Preliminary: Special identifiers
> >
> > n    number of given switch labels
> > N    size of the (intermediate) hash range, usually n rounded up to the
> next
> > power of 2
> > x    the value to be switch upon
> >
> >
> > Why (perfect) hashing?
> >
> > Hashing enables a jump table like approach even for sparse label sets.
> > Let's concentrate on perfect hashing in order to keep the code smaller
> and
> > faster.
> > Hashing is faster than a decision tree provided there are enough labels,
> > i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8 labels
> > assuming random values of x.
> > The memory consumption of the hashing tables is comparable to the memory
> > footage of the code of a decision tree.
> > A decision tree consumes a lot of branch prediction resources.
> > For minimal perfect hashing the jump table does not need to contain dummy
> > labels and therefore has n instead of (almost) N elements.
> > "Switch-to-lookup tables": If all jump targets are sufficiently similar
> > hashing opens the possibility to use table which contain the differences.
> > Thereby the indirect jump is eliminated and we get a huge performance
> gain.
> >
> >
> > Why not hashing?
> >
> > A decision tree or even an else-if chain is faster than hashing for very
> few
> > labels.
> > The branch prediction logic usually does not work as satisfactorily as
> for
> > decision trees if fed with cyclical access patterns; i.e. the break-even
> > point in this scenario is higher.
> > Hashing can not deal with large label ranges since very single label is
> > treated. A possible compromise/solution might be to first catch all
> single
> > labels and sufficiently short ranges by hashing and then treat the
> remaining
> > ranges in a decision tree.
> >
> >
> > Why reversible hashing?
> >
> > Reversible hashing does not need a value table and a simple range check
> of
> > the calculated hash value is sufficient.
> > Reversible hashing is very simple to implement.
> > The usual jump table approach is a special case of reversible hashing.
> >
> >
> > Why simple hashing?
> >
> > A comparison against a value table is sufficient.
> > Simple hashing does not need an extra table.
> >
> >
> > Why not simple hashing?
> >
> > Simple hashing needs a lot of hash function candidates to be tested
> until a
> > suitable is found; this costs time in the compiler.
> > For large label sets the chance to find such a function is very small;
> the
> > applicability is therefore reduced to quite small sets.
> > Minimal perfect hashing is usually not achievable.
> >
> >
> > Why Jenkins' hashing?
> >
> > Jenkins' hashing produces very fast and compact code and needs one extra
> > table (BTable) with usually less elements than n; the elements itself are
> > unsigned integers which can become as high N-1.
> > The code first generates two hash values h1=f1(x) and h2=f2(x) where f1
> and
> > f2 limit their range by shifting or masking, then the final hash as h =
> h1 ^
> > BTable[h1].
> > Minimal perfect hashing is achievable, albeit often at the cost of larger
> > tables.
> > For sufficiently high N the table memory can optionally be reduced by
> > introducing an extra scrambling table (STable); BTable then consists of
> > bytes whereas STable contains 256 words or DWords. Obviously this
> reduction
> > of memory costs another indirection in the resulting code.
> > As far as I tested Jenkins' hashing can be used for label sets of several
> > thousand elements; often even one million labels can be treated.
> > The application in a compiler is to lower switch statements. Switch
> > statements with much more than 1000 labels are almost never present in
> > practice.
> > Jenkins' hashing usually is sufficiently fast.
> > By the way: You will find the following remark in lib/Target/README.txt:
> > ==>
> > Investigate lowering of sparse switch statements into perfect hash
> tables:
> > http://burtleburtle.net/bob/hash/perfect.html
> > <==
> >
> >
> > Why not Jenkins' hashing?
> >
> > The search time of Jenkins' hashing might become unacceptably long for
> > extraordinary large label sets.
> > In practice this should never happen, however in this case the label set
> > will be split in two parts are the process of lowering switches will be
> > repeated.
> > An alternative (for very large label sets) could be a different perfect
> hash
> > algorithm such as CHM.
> >
> >
> > Why CHM?
> >
> > CHM is extraordinarily fast at generating the extra table with a runtime
> of
> > O(1).
> > CHM can be used to hash almost arbitrarily many labels.
> > CHM produces an ordered minimal perfect hashing function.
> >
> >
> > Why not CHM?
> >
> > CHM needs extra table space of 2.09 * n elements; this is larger
> compared to
> > Jenkins' method.
> > CHM (in the implementation of cmph-2.0) is much more complicated than the
> > code of Jenkins' hashing:
> > h1 = f1(x) % N;
> > h2 = f2(x) % N;
> > if (h1 == h2 && ++h2 >= N)  h2 = 0;
> > h = (Table[h1] + Table[h2]) % n;
> > By enlarging N to a power of 2 and other tweaking we might replace the 3
> > modulo operations by shifting or masking at the cost of more table
> space. We
> > might also eliminate the conditional statement if in this case we choose
> > different hash functions and retry the calculation.
> > There is at least one more table access compared to Jenkins' method.
> >
> > ===
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> --
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State University
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140124/3a8435d8/attachment.html>