Lowering switch statements with hashing

Anton Korobeynikov anton at korobeynikov.info
Thu Jan 16 13:24:33 PST 2014


Jasper,

Could you please provide an RFC outlining the algorithm itself, and
possibly some benchmarks? The .txt files definitely do not contain
enough detail.

I'm a bit concerned about perfect hashing, since the classic
implementations (Jenkins' or MPH) usually involve two loads, where the
address of the second load is computed from the result of the first.
In many cases this yields two cache misses in a row.
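
To make the concern concrete, here is a minimal sketch of a two-level
(hash-and-displace) perfect hash for the keys 0, 100, ..., 900. It is
not the scheme from the patch and not Jenkins' or MPH code; the names
TwoLevelHash, h1, h2, build, make_example and lookup, the table sizes
and the multiplier are all assumptions made for illustration. The part
that matters is lookup(): the index of the second load depends on the
value returned by the first.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical layout: a small first-level displacement table and a
// second-level slot table holding the keys and their results.
struct Entry { uint32_t key; int value; bool used; };

struct TwoLevelHash {
  uint32_t seed = 0;
  std::array<uint8_t, 4> disp{};   // first-level table
  std::array<Entry, 16> slots{};   // second-level table
};

// Illustrative hash functions (assumptions, not taken from the patch).
static uint32_t h1(uint32_t k) { return (k >> 2) & 3; }
static uint32_t h2(uint32_t k, uint32_t seed) {
  return ((k ^ seed) * 0x9E3779B1u) >> 28;   // top 4 bits: 0..15
}

// Brute-force builder: try seeds until every bucket can be displaced
// into free slots. Acceptable for a handful of keys only.
static bool build(TwoLevelHash &t, const uint32_t *keys, const int *vals,
                  int n) {
  for (uint32_t seed = 1; seed < 65536; ++seed) {
    t.seed = seed;
    t.slots.fill(Entry{0, 0, false});
    bool ok = true;
    for (uint32_t b = 0; b < 4 && ok; ++b) {
      bool placed = false;
      for (uint32_t d = 0; d < 16 && !placed; ++d) {
        std::array<Entry, 16> trial = t.slots;
        bool fits = true;
        for (int i = 0; i < n && fits; ++i) {
          if (h1(keys[i]) != b) continue;
          Entry &e = trial[(h2(keys[i], seed) + d) & 15];
          if (e.used) fits = false;
          else e = Entry{keys[i], vals[i], true};
        }
        if (fits) {
          t.slots = trial;
          t.disp[b] = static_cast<uint8_t>(d);
          placed = true;
        }
      }
      ok = placed;
    }
    if (ok) return true;
  }
  return false;
}

// Builds the table for the keys of reversible() quoted below.
static TwoLevelHash make_example() {
  static const uint32_t keys[10] = {0,   100, 200, 300, 400,
                                    500, 600, 700, 800, 900};
  static const int vals[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  TwoLevelHash t;
  bool ok = build(t, keys, vals, 10);
  assert(ok && "no seed/displacement combination found");
  (void)ok;
  return t;
}

static int lookup(const TwoLevelHash &t, uint32_t k) {
  uint32_t d = t.disp[h1(k)];                          // load 1
  const Entry &e = t.slots[(h2(k, t.seed) + d) & 15];  // load 2 needs d
  return (e.used && e.key == k) ? e.value : -1;        // reject non-keys
}
```

With tables this small both loads will normally hit the same cache
line or two; the dependent-miss problem shows up once the tables grow
beyond what stays resident in cache.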

On Fri, Jan 17, 2014 at 12:58 AM, Jasper Neumann <jn at sirrida.de> wrote:
> Hello all,
>
> here is the first public version of my patch; however, a lot remains to be
> done before it can actually be committed.
>
> The patch modifies lib/CodeGen/SelectionDAG/SelectionDAGBuilder.*.
> The patch for http://llvm.org/bugs/show_bug.cgi?id=18347 is already embedded
> but can be turned off with the llvm switch -tree-split-classic=true (see
> TreeSplitClassic).
> Place gen_hash.inc and hashlib.* in lib/CodeGen/SelectionDAG/.
> gen_hash.inc currently is a separate file to ease editing for me but shall
> be integrated into SelectionDAGBuilder.cpp later.
>
> You will find some documentation in the *.txt files.
>
> To test the patch, you can use switchgen.cpp to generate a C++ file that
> tests a randomly generated switch statement.
>
> To test the hashing library hashlib there is hashtest.cpp.
>
> Currently the default behavior is that the classical jump table method is
> disabled; you can change this with the llvm switch -switch-perfect-hash=...
> (search SelectionDAGBuilder.cpp and gen_hash.inc for
> SwitchPerfectHashMethods).
>
> Please observe how e.g. the following routines are treated using my patch
> compared to the release version of clang.
>
> int reversible(int x) {
>   switch (x) {
>     case 0: return 0;
>     case 100: return 1;
>     case 200: return 2;
>     case 300: return 3;
>     case 400: return 4;
>     case 500: return 5;
>     case 600: return 6;
>     case 700: return 7;
>     case 800: return 8;
>     case 900: return 9;
>     default: return -1;
>     }
>   }
>
> int reversible_simple(unsigned int x) {
>   switch (x) {
>     case 0:
>     case 10:
>     case 20:
>     case 30:
>     case 40:
>     case 50:
>     case 60:
>     case 70:
>     case 80:
>     case 90:
>     case 100:
>       return 1;
>     default: return 0;
>     }
>   }
>
> bool is_power_2(unsigned int x) {
>   switch (x) {
>     case 0x00000001:
>     case 0x00000002:
>     case 0x00000004:
>     case 0x00000008:
>     case 0x00000010:
>     case 0x00000020:
>     case 0x00000040:
>     case 0x00000080:
>     case 0x00000100:
>     case 0x00000200:
>     case 0x00000400:
>     case 0x00000800:
>     case 0x00001000:
>     case 0x00002000:
>     case 0x00004000:
>     case 0x00008000:
>     case 0x00010000:
>     case 0x00020000:
>     case 0x00040000:
>     case 0x00080000:
>     case 0x00100000:
>     case 0x00200000:
>     case 0x00400000:
>     case 0x00800000:
>     case 0x01000000:
>     case 0x02000000:
>     case 0x04000000:
>     case 0x08000000:
>     case 0x10000000:
>     case 0x20000000:
>     case 0x40000000:
>     case 0x80000000: return true;
>     default: return false;
>     }
>   }
>
> int map_power_2(unsigned int x) {
>   switch (x) {
>     case 0x00000001: return 0x00;
>     case 0x00000002: return 0x01;
>     case 0x00000004: return 0x02;
>     case 0x00000008: return 0x03;
>     case 0x00000010: return 0x04;
>     case 0x00000020: return 0x05;
>     case 0x00000040: return 0x06;
>     case 0x00000080: return 0x07;
>     case 0x00000100: return 0x08;
>     case 0x00000200: return 0x09;
>     case 0x00000400: return 0x0a;
>     case 0x00000800: return 0x0b;
>     case 0x00001000: return 0x0c;
>     case 0x00002000: return 0x0d;
>     case 0x00004000: return 0x0e;
>     case 0x00008000: return 0x0f;
>     case 0x00010000: return 0x10;
>     case 0x00020000: return 0x11;
>     case 0x00040000: return 0x12;
>     case 0x00080000: return 0x13;
>     case 0x00100000: return 0x14;
>     case 0x00200000: return 0x15;
>     case 0x00400000: return 0x16;
>     case 0x00800000: return 0x17;
>     case 0x01000000: return 0x18;
>     case 0x02000000: return 0x19;
>     case 0x04000000: return 0x1a;
>     case 0x08000000: return 0x1b;
>     case 0x10000000: return 0x1c;
>     case 0x20000000: return 0x1d;
>     case 0x40000000: return 0x1e;
>     case 0x80000000: return 0x1f;
>     default: return -1;
>     }
>   }
>
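
For comparison with the two-load schemes mentioned above: for this
particular key set a perfect hash with a single table load is known.
The sketch below uses the well-known de Bruijn multiplier 0x077CB531
(the one commonly used for counting trailing zero bits); whether the
patch finds this or some other function is not stated in the mail, and
the names kDeBruijn, make_table and map_power_2_hashed are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// De Bruijn sequence B(2,5): multiplying it by 1 << bit and keeping the
// top five bits yields a distinct index for each of the 32 bits.
static const uint32_t kDeBruijn = 0x077CB531u;

// Fill the 32-entry table from the hash itself, so that no hand-written
// constants are needed; the assert checks that the hash is perfect.
static std::array<int8_t, 32> make_table() {
  std::array<int8_t, 32> t;
  t.fill(-1);
  for (int bit = 0; bit < 32; ++bit) {
    uint32_t idx = ((uint32_t{1} << bit) * kDeBruijn) >> 27;
    assert(t[idx] == -1 && "hash must be collision-free");
    t[idx] = static_cast<int8_t>(bit);
  }
  return t;
}

// Same results as map_power_2() above: one multiply, one shift, one
// load, and one compare to reject inputs that are not powers of two.
static int map_power_2_hashed(uint32_t x) {
  static const std::array<int8_t, 32> table = make_table();
  int bit = table[(x * kDeBruijn) >> 27];           // the only table load
  return (x == (uint32_t{1} << bit)) ? bit : -1;    // verify the key
}
```

The final compare is needed because every 32-bit input hashes to some
slot; only the 32 case values reproduce themselves from the stored bit
number.
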
> As a real-life example of big sparse switches, you might look at
> conversions.c from Debian's msort package, downloadable at
> http://packages.debian.org/source/sid/msort.
>
> Waiting for your comments and advice. Happy testing
> Jasper
>
>



-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University


