Lowering switch statements with hashing, version 2

Sat Feb 1 13:25:52 PST 2014

Hello all,

Here is the second public version of my patch.
A lot remains to be done before the patch can actually be 
committed, such as adapting the affected tests and writing new ones.

The patch modifies lib/CodeGen/SelectionDAG/SelectionDAGBuilder.*.
The patch for http://llvm.org/bugs/show_bug.cgi?id=18347 is already 
embedded but can be turned off with the llvm switch 
-switch-split-classic=true (see SwitchSplitClassic).
gen_hash.inc is currently a separate file to make editing easier for me, 
but it can (or should?) be integrated into SelectionDAGBuilder.cpp later.

You will find some documentation in the *.txt files.

To test the patch you may use switchgen.cpp to generate a C++ file which 
tests a randomly generated switch statement.

To test the hashing library hashlib there is hashtest.cpp.

Currently the default behavior is that all methods but the classical 
jump table method are activated; you can change this with the llvm 
switch -switch-methods=... (search SelectionDAGBuilder.cpp and 
gen_hash.inc for SwitchMethods).

As an example of big sparse switches in real-world code, you might look 
at conversions.c from Debian's msort package, downloadable at 
http://packages.debian.org/source/sid/msort.

I have logged the switch statements as they come through 
SelectionDAGBuilder::visitSwitch and simulated the decisions with an 
auxiliary program in order to generate statistics and to play with 
different settings without recompiling. Processing all 15170 captured 
switch statements took only about 5 seconds using a poorly optimized 
Delphi program on an Atom netbook. I generated statistics with 
(case_with_jt.txt) and without (case_no_jt.txt) the classical jump table 
method activated.

Waiting for your comments and advice. Happy testing
Jasper
-------------- next part --------------
# switches = 15170

# ranges:
min  = 1
max  = 316
mean = 7.275412
dev  = 20.5346966

range sizes:
min  = 1
max  = 204
mean = 1.1279419
dev  = 1.231322

# affected switches:
# too small  = 707  // nr_keys<4 or nr_if<6 [non reversible hashing]; split
# too sparse = 314  // nr_keys*30/100>nr_if; split
# hashed     = 3829  // using any hashing method
# not hashed = 0  // hashing failed

method    # switches   # keys                      # ranges
bit_test           8   min=  6 max= 43 mean=15     min=  3 max=  9 mean= 4.1
small_range    10312   min=  1 max= 40 mean= 2.3   min=  1 max=  3 mean= 1.9
jump_table         0   
reversible      3302   min=  4 max=345 mean=26.3   min=  4 max=316 mean=24.8
simple_and        38   min=  6 max=144 mean=20     min=  4 max= 82 mean=16.6
simple_shr         0   
simple_rol         7   min=  6 max=  9 mean= 6.7   min=  6 max=  8 mean= 6.6
simple_rol_xor   126   min=  6 max= 73 mean=11     min=  5 max= 73 mean=10.1
simple_rol_add    39   min=  6 max= 41 mean=11.6   min=  4 max= 41 mean= 9.4
simple_rol_sub    34   min=  6 max= 17 mean= 9.9   min=  4 max= 15 mean= 8.9
simple_mul       267   min=  6 max=149 mean=15.9   min=  4 max=149 mean=14.1
ab_1               7   min= 75 max=175 mean=125    min= 72 max=175 mean=113 
ab_2               1   min=241 max=241 mean=241    min=109 max=109 mean=109 
ab_3               6   min= 45 max=108 mean=57.8   min= 20 max=100 mean=45.2
ab_x1              2   min= 25 max= 50 mean=37.5   min= 20 max= 26 mean=23  
ab_x3              0   

Factors for reversible hashing (load >= 40%), # switches:
1 3257
2 17
4 3
8 3
16 2
512 6
32768 2
262144 1
16777216 2
134217728 9

# ranges, count:
0 0
1 3327
2 5776
3 1694
4 1293
5 586
6 320
7 205
8 187
9 129
10 101
11 141
12 53
13 221
14 31
15 27
16 69
17 19
18 21
19 11
20 14
21 15
22 12
23 7
24 13
25 10
26 13
27 11
28 5
29 7
30 4
31 5
32 189
33 4
34 3
35 2
36 1
37 5
38 5
39 2
40 5
41 4
42 90
43 89
44 3
45 5
46 1
47 2
48 3
49 3
50 1
51 0
52 0
53 1
54 0
55 2
56 2
57 2
58 0
59 27
60 82
61 0
62 1
63 1
64 2
65 1
66 1
67 1
68 1
69 0
70 0
71 1
72 1
73 2
74 0
75 1
76 1
77 1
78 1
79 1
80 0
81 0
82 2
83 1
84 2
85 1
86 1
87 1
88 0
89 0
90 0
91 0
92 0
93 0
94 0
95 0
96 1
97 1
98 0
99 0
100 0
101 1
102 0
103 0
104 0
105 0
106 0
107 0
108 0
109 0
110 0
111 0
112 0
113 0
114 0
115 1
116 0
117 0
118 0
119 1
120 0
121 0
122 0
123 0
124 0
125 0
126 0
127 0
128 1
129 0
130 0
131 1
132 62
133 126
134 0
135 1
136 0
137 0
138 1
139 0
140 0
141 0
142 0
143 0
144 0
145 0
146 0
147 0
148 1
149 1
150 0
151 0
152 0
153 0
154 0
155 73
156 0
157 1
158 0
159 0
160 0
161 2
162 0
163 0
164 1
165 1
166 0
167 0
168 0
169 0
170 0
171 0
172 0
173 0
174 0
175 1
176 0
177 0
178 1
179 0
180 0
181 0
182 0
183 0
184 0
185 0
186 1
187 0
188 0
189 0
190 1
191 0
192 0
193 0
194 0
195 0
196 0
197 0
198 0
199 0
200 0
201 0
202 0
203 0
204 0
205 0
206 1
207 0
208 0
209 0
210 0
211 0
212 0
213 0
214 0
215 0
216 1
217 0
218 0
219 0
220 0
221 0
222 0
223 0
224 0
225 0
226 0
227 0
228 1
229 0
230 0
231 0
232 0
233 0
234 0
235 0
236 0
237 0
238 0
239 0
240 0
241 0
242 0
243 0
244 0
245 0
246 0
247 0
248 0
249 0
250 0
251 0
252 0
253 0
254 0
255 0
256 0
257 0
258 0
259 0
260 0
261 0
262 0
263 0
264 0
265 0
266 0
267 0
268 0
269 0
270 0
271 0
272 0
273 0
274 0
275 0
276 0
277 0
278 0
279 0
280 0
281 0
282 0
283 0
284 0
285 0
286 0
287 0
288 0
289 0
290 0
291 0
292 0
293 0
294 0
295 0
296 0
297 0
298 0
299 0
300 0
301 0
302 0
303 0
304 0
305 0
306 0
307 0
308 0
309 0
310 0
311 0
312 0
313 0
314 0
315 0
316 1
-------------- next part --------------
# switches = 15170

# ranges:
min  = 1
max  = 316
mean = 7.275412
dev  = 20.5346966

range sizes:
min  = 1
max  = 204
mean = 1.1279419
dev  = 1.231322

# affected switches:
# too small  = 699  // nr_keys<4 or nr_if<6 [non reversible hashing]; split
# too sparse = 143  // nr_keys*30/100>nr_if; split
# hashed     = 560  // using any hashing method
# not hashed = 0  // hashing failed

method    # switches   # keys                      # ranges
bit_test           8   min=  6 max= 43 mean=15     min=  3 max=  9 mean= 4.1
small_range    10312   min=  1 max= 40 mean= 2.3   min=  1 max=  3 mean= 1.9
jump_table      3448   min=  4 max=345 mean=26.9   min=  4 max=316 mean=24.5
reversible        39   min=  4 max= 48 mean= 9.6   min=  4 max= 48 mean= 9.6
simple_and        38   min=  6 max=144 mean=20     min=  4 max= 82 mean=16.6
simple_shr         0   
simple_rol         6   min=  6 max=  9 mean= 6.5   min=  6 max=  8 mean= 6.3
simple_rol_xor   125   min=  6 max= 73 mean=11.1   min=  5 max= 73 mean=10.1
simple_rol_add    39   min=  6 max= 41 mean=11.6   min=  4 max= 41 mean= 9.4
simple_rol_sub    34   min=  6 max= 17 mean= 9.9   min=  4 max= 15 mean= 8.9
simple_mul       263   min=  6 max=149 mean=15.8   min=  4 max=149 mean=14.1
ab_1               7   min= 75 max=175 mean=125    min= 72 max=175 mean=113 
ab_2               1   min=241 max=241 mean=241    min=109 max=109 mean=109 
ab_3               6   min= 45 max=108 mean=57.8   min= 20 max=100 mean=45.2
ab_x1              2   min= 25 max= 50 mean=37.5   min= 20 max= 26 mean=23  
ab_x3              0   

Factors for reversible hashing (load >= 40%), # switches:
2 11
4 3
8 3
16 2
512 6
32768 2
262144 1
16777216 2
134217728 9

# ranges, count:
0 0
1 3327
2 5776
3 1694
4 1293
5 586
6 320
7 205
8 187
9 129
10 101
11 141
12 53
13 221
14 31
15 27
16 69
17 19
18 21
19 11
20 14
21 15
22 12
23 7
24 13
25 10
26 13
27 11
28 5
29 7
30 4
31 5
32 189
33 4
34 3
35 2
36 1
37 5
38 5
39 2
40 5
41 4
42 90
43 89
44 3
45 5
46 1
47 2
48 3
49 3
50 1
51 0
52 0
53 1
54 0
55 2
56 2
57 2
58 0
59 27
60 82
61 0
62 1
63 1
64 2
65 1
66 1
67 1
68 1
69 0
70 0
71 1
72 1
73 2
74 0
75 1
76 1
77 1
78 1
79 1
80 0
81 0
82 2
83 1
84 2
85 1
86 1
87 1
88 0
89 0
90 0
91 0
92 0
93 0
94 0
95 0
96 1
97 1
98 0
99 0
100 0
101 1
102 0
103 0
104 0
105 0
106 0
107 0
108 0
109 0
110 0
111 0
112 0
113 0
114 0
115 1
116 0
117 0
118 0
119 1
120 0
121 0
122 0
123 0
124 0
125 0
126 0
127 0
128 1
129 0
130 0
131 1
132 62
133 126
134 0
135 1
136 0
137 0
138 1
139 0
140 0
141 0
142 0
143 0
144 0
145 0
146 0
147 0
148 1
149 1
150 0
151 0
152 0
153 0
154 0
155 73
156 0
157 1
158 0
159 0
160 0
161 2
162 0
163 0
164 1
165 1
166 0
167 0
168 0
169 0
170 0
171 0
172 0
173 0
174 0
175 1
176 0
177 0
178 1
179 0
180 0
181 0
182 0
183 0
184 0
185 0
186 1
187 0
188 0
189 0
190 1
191 0
192 0
193 0
194 0
195 0
196 0
197 0
198 0
199 0
200 0
201 0
202 0
203 0
204 0
205 0
206 1
207 0
208 0
209 0
210 0
211 0
212 0
213 0
214 0
215 0
216 1
217 0
218 0
219 0
220 0
221 0
222 0
223 0
224 0
225 0
226 0
227 0
228 1
229 0
230 0
231 0
232 0
233 0
234 0
235 0
236 0
237 0
238 0
239 0
240 0
241 0
242 0
243 0
244 0
245 0
246 0
247 0
248 0
249 0
250 0
251 0
252 0
253 0
254 0
255 0
256 0
257 0
258 0
259 0
260 0
261 0
262 0
263 0
264 0
265 0
266 0
267 0
268 0
269 0
270 0
271 0
272 0
273 0
274 0
275 0
276 0
277 0
278 0
279 0
280 0
281 0
282 0
283 0
284 0
285 0
286 0
287 0
288 0
289 0
290 0
291 0
292 0
293 0
294 0
295 0
296 0
297 0
298 0
299 0
300 0
301 0
302 0
303 0
304 0
305 0
306 0
307 0
308 0
309 0
310 0
311 0
312 0
313 0
314 0
315 0
316 1
-------------- next part --------------
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.

2014-02-01

Comments on hashlib

hashlib provides (optionally minimal) perfect hashing for a static set of 32 bit integer constants (signed or unsigned; called "labels" here) and is meant to be used e.g. in a compiler. It is usable for up to roughly 100000 labels.
Perfect hashing is a technique which makes the advantages of a jump table or a value table lookup also available for arbitrarily sparse label sets which often occur in switch statements.
The goal is to generate special code for the given specific label set.
This is done by trying several methods; each of the methods can be activated or left unused.


Here are the methods in detail.
x is the given label value, a and b are auxiliary values, all other variables are determined by hashlib.
ror_32 is the rotate right function for 32 bit integers where the first argument is rotated right by the second.
rol_32 is the rotate left function for 32 bit integers where the first argument is rotated left by the second.
All other operators (+, -, *, &, ^, <<, >>) are as defined in C for unsigned 32 bit values; overflows have to be discarded.
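
As a point of reference, the two rotate functions can be sketched in portable C++ as follows. This is only an illustration and not taken from hashlib; the helper rotate_self_check is a hypothetical name. The rotate count is reduced modulo 32 so that no shift by 32 ever occurs.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32 bit value left by n bits; n is reduced modulo 32.
inline uint32_t rol_32(uint32_t x, unsigned n) {
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

// Rotate a 32 bit value right by n bits; n is reduced modulo 32.
inline uint32_t ror_32(uint32_t x, unsigned n) {
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

// Hypothetical helper: rotating left and then right by the same
// amount must give back the original value.
inline void rotate_self_check(uint32_t x) {
  for (unsigned n = 0; n < 32; ++n)
    assert(ror_32(rol_32(x, n), n) == x);
}
```

The extra "& 31" on the complementary shift avoids shifting a 32 bit value by 32, which is undefined in C and C++.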

gen_reversible:
This method catches the case when the constants can be described as x_i = d * i + c .
The charm of this method is that there is a simple function h which inverts the formula, such that h(x_i)=i, and that h is itself invertible; hence the name reversible hashing. This means that after applying the hash function h, a simple range check suffices to verify that x is a member of the label set. All other methods of hashlib need to check the value itself after applying the hash function.
h(x) = ror_32(x-salt, a_shr) * a_mask
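
A minimal sketch of how such a function could be set up for labels x_i = d * i + c. It assumes (this is not confirmed by the text above) that salt is c, that a_shr is the number of trailing zero bits of d, and that a_mask holds the multiplicative inverse of the odd part of d modulo 2^32. The names RevParams, inverse_odd, make_reversible, hash_rev and is_label are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

inline uint32_t ror_32(uint32_t x, unsigned n) {
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

// Multiplicative inverse of an odd value modulo 2^32 (Newton iteration).
inline uint32_t inverse_odd(uint32_t d) {
  assert(d & 1);
  uint32_t inv = d;  // d*d == 1 (mod 8), so 3 bits are already correct
  for (int i = 0; i < 4; ++i)
    inv *= 2 - d * inv;  // each step doubles the number of correct bits
  return inv;
}

struct RevParams {  // assumed meaning of the parameters, see lead-in
  uint32_t salt, a_shr, a_mask;
};

// Set up h for the label set c, c+d, c+2*d, ...
inline RevParams make_reversible(uint32_t c, uint32_t d) {
  assert(d != 0);
  uint32_t shr = 0;
  while ((d & 1) == 0) {  // split d into 2^shr * odd part
    d >>= 1;
    ++shr;
  }
  RevParams p = {c, shr, inverse_odd(d)};
  return p;
}

// h(x) = ror_32(x-salt, a_shr) * a_mask
inline uint32_t hash_rev(const RevParams &p, uint32_t x) {
  return ror_32(x - p.salt, p.a_shr) * p.a_mask;
}

// Because h is a bijection on 32 bit values, a range check is enough.
inline bool is_label(const RevParams &p, uint32_t x, uint32_t nr_labels) {
  return hash_rev(p, x) < nr_labels;
}
```

For the labels 10, 22, 34, 46 (c = 10, d = 12) this gives a_shr = 2 and a_mask = 0xAAAAAAAB, the labels map to 0..3, and a non-member such as 11 lands far outside that range.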

The next methods are usually only applicable for a small label set.

gen_simple_and:
This hash function simply extracts the lowest bits.
h(x) = x & a_mask

gen_simple_shr:
This hash function simply extracts the highest bits.
h(x) = x >> a_shr

gen_simple_rol:
This hash function extracts a contiguous run of bits.
h(x) = rol_32(x, salt) & a_mask

gen_simple_rol_xor:
h(x) = (rol_32(x, salt) ^ x) & a_mask

gen_simple_rol_add:
h(x) = (rol_32(x, salt) + x) & a_mask

gen_simple_rol_sub:
h(x) = (rol_32(x, salt) - x) & a_mask

gen_simple_mul:
h(x) = (x * salt) >> a_shr
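
To illustrate how parameters for one of these simple methods might be found, here is a sketch of a brute-force search for gen_simple_mul. The search strategy, the starting salt and the names MulParams, hash_mul, is_perfect and find_mul are assumptions made for this example; hashlib's own search may work quite differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct MulParams {
  uint32_t salt;
  uint32_t a_shr;
  bool found;
};

// h(x) = (x * salt) >> a_shr, overflow discarded
inline uint32_t hash_mul(const MulParams &p, uint32_t x) {
  return (uint32_t)(x * p.salt) >> p.a_shr;
}

// True if all labels are mapped to distinct table slots.
inline bool is_perfect(const MulParams &p,
                       const std::vector<uint32_t> &labels) {
  std::vector<bool> used(1u << (32 - p.a_shr), false);
  for (uint32_t x : labels) {
    uint32_t h = hash_mul(p, x);
    if (used[h])
      return false;  // collision
    used[h] = true;
  }
  return true;
}

// Try up to `tries` odd salts for a table of 2^table_bits slots.
inline MulParams find_mul(const std::vector<uint32_t> &labels,
                          unsigned table_bits, uint32_t tries) {
  assert(table_bits >= 1 && table_bits <= 16);
  MulParams p = {0, 32 - table_bits, false};
  uint32_t salt = 0x9E3779B9u;  // arbitrary odd starting value
  for (uint32_t i = 0; i < tries; ++i) {
    p.salt = salt;
    if (is_perfect(p, labels)) {
      p.found = true;
      return p;
    }
    salt += 0x3C6EF372u;  // even step, so the salt stays odd
  }
  return p;
}
```

The generated code would still have to compare x against the label stored in the selected slot, since unlike reversible hashing this function maps many non-members into the table as well.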

The next methods can be used also on large label sets. They calculate the auxiliary values a and b and need up to 2 tables.
The scramble table is used to keep the memory needed for p_tabb small, because in this case p_tabb only contains bytes; it takes at least 8000 labels to see p_scramble in action.
These methods as well as the technique to generate the tables were developed by Bob Jenkins, see http://burtleburtle.net/bob/hash/perfect.html; I added gen_ab_x*.
There is a common auxiliary function calc_tab(a,b):
if tabb_len == 0:
  return a
if tabb_len == 1:
  return a ^ p_tabb[0]
if tabb_len > 1 && scramble_len == 0:
  return a ^ p_tabb[b]
if tabb_len > 1 && scramble_len > 0:
  return a ^ p_scramble[p_tabb[b]]
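
The four cases of calc_tab can be written as a small self-contained C++ function. The Tables struct and the use of std::vector are choices made for this sketch only; the library itself passes raw pointers and lengths in its options record.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Tables {  // hypothetical container for this sketch
  std::vector<uint32_t> tabb;      // B table; may be empty
  std::vector<uint32_t> scramble;  // scramble table; empty = not used
};

inline uint32_t calc_tab(const Tables &t, uint32_t a, uint32_t b) {
  if (t.tabb.empty())
    return a;  // b is not needed at all
  if (t.tabb.size() == 1)
    return a ^ t.tabb[0];  // b is not needed, tabb[0] is a constant
  assert(b < t.tabb.size());
  if (t.scramble.empty())
    return a ^ t.tabb[b];
  assert(t.tabb[b] < t.scramble.size());
  return a ^ t.scramble[t.tabb[b]];
}
```

In the first two cases the result does not depend on b, so a code generator can skip computing b entirely.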

gen_ab_1:
a = (x << a_shl) >> a_shr
b = (x >> b_shr) & b_mask
h(x) = calc_tab(a,b)

gen_ab_2:
a = (x >> a_shr) & a_mask
b = (x << b_shl) >> b_shr
h(x) = calc_tab(a,b)

gen_ab_3:
x = x + salt
if do_xor_shr_16:
  x = x ^ (x >> 16)
if do_add_shl_8:
  x = x + (x << 8)
x = x ^ (x >> 4)
if a_shl == 0:
  a = x >> a_shr
if a_shl != 0:
  a = ((x << a_shl) + x) >> a_shr
b = (x >> b_shr) & b_mask
h(x) = calc_tab(a,b)

gen_ab_x1:
t = x * salt
a = (t >> a_shr) & a_mask
b = t >> b_shr
h(x) = calc_tab(a,b)

gen_ab_x3:
u64 t = (u64)(x) * (u64)(salt)  // unsigned 64 bit
a = (u32)(t >> 32) & a_mask     // use upper half
b = (u32)(t) >> b_shr           // use lower half
h(x) = calc_tab(a,b)
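
The (a,b) computation of the two multiplicative methods can be sketched in C++ as below, taking a from the a_* parameters and b from b_shr. The structs AB and MulOpt and the function names are hypothetical; the final table lookup via calc_tab is left out so that the block stays self-contained.

```cpp
#include <cassert>
#include <cstdint>

struct AB {
  uint32_t a, b;
};

struct MulOpt {  // hypothetical subset of the generator options
  uint32_t salt, a_shr, a_mask, b_shr;
};

// gen_ab_x1: one 32 bit multiplication, a and b are both taken
// from the 32 bit product.
inline AB calc_ab_x1(const MulOpt &o, uint32_t x) {
  assert(o.a_shr < 32 && o.b_shr < 32);
  uint32_t t = x * o.salt;  // overflow discarded
  AB r = {(t >> o.a_shr) & o.a_mask, t >> o.b_shr};
  return r;
}

// gen_ab_x3: one 32x32 -> 64 bit multiplication, a comes from the
// upper half and b from the lower half of the product.
inline AB calc_ab_x3(const MulOpt &o, uint32_t x) {
  assert(o.b_shr < 32);
  uint64_t t = (uint64_t)x * (uint64_t)o.salt;
  AB r = {(uint32_t)(t >> 32) & o.a_mask, (uint32_t)t >> o.b_shr};
  return r;
}
```

With salt = 0x9E3779B9, a_shr = 0, a_mask = 0xF and b_shr = 28, the label 3 yields the product 0x1DAA66D2B; gen_ab_x1 then gives (a,b) = (0xB, 0xD) and gen_ab_x3 gives (1, 0xD).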

Obviously there is room for improvement in the generated code, such as detecting masking with 0 or with -1, shifting by 0, or even omitting the calculation of b when tabb_len <= 1.

On most modern CPUs gen_ab_x1 and gen_ab_x3 can be faster than gen_ab_3.

For simplicity I have left out some parts of Bob Jenkins' code, such as the special treatment of very small label sets and strings. Most of the identifier names are still the original ones.

The interface might not be the best one can think of; I am working on it.

Options: ...
Other generated info: ...
Caveats: ...

...to be continued...

Have fun
Jasper

(c) 2013..2014 by Jasper L. Neumann
www.sirrida.de / programming.sirrida.de
E-Mail: info at sirrida.de

===

History

2014-01-16: First public release

2014-02-01: Update released
Changes:
Fixed: keys were added endlessly, causing a heap overflow, if a range contained -1 and 0.
Fixed: division by zero if a switch contained -1 and 0 or 0x7fffffff and 0x80000000.
Added the gen_ab_x* methods for Jenkins' hashing using a multiplication.

-------------- next part --------------
Index: lib/CodeGen/SelectionDAG/hashlib.hpp
===================================================================

--- lib/CodeGen/SelectionDAG/hashlib.hpp	(revision 0)
+++ lib/CodeGen/SelectionDAG/hashlib.hpp	(revision 0)
@@ -0,0 +1,567 @@
+#ifndef hashlib_header
+#define hashlib_header
+
+/*@/// doc */
+// Hashing of 32 bit integers
+//
+// (c) 2013..2014 by Jasper L. Neumann
+// www.sirrida.de / programming.sirrida.de
+// E-Mail: info at sirrida.de
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// First version: 2013-11
+// Last change: 2014-02
+//
+// Based on the minimal hashing code generator of
+// Robert John Jenkins Junior (Bob Jenkins)
+// http://burtleburtle.net/bob/hash/perfect.html
+
+// I corrected some errors and tweaked code and parameters to speed it up.
+// String handling and special treatment of small switch sets are left out.
+// tabb containing only zeros is now optimized away.
+// I added reversible and simple hashing as well as 2 new a/b hashing methods.
+// Now it is a reusable library.
+
+// Bugs corrected:
+// 1. tabh was too small for minimal hashing => AV in apply().
+// 2. For 17 keys, slow mode, non-minimal the code generator could choose
+//    gen_ab_2 (perfhex.c, hexn(), case 2) and the shift right amount for b
+//    could get 32. This does not always work as expected since at least for
+//    x86 using 32 bit registers such a shift does *nothing* and the resulting
+//    code could trap on accessing invalid tabb elements.
+// 3. For slen>131072 alen got an undefined value => AV / heap overflow.
+// 4. For fast minimal hashing alen was probably guessed smaller than intended
+//    (quirk, no real impact).
+
+/*
+Here are some notes from Bob Jenkins' sources:
+
+perfect.c: code to generate code for a hash for perfect hashing.
+(c) Bob Jenkins, September 1996, December 1999
+You may use this code in any way you wish, and it is free.  No warranty.
+I hereby place this in the public domain.
+Source is http://burtleburtle.net/bob/c/perfect.c
+
+This generates a minimal perfect hash function.  That means, given a
+set of n keys, this determines a hash function that maps each of
+those keys into a value in 0..n-1 with no collisions.
+
+The perfect hash function first uses a normal hash function on the key
+to determine (a,b) such that the pair (a,b) is distinct for all
+keys, then it computes a^scramble[tab[b]] to get the final perfect hash.
+tab[] is an array of 1-byte values and scramble[] is a 256-term array of
+2-byte or 4-byte values.  If there are n keys, the length of tab[] is a
+power of two between n/3 and n.
+
+I found the idea of computing distinct (a,b) values in "Practical minimal
+perfect hash functions for large databases", Fox, Heath, Chen, and Daoud,
+Communications of the ACM, January 1992.  They found the idea in Chichelli
+(CACM Jan 1980).  Beyond that, our methods differ.
+
+The key is hashed to a pair (a,b) where a in 0..*alen*-1 and b in
+0..*blen*-1.  A fast hash function determines both a and b
+simultaneously.  Any decent hash function is likely to produce
+hashes so that (a,b) is distinct for all pairs.  I try the hash
+using different values of *salt* until all pairs are distinct.
+
+The final hash is (a XOR scramble[tab[b]]).  *scramble* is a
+predetermined mapping of 0..255 into 0..smax-1.  *tab* is an
+array that we fill in in such a way as to make the hash perfect.
+
+First we fill in all values of *tab* that are used by more than one
+key.  We try all possible values for each position until one works.
+
+This leaves m unmapped keys and m values that something could hash to.
+If you treat unmapped keys as lefthand nodes and unused hash values
+as righthand nodes, and draw a line connecting each key to each hash
+value it could map to, you get a bipartite graph.  We attempt to
+find a perfect matching in this graph.  If we succeed, we have
+determined a perfect hash for the whole set of keys.
+
+*scramble* is used because (a^tab[i]) clusters keys around *a*.
+*/
+/*@\\\+00CE*/
+
+#include <stdlib.h>
+#include <strings.h>
+#include <stdint.h>
+#include <string.h>
+
+/*@/// standard stuff */
+// Tell the compiler of an unused parameter
+#ifdef __GNUC__
+  #define UNUSED(x) x __attribute__((unused))
+#else
+  #define UNUSED(x) x
+#endif
+
+#ifndef mycall
+#ifdef __GNUC__
+  // Define your favourite function options here
+  // #define mycall __attribute__((fastcall))
+  #define mycall __attribute__((regparm(3)))
+#endif
+#endif
+
+#ifndef mycall
+  #define mycall
+#endif
+
+typedef uint8_t t_8u;
+typedef uint16_t t_16u;
+typedef uint32_t t_32u;
+typedef uint64_t t_64u;
+
+typedef int8_t t_8s;
+typedef int16_t t_16s;
+typedef int32_t t_32s;
+typedef int64_t t_64s;
+
+typedef void *t_pointer;
+typedef bool t_bool;
+typedef char t_char;
+typedef int t_int;
+typedef float t_float;
+
+#define nil 0
+#define assigned(f) ((f)!=nil)
+/*@\\\+1259*/
+
+/*@/// interface */
+// public:
+/*@/// typedef enum tq_hash_method */
+typedef enum {
+  HM_UNASSIGNED,
+  HM_REVERSIBLE,
+  HM_SIMPLE_AND,
+  HM_SIMPLE_SHR,
+  HM_SIMPLE_ROL,
+  HM_SIMPLE_ROL_XOR,
+  HM_SIMPLE_ROL_ADD,
+  HM_SIMPLE_ROL_SUB,
+  HM_SIMPLE_MUL,
+  HM_AB_1,
+  HM_AB_2,
+  HM_AB_3,
+  HM_AB_X1,    // experimental
+  HM_AB_X2,    // experimental
+  HM_AB_X3,    // experimental
+  HM_MAX       // dummy
+  } tq_hash_method;
+/*@\\\0000000212+5AAE*/
+/*@/// typedef struct tr_gen_opt */
+typedef struct tr_gen_opt {
+  // parameters for the generating callback functions, see tr_callback
+  // all values should be treated read-only
+
+  // common parameters
+  t_32u a_shl;
+  t_32u a_shr;
+  t_32u a_mask;  // and factor
+  t_32u b_shl;
+  t_32u b_shr;
+  t_32u b_mask;  // and factor
+  t_32u salt;  // and extra stuff
+
+  // for tabular methods
+  t_32u *p_scramble;  // array; pointer to the scramble table
+  t_32u scramble_len;  // 0=not used
+  t_32u scramble_max;
+  t_32u *p_tabb;  // array; pointer to the B table
+  t_32u tabb_len;
+  t_32u tabb_max;
+  t_bool do_xor_shr_16;  // extra scrambling
+  t_bool do_add_shl_8;  // extra scrambling
+
+  // info
+  t_bool check_range;  // check range
+  t_bool check_value;  // check if value OK
+    // if both are set: reversible hashing; test by bit array or jump table
+    // i.e. reversible hashing is not minimal / jump table contains holes
+  t_32u hash_used_max;  // maximum used hash value
+  t_32u hash_gen_max;  // maximum generated hash value
+
+  // statistics
+  t_32u count_for_limit;  // used total work, see calc_opt.work_limit
+  t_32u max_count;  // max. work per element, see
+  t_32u low;  // lowest key; use as hint to generate value table
+  t_32u high;  // highest key
+  t_32u bits;  // needed bits for keys
+  t_bool use_signed;  // use signed values for keys
+  t_bool is_minimal;  // no default entries in jump table
+  tq_hash_method hash_method;  // chosen hash method
+  } tr_gen_opt;
+/*@\\\+A55F*/
+typedef void (*tf_generate) (const tr_gen_opt &gen_opt, t_pointer gen_stuff);
+typedef void (*tf_step_key) (t_32u key, t_pointer key_stuff, t_pointer step_stuff);
+
+/*@/// typedef struct tr_callback */
+typedef struct tr_callback {
+  // all values are read/write
+  tf_generate a_gen[HM_MAX];  // indexed by tq_hash_method
+#if 0
+  /*@/// tf_generate gen_reversible; */
+  tf_generate gen_reversible;
+
+  /*
+  // This is the only hash method here which does not detect dupes
+  // OPTIMIZE for gen_opt.a_shr=0 and gen_opt.salt=1
+
+  t_32u calc_rev(t_32u x) {
+  // begin
+    return ror_32(x-gen_opt.salt, gen_opt.a_shr) * gen_opt.a_mask;
+    }
+  */
+  /*@\\\+57D3*/
+
+  /*@/// tf_generate gen_simple_and; */
+  tf_generate gen_simple_and;
+
+  /*
+  t_32u calc_and(t_32u x) {
+  // begin
+    return x & gen_opt.a_mask;
+    }
+  */
+  /*@\\\0000000401+1224*/
+  /*@/// tf_generate gen_simple_shr; */
+  tf_generate gen_simple_shr;
+
+  /*
+  // gen_opt.a_shr: 1..31
+
+  t_32u calc_shr(t_32u x) {
+  // begin
+    return x >> gen_opt.a_shr;
+    }
+  */
+  /*@\\\0000000501+CAFE*/
+  /*@/// tf_generate gen_simple_rol; */
+  tf_generate gen_simple_rol;
+
+  /*
+  // gen_opt.salt: 1..31
+
+  t_32u calc_rol(t_32u x) {
+  // begin
+    return rol_32(x, gen_opt.salt) & gen_opt.a_mask;
+    }
+  */
+  /*@\\\000000080E+44C6*/
+  /*@/// tf_generate gen_simple_rol_xor; */
+  tf_generate gen_simple_rol_xor;
+
+  /*
+  // gen_opt.salt: 1..31
+
+  t_32u calc_rol_xor(t_32u x) {
+  // begin
+    return (rol_32(x, gen_opt.salt) ^ x) & gen_opt.a_mask;
+    }
+  */
+  /*@\\\0000000811+0C3E*/
+  /*@/// tf_generate gen_simple_rol_add; */
+  tf_generate gen_simple_rol_add;
+
+  /*
+  // gen_opt.salt: 1..31
+
+  t_32u calc_rol_add(t_32u x) {
+  // begin
+    return (rol_32(x, gen_opt.salt) + x) & gen_opt.a_mask;
+    }
+  */
+  /*@\\\0000000822+AFA6*/
+  /*@/// tf_generate gen_simple_rol_sub; */
+  tf_generate gen_simple_rol_sub;
+
+  /*
+  // gen_opt.salt: 1..31
+
+  t_32u calc_rol_sub(t_32u x) {
+  // begin
+    return (rol_32(x, gen_opt.salt) - x) & gen_opt.a_mask;
+    }
+  */
+  /*@\\\0000000822+78F7*/
+  /*@/// tf_generate gen_simple_mul; */
+  tf_generate gen_simple_mul;
+
+  /*
+  t_32u calc_mul(t_32u x) {
+  // begin
+    return (t_32u)(x * gen_opt.salt) >> gen_opt.a_shr;
+    }
+  */
+  /*@\\\0000000401+D40F*/
+
+  /*@/// tf_generate gen_ab_1; */
+  tf_generate gen_ab_1;
+
+  /*
+  // OPTIMIZE for gen_opt.a_shl=gen_opt.a_shr
+  // gen_opt.a_shr: 1..31
+
+  t_32u calc_ab_1(t_32u x) {
+  // var
+    t_32u a,b;
+  // begin
+    a = (x << gen_opt.a_shl) >> gen_opt.a_shr;
+    b = (x >> gen_opt.b_shr) & gen_opt.b_mask;
+    return calc_tab(a,b);
+    }
+  */
+  /*@\\\+797B*/
+  /*@/// tf_generate gen_ab_2; */
+  tf_generate gen_ab_2;
+
+  /*
+  // OPTIMIZE for gen_opt.b_shl=gen_opt.b_shr
+  // gen_opt.a_shr: 1..31
+
+  t_32u calc_ab_2(t_32u x) {
+  // var
+    t_32u a,b;
+  // begin
+    a = (x >> gen_opt.a_shr) & gen_opt.a_mask;
+    b = (x << gen_opt.b_shl) >> gen_opt.b_shr;
+    return calc_tab(a,b);
+    }
+  */
+  /*@\\\+8BA4*/
+  /*@/// tf_generate gen_ab_3; */
+  tf_generate gen_ab_3;
+
+  /*
+  t_32u calc_ab_3(t_32u x) {
+  // var
+    t_32u a,b;
+  // begin
+    x += gen_opt.salt;
+
+    if (gen_opt.do_xor_shr_16)
+      x ^= (x >> 16);
+
+    if (gen_opt.do_add_shl_8)
+      x += (x << 8);
+
+    x ^= (x >> 4);
+
+    if (gen_opt.a_shl == 0)
+      a = x >> gen_opt.a_shr;
+    else
+      a = ((x << gen_opt.a_shl) + x) >> gen_opt.a_shr;
+
+    b = (x >> gen_opt.b_shr) & gen_opt.b_mask;
+
+    return calc_tab(a,b);
+    }
+  */
+  /*@\\\+9DCD*/
+#endif
+  } tr_callback;
+
+/*
+t_32u calc_tab(t_32u a, t_32u b) {
+// begin
+  if (gen_opt.tabb_len == 0)
+    return a;  // OPTIMIZE: do not calculate b
+  else if (gen_opt.tabb_len == 1)
+    return a ^ gen_opt.p_tabb[0];
+  // OPTIMIZE: do not calculate b, tabb[0]=const!
+  else if (gen_opt.scramble_len == 0)
+    return a ^ gen_opt.p_tabb[b];
+  else
+    return a ^ gen_opt.p_scramble[gen_opt.p_tabb[b]];
+  }
+*/
+/*@\\\0000002201+5678*/
+/*@/// typedef struct tr_calc_opt */
+typedef struct tr_calc_opt {
+  // all values are read/write
+  t_32u nr_mul_checks;  // number of tries for gen_simple_mul
+  t_32u listlen_limit;  // max. number of elements with same b value (speedup)
+  t_32u retry_initkey;  // number of times to try to find distinct (a,b)
+  t_32u retry_perfect;  // number of times to try to make a perfect hash
+  t_32u use_scramble;  // use scramble if blen >= use_scramble
+  t_32u count_limit;  // max. work per element (speedup)
+  t_32u work_limit;  // max. work total (speedup), 0=infinity
+  t_float min_load_factor;  // <= 1 (better <= 0.5)
+  t_float near_minimal_factor;  // >= 1, for ab hashing to limit hash value
+  t_float keyspace_factor;  // <= 1, twice key space if
+    // self.priv.nkeys > self.slen*self.calc_opt.keyspace_factor, fast: 0.8
+  t_bool minimal;  // for minimal hashing, i.e. no holes
+  t_bool fast;  // speedup this library at the potential cost of larger tables
+  } tr_calc_opt;
+/*@\\\0000000D11+15B0*/
+
+// private:
+/*@/// typedef struct tr_key */
+typedef struct tr_key *tpr_key;
+typedef struct tr_key {
+  // representation of a key
+  t_32u key_k;  // the initial hash value for this key, aka hash_k
+  tpr_key next_k;  // next key
+
+  t_pointer user_k;  // for key only, optional
+
+  // beyond this point is mapping-dependent
+  t_32u a_k;  // a, of the key maps to (a,b)
+  t_32u b_k;  // b, of the key maps to (a,b)
+  tpr_key nextb_k;  // next key with this b
+  } tr_key;
+/*@\\\+6DA4*/
+/*@/// typedef struct tr_bstuff */
+typedef struct tr_bstuff *tpr_bstuff;
+typedef struct tr_bstuff {
+  // things indexed by b of original (a,b) pair
+  t_32u val_b;  // hash=a^tabb[b].val_b
+  tpr_key list_b;  // tabb[i].list_b is list of keys with b==i
+  t_32u listlen_b;  // length of list_b
+  t_32u water_b;  // high watermark of who has visited this map node
+  } tr_bstuff;
+/*@\\\000000080F+B4A5*/
+/*@/// typedef struct tr_hstuff */
+typedef struct tr_hstuff {
+  // things indexed by final hash value
+  tpr_key key_h;  // tabh[i].key_h is the key with a hash of i
+  } tr_hstuff;
+/*@\\\000000040F+445B*/
+/*@/// typedef struct tr_qstuff */
+typedef struct tr_qstuff *tpr_qstuff;
+typedef struct tr_qstuff {
+  // things indexed by queue position
+  tpr_bstuff b_q;  // b that currently occupies this hash
+  t_32u parent_q;  // queue position of parent that could use this hash
+  t_32u newval_q;  // what to change parent tab[b] to to use this hash
+  t_32u oldval_q;  // original value of tab[b]
+  } tr_qstuff;
+/*@\\\+1BEC*/
+typedef mycall void (*tf_calc_ab) (const tr_gen_opt &opt, tr_key &key);
+typedef mycall t_32u (*tf_calc_hash) (const tr_gen_opt &opt, const tr_key &key);
+/*@/// typedef struct tr_private */
+typedef struct tr_private {
+  t_32u nkeys;  // number of keys
+  tpr_key keys;  // head of list of keys
+
+  // local tables
+  tpr_bstuff p_tabb;  // array; table indexed by b
+  t_32u slen;  // p_scramble[] values in 0..slen-1, a power of 2, aka smax
+  t_32u alen;  // a in 0..alen-1, a power of 2
+  t_32u blen;  // b in 0..blen-1, a power of 2
+  t_32u *p_scramble;  // array; used in final hash function
+
+  t_32u lowbit;  // for inithex(), lowest interesting bit
+  t_32u highbit;  // for inithex(), highest interesting bit
+  t_32u diffbits;  // bits which differ for some key
+
+  // code generator, hash generators
+  tr_hstuff *p_inv_hash;  // array; inverse hash
+  t_32u allocated_scramble;
+  t_32u highhash;  // highest allowed hash value
+  t_bool trans;  // do transitive closure
+  } tr_private;
+/*@\\\0000001201+0A6E*/
+/*@/// typedef struct tr_state_engine */
+typedef struct tr_state_engine {
+  // state machine used in inithex()
+  t_32u i,j,k;
+  } tr_state_engine;
+/*@\\\+2819*/
+/*@/// typedef struct tr_reversible */
+typedef struct tr_reversible {
+  // internal transfer record
+  t_32u gcd,ror,inv,max;
+  } tr_reversible;
+/*@\\\+3696*/
+/*@/// typedef enum tq_reason */
+typedef enum {
+  REASON_SUCCESS,  // found distinct (a,b) for all keys, put keys in tabb[]
+  REASON_FAILURE,  // didn't find distinct (a,b) for all keys
+  REASON_LIMIT,  // limit reached, early abort
+  REASON_DUPE  // a real dupe found, cannot proceed
+  } tq_reason;
+/*@\\\+DCDA*/
+
+// public:
+/*@/// class to_perfect_hash */
+class to_perfect_hash {
+
+// Fields are ordered such that alignment costs minimal space
+// and often used fields need short offsets
+private:
+  tr_private priv;
+  tr_state_engine state_engine;
+
+public:
+  tr_calc_opt calc_opt;
+  tr_gen_opt gen_opt;
+  tr_callback cb;
+  t_int ref_count;  // not used by library; can be used by application
+
+public:
+  to_perfect_hash () { init(); };
+  ~to_perfect_hash () { stop(); };
+private:
+  // don't clone me
+  to_perfect_hash (const to_perfect_hash &) {};
+  to_perfect_hash &operator =(const to_perfect_hash&) {return *this;};
+public:
+  void defaults();
+  t_bool self_test() const;  // OK?
+
+  void clean();
+  void add_key(t_32u value, t_pointer user_k);
+  t_bool generate_hash();
+  void do_generate(t_pointer gen_stuff);
+  t_32u calc_hash(t_32u key_k) const;
+  void step_keys(tf_step_key f_step_key, t_pointer step_stuff) const;
+  t_bool inv_hash(t_32u hash, t_32u &key, t_pointer &key_stuff) const;
+
+private:
+  void init();
+  void stop();
+  mycall void calc_max();
+
+  // main hash methods
+  mycall t_bool test_reversible();
+    mycall void calc_rev_cfg(tr_reversible &rev, t_32u min);
+  mycall t_bool test_simple_perfect();
+    mycall t_bool check_direct(tq_hash_method hash_method, t_32u x);
+  mycall t_bool findhash();
+    mycall void initalen();
+    mycall void scrambleinit();
+    mycall void setlow();
+    mycall void inithex(t_32u salt);
+    mycall tq_reason inittab();
+    /*@/// function apply(tabh,tabq,tail,rollback):t_bool; */
+    mycall t_bool apply(
+      tr_hstuff *tabh,  // tar_hstuff
+      tr_qstuff *tabq,  // tar_qstuff
+      t_32u tail,
+      t_bool rollback
+      );
+    /*@\\\000000020E+5EC8*/
+    /*@/// function augment(tabh,tabq,item,highwater):tq_reason; */
+    mycall tq_reason augment(
+      tr_hstuff *tabh,  // tar_hstuff
+      tr_qstuff *tabq,  // tar_qstuff
+      tpr_bstuff item,
+      t_32u highwater
+      );
+    /*@\\\000000040D+EC2E*/
+    /*@/// function perfect(tabh,tabq):tq_reason; */
+    mycall tq_reason perfect(
+      tr_hstuff *tabh,  // tar_hstuff
+      tr_qstuff *tabq  // tar_qstuff
+      );
+    /*@\\\+FF44*/
+    mycall void finalize_perfect();
+  };
+/*@\\\003E00290500293D00293200293200293D00293D+0E34*/
+/*@\\\0000001701+3181*/
+
+#endif
+/*@\\\0001000011000D01*/

Property changes on: lib/CodeGen/SelectionDAG/hashlib.hpp
___________________________________________________________________
Added: svn:keywords
   + Date Revision Author Id URL
Added: svn:author
   + jneumann

Index: lib/CodeGen/SelectionDAG/hashlib.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/hashlib.cpp	(revision 0)
+++ lib/CodeGen/SelectionDAG/hashlib.cpp	(revision 0)
@@ -0,0 +1,2412 @@
+/*@/// doc */
+// Hashing of 32 bit integers
+//
+// (c) 2013..2014 by Jasper L. Neumann
+// www.sirrida.de / programming.sirrida.de
+// E-Mail: info at sirrida.de
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+// First version: 2013-11
+// Last change: 2014-02
+//
+// Based on the minimal hashing code generator of
+// Robert John Jenkins Junior (Bob Jenkins)
+// http://burtleburtle.net/bob/hash/perfect.html
+
+// I corrected some errors and tweaked code and parameters to speed it up.
+// String handling and special treatment of small switch sets are left out.
+// tabb containing only zeros is now optimized away.
+// I added reversible and simple hashing as well as 2 new a/b hashing methods.
+// Now it is a reusable library.
+
+// Bugs corrected:
+// 1. tabh was too small for minimal hashing => access violation (AV) in apply().
+// 2. For 17 keys, slow mode, non-minimal, the code generator could choose
+//    gen_ab_2 (perfhex.c, hexn(), case 2) with a right-shift amount of 32
+//    for b. Shifting a 32 bit value by 32 is undefined in C/C++; at least on
+//    x86 with 32 bit registers such a shift does *nothing*, so the resulting
+//    code could trap on accessing invalid tabb elements.
+// 3. For slen>131072 alen got an undefined value => AV / heap overflow.
+// 4. For fast minimal hashing alen was probably guessed smaller than intended
+//    (quirk, no real impact).
+
+/*
+Here are some notes from Bob Jenkins' sources:
+
+perfect.c: code to generate code for a hash for perfect hashing.
+(c) Bob Jenkins, September 1996, December 1999
+You may use this code in any way you wish, and it is free.  No warranty.
+I hereby place this in the public domain.
+Source is http://burtleburtle.net/bob/c/perfect.c
+
+This generates a minimal perfect hash function.  That means, given a
+set of n keys, this determines a hash function that maps each of
+those keys into a value in 0..n-1 with no collisions.
+
+The perfect hash function first uses a normal hash function on the key
+to determine (a,b) such that the pair (a,b) is distinct for all
+keys, then it computes a^scramble[tab[b]] to get the final perfect hash.
+tab[] is an array of 1-byte values and scramble[] is a 256-term array of
+2-byte or 4-byte values.  If there are n keys, the length of tab[] is a
+power of two between n/3 and n.
+
+I found the idea of computing distinct (a,b) values in "Practical minimal
+perfect hash functions for large databases", Fox, Heath, Chen, and Daoud,
+Communications of the ACM, January 1992.  They found the idea in Cichelli
+(CACM Jan 1980).  Beyond that, our methods differ.
+
+The key is hashed to a pair (a,b) where a in 0..*alen*-1 and b in
+0..*blen*-1.  A fast hash function determines both a and b
+simultaneously.  Any decent hash function is likely to produce
+hashes so that (a,b) is distinct for all pairs.  I try the hash
+using different values of *salt* until all pairs are distinct.
+
+The final hash is (a XOR scramble[tab[b]]).  *scramble* is a
+predetermined mapping of 0..255 into 0..smax-1.  *tab* is an
+array that we fill in in such a way as to make the hash perfect.
+
+First we fill in all values of *tab* that are used by more than one
+key.  We try all possible values for each position until one works.
+
+This leaves m unmapped keys and m values that something could hash to.
+If you treat unmapped keys as lefthand nodes and unused hash values
+as righthand nodes, and draw a line connecting each key to each hash
+value it could map to, you get a bipartite graph.  We attempt to
+find a perfect matching in this graph.  If we succeed, we have
+determined a perfect hash for the whole set of keys.
+
+*scramble* is used because (a^tab[i]) clusters keys around *a*.
+*/
+/*@\\\0000001301+00CE*/
+
+// TODO: Cost function with options
+// TODO: Treat also 64 bit integers and strings?
+
+#include "hashlib.hpp"
+
+/*@/// tools */
+/*@/// static mycall void break_point() */
+static mycall void break_point(
+  ) {
+  // cout << "break_point\n";
+  // exit(1);  /////
+  }
+/*@\\\+0903*/
+
+/*@/// static mycall t_32u min_32(t_32u a, t_32u b) */
+static mycall t_32u min_32(
+  t_32u a,
+  t_32u b
+  ) {
+// begin
+  if (a < b)
+    return a;
+  else
+    return b;
+  }
+/*@\\\0000000109+9FB9*/
+/*@/// static mycall t_32u ror_32(t_32u x, t_32u shift) */
+static mycall t_32u ror_32(
+  t_32u x,
+  t_32u shift
+  ) {
+// begin
+  return (x >> (shift & 31)) | (x << ((32-shift) & 31));  // safe for shift=0
+  }
+/*@\\\0000000201+10C3*/
+/*@/// static mycall t_32u rol_32(t_32u x, t_32u shift) */
+static mycall t_32u rol_32(
+  t_32u x,
+  t_32u shift
+  ) {
+// begin
+  return (x << (shift & 31)) | (x >> ((32-shift) & 31));  // safe for shift=0
+  }
+/*@\\\+2B4D*/
+/*@/// static mycall t_32u gcd(t_32u a, t_32u b) */
+static mycall t_32u gcd(
+  t_32u a,
+  t_32u b
+  ) {
+// Greatest common divisor
+// var
+  t_32u x;
+// begin
+  do {
+    x = a % b;
+    a = b;
+    b = x;
+    } while (x != 0);
+  return a;
+  }
+/*@\\\+575F*/
+/*@/// static mycall t_32u mul_inv_32(t_32u x) */
+static mycall t_32u mul_inv_32(
+  t_32u x
+  ) {
+// var
+  t_32u xn,t;
+// begin
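+  if ((x & 1) == 0)
+    return 0;  // an even x has no inverse modulo 2**32
+  else {
+    xn = x;  // x*x == 1 (mod 8) for odd x: 3 correct bits to start with
+    while (true) {
+      t = x*xn;
+      if (t == 1)
+        break;
+      xn = xn*(2-t);  // Newton step: doubles the number of correct low bits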
+      }
+    return xn;
+    }
+  }
+/*@\\\+F413*/
+/*@/// static mycall t_32u floor_log2(t_32u x) */
+static mycall t_32u floor_log2(
+  t_32u x
+  ) {
+// var
+  t_32u res;
+// begin
+  res = (t_32u)(-1);
+  while (x != 0) {
+    ++res;
+    x >>= 1;
+    }
+  return res;
+  }
+/*@\\\+256A*/
+/*@/// static mycall t_32u ceil_log2(t_32u x) */
+static mycall t_32u ceil_log2(
+  t_32u x
+  ) {
+// begin
+  switch (x) {
+    case 0:
+      break_point();
+      return (t_32u)(-1);
+    case 1:
+      return 0;
+    default:
+      return floor_log2(x-1)+1;
+    }
+  }
+
+
+//static mycall t_32u ceil_log2(
+//  t_32u val
+//  ) {
+//// return the ceiling of the log (base 2) of val
+//// var
+//  t_32u i;
+//// begin
+//  i = 0;
+//  while ((t_32u)(1) << i < val)
+//    ++i;
+//  return i;
+//  }
+/*@\\\+3595*/
+/*@\\\+8B49*/
+
+/*@/// define */
+#define UB4BITS 32
+
+// test_simple_perfect
+#define MAX_SET 256
+  // Max. size of hash table for simple hashing
+#define SEARCH_START 0x04d7651f
+  // Hacker's Delight, de Bruijn, fig.5-26, maps 2**x
+#define SEARCH_OFFSET 0x61c88647
+  // (1-1/((1+sqrt 5)/2)) * 0x1_0000_0000
+#define ZERO_MEM(var,size) memset(&var,0,(size))
+#define ZERO_OUT(var) memset(&var,0,sizeof(var))
+
+#ifndef TRACE
+  #define TRACE(s)
+  // Some general logging; string s; num(t_32u)=>string must be defined
+#endif
+#ifndef TRACE_ODO
+  #define TRACE_ODO(s)
+  // Some counting logging; string s; num(t_32u)=>string must be defined
+#endif
+#ifndef RAISE
+  #define RAISE(s)
+  // Might be used to let self_test() raise exceptions; string s
+#endif
+/*@\\\+A7BB*/
+
+/*@/// static mycall t_32u permute(...) */
+static mycall t_32u permute(
+  t_32u x,                                  // input, a value in some range
+  t_32u nbits                             // input, number of bits in range
+  ) {
+// compute p(x), where p is a permutation of 0..(1<<nbits)-1
+// permute(0)=0.  This is intended and useful.
+// var
+  t_int i;
+  t_32u mask;
+  t_int const2;
+  t_int const3;
+  t_int const4;
+  t_int const5;
+// begin
+  mask   = ((t_32u)(1) << nbits)-1;  // all ones
+  const2 = 1 + nbits/2;
+  const3 = 1 + nbits/3;
+  const4 = 1 + nbits/4;
+  const5 = 1 + nbits/5;
+  for (i = 0; i <= 20-1; ++i) {
+    x = (x + (x << const2)) & mask;
+    x = (x ^ (x >> const3));
+    x = (x + (x << const4)) & mask;
+    x = (x ^ (x >> const5));
+    }
+  return x;
+  }
+/*@\\\+E28C*/
+
+/*@/// static mycall t_32u calc_hash_tab(const tr_gen_opt &opt, const tr_key &key) */
+static mycall t_32u calc_hash_tab(
+  const tr_gen_opt &opt,
+  const tr_key &key
+  ) {
+// begin
+  if (opt.tabb_len == 0)
+    return key.a_k;  // OPTIMIZE: do not calculate b
+  else if (opt.tabb_len == 1)
+    return key.a_k ^ opt.p_tabb[0];
+  // OPTIMIZE: do not calculate b, tabb[0]=const!
+  else if (opt.scramble_len == 0)
+    return key.a_k ^ opt.p_tabb[key.b_k];
+  else
+    return key.a_k ^ opt.p_scramble[opt.p_tabb[key.b_k]];
+  }
+/*@\\\0000000C01+6375*/
+/*@/// static mycall void calc_ab_*(const tr_gen_opt &opt, tr_key &key) */
+/*@/// static mycall void calc_ab_reversible(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_reversible(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  key.a_k = ror_32(x-opt.salt, opt.a_shr) * opt.a_mask;
+  }
+/*@\\\+8C46*/
+
+/*@/// static mycall void calc_ab_and(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_and(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  key.a_k = x & opt.a_mask;
+  }
+/*@\\\000000050B+900E*/
+/*@/// static mycall void calc_ab_shr(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_shr(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  key.a_k = x >> opt.a_shr;
+  }
+/*@\\\+5098*/
+/*@/// static mycall void calc_ab_rol(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_rol(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  key.a_k = rol_32(x, opt.salt) & opt.a_mask;
+  }
+/*@\\\+4E20*/
+/*@/// static mycall void calc_ab_rol_xor(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_rol_xor(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  key.a_k = (rol_32(x, opt.salt) ^ x) & opt.a_mask;
+  }
+/*@\\\0000000621+7218*/
+/*@/// static mycall void calc_ab_rol_add(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_rol_add(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  key.a_k = (rol_32(x, opt.salt) + x) & opt.a_mask;
+  }
+/*@\\\0000000621+3440*/
+/*@/// static mycall void calc_ab_rol_sub(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_rol_sub(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  key.a_k = (rol_32(x, opt.salt) - x) & opt.a_mask;
+  }
+/*@\\\0000000620+4349*/
+/*@/// static mycall void calc_ab_mul(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_mul(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  key.a_k = (t_32u)(x*opt.salt) >> opt.a_shr;
+  }
+/*@\\\+9161*/
+
+/*@/// static mycall void calc_ab_1(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_1(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  // if (opt.a_shl == opt.a_shr)
+  //   break_point();  // can happen; optimize as key.a_k = x & opt.a_mask;
+                       // in this case also opt.b_shr=0
+  if (opt.a_shr == 0)
+    break_point();  // TCH
+  if (opt.a_shr == 32) {
+    break_point();
+    key.a_k = 0;  // TCH
+    }
+  else
+    key.a_k = (x << opt.a_shl) >> opt.a_shr;
+
+  // if (opt.b_shr == 0)
+  //   break_point();  // can happen
+  if (opt.b_shr == 32) {
+    key.b_k = 0;  // CAN HAPPEN (don't care, opt.b_mask should be 0)
+    if (opt.b_mask != 0)
+      break_point();  // TCH
+    }
+  else
+    key.b_k = (x >> opt.b_shr) & opt.b_mask;
+  }
+/*@\\\0000000A13+5B0C*/
+/*@/// static mycall void calc_ab_2(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_2(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  // if (opt.a_shr == 0)
+  //   break_point();  // can happen
+  key.a_k = (x >> opt.a_shr) & opt.a_mask;
+
+  if (opt.b_shl == opt.b_shr)
+    break_point();  // TCH
+  if (opt.b_shr == 0)
+    break_point();  // TCH
+  if (opt.b_shr == 32)  // BUG 2 in Jenkins' code (not caught)
+    key.b_k = 0;  // CAN HAPPEN, optimized away for app by finalize_perfect
+  else
+    key.b_k = (x << opt.b_shl) >> opt.b_shr;
+  }
+/*@\\\+422D*/
+/*@/// static mycall void calc_ab_3(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_3(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// var
+  t_32u x;
+// begin
+  x = key.key_k;
+  x += opt.salt;
+  if (opt.do_xor_shr_16)
+    x ^= x >> 16;
+  if (opt.do_add_shl_8)
+    x += x << 8;
+  x ^= x >> 4;
+
+  if (opt.a_shr == 0)
+    break_point();
+  if (opt.a_shr == 32) {
+    break_point();
+    key.a_k = 0;  // TCH
+    }
+  else if (opt.a_shl == 0)
+    key.a_k = x >> opt.a_shr;  // extra case, not identical to below
+  else
+    key.a_k = ((x << opt.a_shl) + x) >> opt.a_shr;
+
+  // if (opt.b_shr == 0)
+  //   break_point();  // can happen
+  if (opt.b_shr == 32) {
+    key.b_k = 0;  // CAN HAPPEN (don't care, opt.b_mask should be 0)
+    if (opt.b_mask != 0)
+      break_point();  // TCH
+    }
+  else
+    key.b_k = (x >> opt.b_shr) & opt.b_mask;
+  }
+/*@\\\+294E*/
+
+/*@/// static mycall void calc_ab_x1(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_x1(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// 32x32=>32 mul
+// var
+  t_32u tmp;
+  t_32u x;
+// begin
+  x = key.key_k;
+
+  tmp = x * opt.salt;
+
+  key.a_k = (tmp >> opt.a_shr) & opt.a_mask;
+
+  if (opt.b_shr == 32)
+    key.b_k = 0;  // CAN HAPPEN
+  else
+    key.b_k = tmp >> opt.b_shr;
+  }
+/*@\\\+2D44*/
+/*@/// static mycall void calc_ab_x3(const tr_gen_opt &opt, tr_key &key) */
+static mycall void calc_ab_x3(
+  const tr_gen_opt &opt,
+  tr_key &key
+  ) {
+// 32x32=>64 mul
+// var
+  t_64u tmp;
+  t_32u x;
+// begin
+  x = key.key_k;
+
+  tmp = (t_64u)(x) * (t_64u)(opt.salt);
+
+  key.a_k = (t_32u)(tmp >> 32) & opt.a_mask;
+
+  if (opt.b_shr == 32)
+    key.b_k = 0;  // CAN HAPPEN
+  else
+    key.b_k = (t_32u)(tmp) >> opt.b_shr;
+  }
+/*@\\\+FB86*/
+// other ideas:
+// tmp:=x*salt; b:=tmp>>b_shr; a:=((tmp+x)>>a_shr)&a_mask
+// tmp:=x*salt; b:=tmp>>b_shr; a:=((tmp>>a_shr)+x)&a_mask
+/*@\\\0000000105+0925*/
+
+/*@/// static const tf_calc_ab a_f_calc_ab[HM_MAX]={...}; */
+static const tf_calc_ab a_f_calc_ab[HM_MAX]={
+  nil,
+  calc_ab_reversible,
+  calc_ab_and,
+  calc_ab_shr,
+  calc_ab_rol,
+  calc_ab_rol_xor,
+  calc_ab_rol_add,
+  calc_ab_rol_sub,
+  calc_ab_mul,
+  calc_ab_1,
+  calc_ab_2,
+  calc_ab_3,
+  calc_ab_x1,
+  nil,  // calc_ab_x2,
+  calc_ab_x3
+  };
+/*@\\\+B33B*/
+
+/*@/// class to_perfect_hash */
+/*@/// void to_perfect_hash::init() */
+void to_perfect_hash::init(
+  ) {
+// begin
+  /*@/// zero all */
+  ZERO_OUT(this->priv);
+  ZERO_OUT(this->state_engine);
+
+  ZERO_OUT(this->calc_opt);
+  ZERO_OUT(this->gen_opt);
+
+  ZERO_OUT(this->cb);
+  /*@\\\0000000109+5ED9*/
+
+  this->defaults();
+  this->clean();
+  }
+/*@\\\0000000301+DFF9*/
+/*@/// void to_perfect_hash::defaults() */
+void to_perfect_hash::defaults(
+  ) {
+// begin
+  // Limits, usable up to ~700_000 elements
+  this->calc_opt.minimal = false;
+  this->calc_opt.fast = false;
+  this->calc_opt.nr_mul_checks = 100000;
+  this->calc_opt.listlen_limit = 100;
+  this->calc_opt.retry_initkey = 2048;
+  this->calc_opt.retry_perfect = 10;  // 200;
+  this->calc_opt.count_limit = 100000;  // max. work per element, should be const
+  this->calc_opt.work_limit = 0;  // max sum work, should scale with sqrt(nkeys)
+  // this->calc_opt.work_limit = 10000000;  // max sum work, should scale with sqrt(nkeys)
+  this->calc_opt.use_scramble = 4096;
+  this->calc_opt.near_minimal_factor = 4.0;  // way too large to be effective
+  this->calc_opt.min_load_factor = 0.4;
+  this->calc_opt.keyspace_factor = 0.85;
+  }
+/*@\\\+F5B5*/
+/*@/// void to_perfect_hash::stop() */
+void to_perfect_hash::stop(
+  ) {
+// begin
+  this->clean();
+  }
+/*@\\\+B2AB*/
+/*@/// void to_perfect_hash::add_key(t_32u value, t_pointer user_k) */
+void to_perfect_hash::add_key(
+  t_32u value,
+  t_pointer user_k
+  ) {
+// var
+  tpr_key mykey;
+// begin
+  mykey = (tpr_key)calloc(1, sizeof(*mykey));
+
+  mykey->key_k = value;
+  mykey->user_k = user_k;
+
+  mykey->next_k = this->priv.keys;
+  this->priv.keys = mykey;
+  ++this->priv.nkeys;
+  }
+/*@\\\000000051E+1BB1*/
+/*@/// t_bool to_perfect_hash::generate_hash() */
+t_bool to_perfect_hash::generate_hash(
+  ) {
+// begin
+  if (this->calc_opt.min_load_factor > 1.0)
+    this->calc_opt.min_load_factor = 1.0;
+  if (this->calc_opt.near_minimal_factor < 1.0)
+    this->calc_opt.near_minimal_factor = 1.0;
+  if (this->priv.p_inv_hash != nil) {
+    free(this->priv.p_inv_hash);
+    this->priv.p_inv_hash = nil;
+    }
+
+  if (this->priv.nkeys <= 1) {
+    // 0/1 elements
+    return false;
+    }
+  else if (this->test_reversible())
+    return true;
+  else if (this->test_simple_perfect())
+    return true;
+  else if (this->findhash())
+    return true;
+  else
+    return false;
+  }
+/*@\\\0000000119+6E80*/
+/*@/// void to_perfect_hash::do_generate(t_pointer gen_stuff) */
+void to_perfect_hash::do_generate(
+  t_pointer gen_stuff
+  ) {
+// begin
+  if (assigned(this->cb.a_gen[this->gen_opt.hash_method])) {
+    this->cb.a_gen[this->gen_opt.hash_method](this->gen_opt, gen_stuff);
+    }
+  }
+/*@\\\0000000325+8D6F*/
+/*@/// t_32u to_perfect_hash::calc_hash(t_32u key_k) const */
+t_32u to_perfect_hash::calc_hash(
+  t_32u key_k
+  ) const {
+// var
+  tr_key my_key;
+// begin
+  my_key.key_k = key_k;
+  a_f_calc_ab[this->gen_opt.hash_method](this->gen_opt, my_key);
+  return calc_hash_tab(this->gen_opt, my_key);
+  }
+/*@\\\0000000601+5DDD*/
+/*@/// void to_perfect_hash::step_keys(tf_step_key f_step_key, t_pointer step_stuff) const */
+void to_perfect_hash::step_keys(
+  tf_step_key f_step_key,
+  t_pointer step_stuff
+  ) const {
+// var
+  tpr_key mykey;
+// begin
+  if (assigned(f_step_key)) {
+    mykey = this->priv.keys;
+    while (mykey != nil) {
+      f_step_key(mykey->key_k, mykey->user_k, step_stuff);
+      mykey = mykey->next_k;
+      }
+    }
+  }
+/*@\\\+CD1C*/
+/*@/// t_bool to_perfect_hash::self_test() const */
+t_bool to_perfect_hash::self_test(
+  ) const {
+// to be used before add_key or after generate_hash
+// var
+  tr_key my_key;
+  tpr_key p_key;
+  t_bool *hash_target;  // array
+  t_32u n;
+  t_32u hash;
+  t_32u max_hash;
+  t_32u value;
+  t_pointer key_stuff;
+  t_bool res;
+  tf_calc_ab f_calc_ab;
+// begin
+  f_calc_ab = a_f_calc_ab[this->gen_opt.hash_method];
+  if (this->gen_opt.hash_used_max > this->gen_opt.hash_gen_max) {
+    break_point();
+    RAISE("hash_used_max>hash_gen_max");
+    return false;
+    }
+
+  if (this->priv.nkeys <= 1) {
+    // 0/1 elements => no check
+    return true;
+    }
+
+  ZERO_OUT(my_key);
+
+  res = false;
+  key_stuff = nil;  // not used
+  hash_target = nil;
+  // try
+    hash_target = (t_bool*)
+      calloc(this->gen_opt.hash_used_max+1, sizeof(hash_target[0]));
+    max_hash = 0;
+    n = 0;
+    p_key = this->priv.keys;
+    while (p_key != nil) {
+      ++n;
+      my_key.key_k = p_key->key_k;
+      f_calc_ab(this->gen_opt, my_key);
+      if ((my_key.b_k >= this->gen_opt.tabb_len) &&
+         (this->gen_opt.tabb_len > 1)) {
+        break_point();
+        RAISE("b_k too large");
+        res = false;
+        goto STOP;
+        }
+      hash = calc_hash_tab(this->gen_opt, my_key);
+      if (hash > max_hash)
+        max_hash = hash;
+      if (hash > this->gen_opt.hash_used_max) {
+        break_point();
+        RAISE("hash>hash_used_max");
+        res = false;
+        goto STOP;
+        }
+      if (hash_target[hash]) {
+        break_point();
+        RAISE("hash dupe");
+        res = false;
+        goto STOP;
+        }
+      hash_target[hash] = true;
+
+      value = 171161;  // dummy value
+      if (! this->inv_hash(hash, value, key_stuff)) {
+        break_point();
+        RAISE("absent inverse hash");
+        res = false;
+        goto STOP;
+        }
+      if (value != p_key->key_k) {
+        break_point();
+        RAISE("invalid inverse hash");
+        res = false;
+        goto STOP;
+        }
+
+      p_key = p_key->next_k;
+      }
+
+    if (n != this->priv.nkeys) {
+      break_point();
+      RAISE("nkeys defective");
+      res = false;
+      goto STOP;
+      }
+    if (max_hash != this->gen_opt.hash_used_max) {
+      break_point();
+      RAISE("max hash != hash_used_max");
+      res = false;
+      goto STOP;
+      }
+    res = true;
+  // finally
+STOP:
+    if (hash_target != nil)
+      free(hash_target);
+
+  return res;
+  }
+/*@\\\0000001F20+2A4A*/
+/*@/// void to_perfect_hash::clean() */
+void to_perfect_hash::clean(
+  ) {
+// var
+  tpr_key mykey,next;
+// begin
+  mykey = this->priv.keys;
+  while (mykey != nil) {
+    next = mykey->next_k;
+    free(mykey);
+    mykey = next;
+    }
+  if (this->priv.p_tabb != nil)
+    free(this->priv.p_tabb);
+  if (this->gen_opt.p_tabb != nil)
+    free(this->gen_opt.p_tabb);
+  if (this->priv.p_scramble != nil)
+    free(this->priv.p_scramble);
+  if (this->priv.p_inv_hash != nil)
+    free(this->priv.p_inv_hash);
+  ZERO_OUT(this->state_engine);
+  ZERO_OUT(this->priv);
+  ZERO_OUT(this->gen_opt);
+  }
+/*@\\\+5210*/
+/*@/// t_bool to_perfect_hash::inv_hash(t_32u hash, t_32u &key, t_pointer &key_stuff) const */
+t_bool to_perfect_hash::inv_hash(
+  t_32u hash,
+  t_32u &key,
+  t_pointer &key_stuff
+  ) const {
+// var
+  tpr_key mykey;
+// begin
+  // Don't touch result parameters if there is no matching key
+  if ( (this->priv.p_inv_hash != nil) &&
+       (hash <= this->gen_opt.hash_used_max) ) {
+    mykey = this->priv.p_inv_hash[hash].key_h;
+    if (mykey != nil) {
+      key = mykey->key_k;
+      key_stuff = mykey->user_k;
+      return true;
+      }
+    }
+  return false;
+  }
+/*@\\\+4588*/
+
+/*@/// mycall void to_perfect_hash::calc_rev_cfg(tr_reversible &rev, t_32u min) */
+mycall void to_perfect_hash::calc_rev_cfg(tr_reversible &rev, t_32u min) {
+// var
+  tpr_key mykey;
+  t_32u key;
+  t_32u q;
+  t_32u v;
+// begin
+  /*@/// gcd = */
+  rev.gcd = 0;
+  mykey = this->priv.keys;
+  while (mykey != nil) {
+    key = mykey->key_k-min;
+    if (key != 0) {
+      if (rev.gcd == 0)
+        rev.gcd = key;
+      else
+        rev.gcd = gcd(rev.gcd, key);
+      }
+    mykey = mykey->next_k;
+    }
+  /*@\\\+DC53*/
+  /*@/// rev.ror,rev.inv = */
+  rev.ror = 0;
+  rev.inv = 0;
+  if (rev.gcd != 0) {
+    q = rev.gcd;
+    while ((q & 1) == 0) {
+      ++rev.ror;
+      q = q >> 1;
+      }
+    rev.inv = mul_inv_32(q);
+    }
+  /*@\\\+D247*/
+  /*@/// rev.max = */
+  rev.max = 0;
+  mykey = this->priv.keys;
+  while (mykey != nil) {
+    v = ror_32(mykey->key_k-min, rev.ror) * rev.inv;
+    if (v > rev.max)
+      rev.max = v;
+    mykey = mykey->next_k;
+    }
+  /*@\\\+4EBE*/
+  }
+/*@\\\0000000A01+F815*/
+/*@/// mycall t_bool to_perfect_hash::test_reversible() */
+mycall t_bool to_perfect_hash::test_reversible(
+  ) {
+// var
+  tpr_key mykey;
+  t_32u key;
+  t_32u min_u;
+  t_32s min_s;
+  tr_reversible rev_u;
+  tr_reversible rev_s;
+// begin
+  if (! assigned(this->cb.a_gen[HM_REVERSIBLE])) {
+    return false;
+    }
+
+  /*@/// min_*,max_* = */
+  mykey = this->priv.keys;
+  key = mykey->key_k;
+  min_u = key;
+  min_s = (t_32s)key;
+  while (mykey != nil) {
+    key = mykey->key_k;
+    if (min_u > key)
+      min_u = key;
+    if (min_s > (t_32s)key)
+      min_s = (t_32s)key;
+    mykey = mykey->next_k;
+    }
+  /*@\\\+BD75*/
+  calc_rev_cfg(rev_u, min_u);
+  calc_rev_cfg(rev_s, min_s);
+
+  if (rev_s.max < rev_u.max) {
+    // The values seem to be signed; optimize for that
+    rev_u = rev_s;
+    min_u = min_s;
+    }
+
+  if (rev_u.max >= 0x10000000) {
+    // forget it, way too large
+    return false;
+    }
+
+  // density = (t_float)this->priv.nkeys / (t_float)(rev_u.max+1);
+  if (this->calc_opt.minimal) {
+    if (rev_u.max+1 != this->priv.nkeys)  // (density != 1.0)
+      return false;
+    }
+  else {
+    if ((t_float)this->priv.nkeys / (t_float)(rev_u.max+1)  // density
+        < this->calc_opt.min_load_factor)
+      return false;
+    }
+
+  this->gen_opt.hash_method = HM_REVERSIBLE;
+  this->gen_opt.salt = min_u;
+  this->gen_opt.a_shr = rev_u.ror;
+  this->gen_opt.a_mask = rev_u.inv;
+  this->gen_opt.hash_used_max = rev_u.max;  // also filled by calc_max()
+  this->gen_opt.hash_gen_max = (t_32u)(-1);
+  this->gen_opt.check_value = (rev_u.max+1 != this->priv.nkeys);
+  this->gen_opt.check_range = true;
+  this->calc_max();  // fill inverse hash table
+  return true;
+  }
+/*@\\\003E000911000916000903000903001901001901+26FE*/
+
+/*@/// mycall t_bool to_perfect_hash::check_direct(...) */
+mycall t_bool to_perfect_hash::check_direct(
+  tq_hash_method hash_method,
+  t_32u x
+  ) {
+/*@/// var */
+// var
+  tpr_key mykey;
+  t_bool s[MAX_SET];
+  t_32u v;
+  tf_calc_ab f_calc_ab;
+  tf_generate f_gen;
+/*@\\\0000000313+40D1*/
+// begin
+  f_calc_ab = a_f_calc_ab[hash_method];
+  f_gen = this->cb.a_gen[hash_method];
+  if (! assigned(f_gen)) {
+    return false;
+    }
+
+  this->gen_opt.salt = x;
+  ZERO_MEM(s[0], MAX_SET*sizeof(s[0]));
+
+  /*@/// loop */
+  mykey = this->priv.keys;
+  while (mykey != nil) {
+    f_calc_ab(this->gen_opt, *mykey);
+    v = mykey->a_k;
+    if (s[v]) {
+      return false;
+      }
+    s[v] = true;
+    mykey = mykey->next_k;
+    }
+  /*@\\\+9122*/
+
+  this->gen_opt.hash_method = hash_method;
+  return true;
+  }
+/*@\\\0000000701+0BBE*/
+/*@/// mycall t_bool to_perfect_hash::test_simple_perfect() */
+mycall t_bool to_perfect_hash::test_simple_perfect(
+  ) {
+// var
+  t_bool found;
+  t_32u mul;
+  t_32u i;
+  t_32u bits;
+// begin
+  found = false;
+  bits = ceil_log2(this->priv.nkeys);
+  while (true) {
+    this->priv.alen = (t_32u)(1) << bits;
+    if (this->priv.alen > MAX_SET) {  // set operation
+      goto STOP;  // false; break while true
+      }
+    if ((t_float)this->priv.nkeys / (t_float)this->priv.alen
+        < this->calc_opt.min_load_factor) {
+      goto STOP;  // false; break while true
+      }
+    found = true;
+    this->gen_opt.a_shr = UB4BITS-bits;
+    this->gen_opt.a_mask = this->priv.alen-1;
+    // this->gen_opt.hash_used_max => below
+    this->gen_opt.hash_gen_max = this->priv.alen-1;
+    this->gen_opt.check_value = true;
+    this->gen_opt.check_range = false;
+    /*@/// if (this->check_direct(...))  goto STOP; */
+    if (this->check_direct(HM_SIMPLE_AND, 0))
+      goto STOP;  // true
+    if (this->check_direct(HM_SIMPLE_SHR, 0))
+      goto STOP;  // true
+    if (assigned(this->cb.a_gen[HM_SIMPLE_ROL])) {
+      for (i = 1; i <= 31; ++i) {
+        if (this->check_direct(HM_SIMPLE_ROL, i))
+          goto STOP;  // true
+        }
+      }
+    if (assigned(this->cb.a_gen[HM_SIMPLE_ROL_XOR])) {
+      for (i = 1; i <= 31; ++i) {
+        if (this->check_direct(HM_SIMPLE_ROL_XOR, i))
+          goto STOP;  // true
+        }
+      }
+    if (assigned(this->cb.a_gen[HM_SIMPLE_ROL_ADD])) {
+      for (i = 1; i <= 31; ++i) {
+        if (this->check_direct(HM_SIMPLE_ROL_ADD, i))
+          goto STOP;  // true
+        }
+      }
+    if (assigned(this->cb.a_gen[HM_SIMPLE_ROL_SUB])) {
+      for (i = 1; i <= 31; ++i) {
+        if (this->check_direct(HM_SIMPLE_ROL_SUB, i))
+          goto STOP;  // true
+        }
+      }
+
+    if (assigned(this->cb.a_gen[HM_SIMPLE_MUL])) {
+      mul = SEARCH_START;
+      for (i = 1; i <= this->calc_opt.nr_mul_checks; ++i) {
+        if (this->check_direct(HM_SIMPLE_MUL, mul))
+          goto STOP;  // true
+        mul += SEARCH_OFFSET;
+        }
+      }
+    /*@\\\0000001827+6F98*/
+    found = false;
+    if (this->calc_opt.minimal) {
+      goto STOP;  // false; break while true; no other tries
+      }
+    ++bits;
+    }
+STOP:
+
+  if (found) {
+    // this->gen_opt.hash_used_max => calc_max
+    this->calc_max();
+    }
+  return found;
+  }
+/*@\\\+15ED*/
+
+/*@/// mycall void to_perfect_hash::initalen() */
+mycall void to_perfect_hash::initalen(
+  ) {
+/*
+  sets (in this->priv)
+    alen, initial alen
+    blen, initial blen
+    slen, power of two greater or equal to max hash value
+*/
+// guess initial values for alen and blen
+/*@/// doc */
+/*
+ * Find initial alen, blen
+ * Initial alen and blen values were found empirically.  Some factors:
+ *
+ * If slen<256 there is no scramble, so tab[b] needs to cover 0..slen-1.
+ *
+ * alen and blen must be powers of 2 because the values in 0..alen-1 and
+ * 0..blen-1 are produced by applying a bitmask to the initial hash function.
+ *
+ * alen must be less than slen, in fact less than nkeys, because otherwise
+ * there would often be no i such that a^p_scramble[i] is in 0..nkeys-1 for
+ * all the *a*s associated with a given *b*, so there would be no legal
+ * value to assign to tab[b].  This only matters when we're doing a minimal
+ * perfect hash.
+ *
+ * It takes around 800 trials to find distinct (a,b) with nkey=slen*(5/8)
+ * and alen*blen = slen*slen/32.
+ *
+ * Values of blen less than slen/4 never work, and slen/2 always works.
+ *
+ * We want blen as small as possible because it is the number of bytes in
+ * the huge array we must create for the perfect hash.
+ *
+ * When nkey <= slen*(5/8), blen=slen/4 works much more often with
+ * alen=slen/8 than with alen=slen/4.  Above slen*(5/8), blen=slen/4
+ * doesn't seem to care whether alen=slen/8 or alen=slen/4.  I think it
+ * has something to do with 5/8 = 1/8 * 5.  For example examine 80000,
+ * 85000, and 90000 keys with different values of alen.  This only matters
+ * if we're doing a minimal perfect hash.
+ *
+ * When alen*blen <= 1<<UB4BITS, the initial hash must produce one integer.
+ * Bigger than that it must produce two integers, which increases the
+ * cost of the hash per character hashed.
+ */
+/*@\\\0000000507+00CE*/
+// begin
+  this->priv.slen = (t_32u)(1) << ceil_log2(this->priv.nkeys);
+  // code moved from findhash
+
+  if (this->calc_opt.minimal) {
+    /*@/// minimal */
+    switch (ceil_log2(this->priv.slen)) {
+      /*@/// case 0: */
+      case 0:
+        this->priv.alen = 1;
+        this->priv.blen = 1;
+        break;
+      /*@\\\+9376*/
+      /*@/// case 1..8: */
+      case 1:
+      case 2:
+      case 3:
+      case 4:
+      case 5:
+      case 6:
+      case 7:
+      case 8: {
+        // if (! this->calc_opt.minimal)  // always false!
+        // BUG 4 in Jenkins' code
+        if (this->calc_opt.fast)  // this was probably intended
+          this->priv.alen = this->priv.slen;
+        else
+          this->priv.alen = this->priv.slen/2;
+        this->priv.blen = this->priv.slen/2;
+        break;
+        }
+      /*@\\\0000000937+D5FB*/
+      /*@/// case 9..17: */
+      case 9:
+      case 10:
+      case 11:
+      case 12:
+      case 13:
+      case 14:
+      case 15:
+      case 16:
+      case 17: {
+        if (this->calc_opt.fast) {
+          this->priv.alen = this->priv.slen/2;
+          this->priv.blen = this->priv.slen/4;
+          }
+        else if (this->priv.slen/4 < this->calc_opt.use_scramble) {
+          if (this->priv.nkeys <= this->priv.slen*0.52) {
+            this->priv.alen = this->priv.slen/8;
+            this->priv.blen = this->priv.slen/8;
+            }
+          else {
+            this->priv.alen = this->priv.slen/4;
+            this->priv.blen = this->priv.slen/4;
+            }
+          }
+        else {
+          if (this->priv.nkeys <= this->priv.slen*(5.0/8.0))
+            this->priv.alen = this->priv.slen/8;
+          else if (this->priv.nkeys <= this->priv.slen*(3.0/4.0))
+            this->priv.alen = this->priv.slen/4;
+          else
+            this->priv.alen = this->priv.slen/2;
+          this->priv.blen = this->priv.slen/4;  // always give the small size a shot
+          }
+        break;
+        }
+      /*@\\\0000000A1B+9543*/
+      /*@/// case 18: */
+      case 18: {
+        if (this->calc_opt.fast) {
+          this->priv.alen = this->priv.slen/2;
+          this->priv.blen = this->priv.slen/2;
+          }
+        else {
+          this->priv.alen = this->priv.slen/8;  // never require the multiword hash
+          if (this->priv.nkeys <= this->priv.slen*(5.0/8.0))
+            this->priv.blen = this->priv.slen/4;
+          else
+            this->priv.blen = this->priv.slen/2;
+          }
+        break;
+        }
+      /*@\\\000000021B+905F*/
+      /*@/// case 19..20: */
+      case 19:
+      case 20: {
+        if (this->priv.nkeys <= this->priv.slen*(5.0/8.0)) {
+          this->priv.alen = this->priv.slen/8;
+          this->priv.blen = this->priv.slen/4;
+          }
+        else {
+          this->priv.alen = this->priv.slen/2;
+          this->priv.blen = this->priv.slen/2;
+          }
+        break;
+        }
+      /*@\\\0000000206+A676*/
+      /*@/// else */
+      default: {
+        this->priv.alen = this->priv.slen/2;  // just find a hash as quick as possible
+        this->priv.blen = this->priv.slen/2;  // we'll be thrashing virtual memory at this size
+        break;
+        }
+      /*@\\\+0379*/
+      }
+    /*@\\\0000000301+8E8F*/
+    }
+  else {
+    /*@/// normal */
+    if ( (this->priv.nkeys >= 1024) &&  // parameter?
+         ( (this->priv.nkeys > this->priv.slen*this->calc_opt.keyspace_factor) ||
+           ( (this->calc_opt.fast) &&
+             (this->priv.nkeys > this->priv.slen*0.8) ))) {
+      this->priv.slen = this->priv.slen * 2;
+      }
+
+    if (this->priv.slen < UB4BITS)
+      this->priv.blen = this->priv.slen;  // go for function speed not space
+    else if (this->priv.slen/4 <= 1 << 14) {
+      if (this->priv.nkeys <= this->priv.slen*0.56)
+        this->priv.blen = this->priv.slen/32;
+      else if (this->priv.nkeys <= this->priv.slen*0.74)
+        this->priv.blen = this->priv.slen/16;
+      else
+        this->priv.blen = this->priv.slen/8;
+      }
+    else {
+      if (this->priv.nkeys <= this->priv.slen*0.6)
+        this->priv.blen = this->priv.slen/16;
+      else if (this->priv.nkeys <= this->priv.slen*0.8)
+        this->priv.blen = this->priv.slen/8;
+      else
+        this->priv.blen = this->priv.slen/4;
+      }
+
+    if ((this->calc_opt.fast) &&
+        (this->priv.blen < this->priv.slen/8))
+      this->priv.blen = this->priv.slen/8;
+
+
+    if (this->priv.slen > 131072) {
+      // blen must be defined! BUG 3 in Jenkins' code
+      this->priv.alen = ((t_32u)(1) << (UB4BITS-ceil_log2(this->priv.blen)));
+        // distinct keys => distinct (A,B)
+      }
+    else
+      this->priv.alen = this->priv.slen;  // no reason to restrict alen to slen/2
+
+    if (this->priv.alen < 1)
+      this->priv.alen = 1;
+    if (this->priv.blen < 1)
+      this->priv.blen = 1;
+    /*@\\\+4EC1*/
+    }
+  }
+/*@\\\0000000C0B+202B*/
+/*@/// mycall void to_perfect_hash::scrambleinit() */
+mycall void to_perfect_hash::scrambleinit(
+  ) {
+// initialize p_scramble[] with distinct random values in 0..slen-1
+// var
+  t_int i;
+  t_32u log_smax;
+  t_32u limit;
+// begin
+  /*@/// realloc this->priv.p_scramble */
+  if (this->priv.blen < this->calc_opt.use_scramble)  // see augment
+    limit = this->priv.slen;  // should be != 0
+  else
+    limit = 255+1;
+
+  if (this->priv.allocated_scramble == limit)
+    return;  // no change!
+  if (this->priv.p_scramble != nil) {
+    free(this->priv.p_scramble);
+    this->priv.p_scramble = nil;
+    }
+  this->priv.p_scramble =
+    (t_32u*)calloc(limit, sizeof(this->priv.p_scramble[0]));
+  this->priv.allocated_scramble = limit;
+  /*@\\\0000000C2F+AB44*/
+
+  // fill p_scramble[] with distinct random integers in 0..slen-1
+  log_smax = ceil_log2(this->priv.slen);
+  for (i = 0; i < (t_int)(limit); ++i) {
+    this->priv.p_scramble[i] = permute(i, log_smax);
+    }
+  }
+/*@\\\0000000801+2E24*/
+/*@/// mycall void to_perfect_hash::setlow() */
+mycall void to_perfect_hash::setlow(
+  ) {
+// find the highest and lowest bit where any key differs
+// var
+  t_int i;
+  tpr_key mykey;
+  t_32u firstkey;
+  t_32u my_diffbits;
+// begin
+  // mark the interesting bits in this->state_engine.mask
+  my_diffbits = 0;  // copy of this->priv.diffbits
+  mykey = this->priv.keys;
+  firstkey = mykey->key_k;  // mykey cannot be nil here, see generate_hash()
+  while (mykey != nil) {
+    my_diffbits = my_diffbits | (firstkey ^ mykey->key_k);
+    if (my_diffbits == (t_32u)(-1))
+      break;  // early exit
+    mykey = mykey->next_k;
+    }  // while mykey != nil
+
+  // find the lowest interesting bit
+  for (i = 0; i <= UB4BITS-1; ++i) {
+    // if ((my_diffbits & ((t_32u)(1) << i)) != 0)
+    if (((my_diffbits >> i) & 1) != 0)
+      break;  // for i
+    }  // for i
+  this->priv.lowbit = i;
+
+  // find the highest interesting bit
+  for (i = UB4BITS-1; i >= 0; --i) {
+    // if ((my_diffbits & ((t_32u)(1) << i)) != 0)
+    if (((my_diffbits >> i) & 1) != 0)
+      break;  // for i
+    }  // for i
+  this->priv.highbit = i;
+  this->priv.diffbits = my_diffbits;
+  }
+/*@\\\0000000105+12A5*/
+/*@/// mycall void to_perfect_hash::inithex(...) */
+mycall void to_perfect_hash::inithex(
+  t_32u salt  // used to initialize the hash function
+  ) {
+/*@/// doc */
+/*
+ * Initialize (a,b) when keys are integers.
+ *
+ * Normally there's an initial hash which produces a number.  That hash takes
+ * an initializer.  Changing the initializer causes the initial hash to
+ * produce a different (uniformly distributed) number without any extra work.
+ *
+ * Well, here we start with a number.  There's no initial hash.  Any mixing
+ * costs extra work.  So we go through a lot of special cases to minimize the
+ * mixing needed to get distinct (a,b).  For small sets of keys, it's often
+ * fastest to skip the final hash and produce the perfect hash from the number
+ * directly.
+ *
+ * The target user for this is switch statement optimization.  The common case
+ * is 3 to 16 keys, and instruction counts matter.  The competition is a
+ * binary tree of branches.
+ *
+ * Return TRUE if we found a perfect hash and no more work is needed.
+ * Return FALSE if we just did an initial hash and more work is needed.
+ */
+/*@\\\+00CE*/
+/*@/// doc */
+/*
+ * Guns aren't enough.  Bring out the Bomb.  Use tab[].
+ * This finds the initial (a,b) when we need to use tab[].
+ *
+ * We need to produce a different (a,b) every time this is called.  Try all
+ * reasonable cases, fastest first.
+ *
+ * The initial mix (which this determines) can be filled into final starting
+ * at line[1].  val is set and a,b are declared.  The final hash (at line[7])
+ * is a^tab[b] or a^p_scramble[tab[b]].
+ *
+ * The code will probably look like this, minus some stuff:
+ *     val += CONSTANT;
+ *     val ^= (val<<16);
+ *     val += (val>>8);
+ *     val ^= (val<<4);
+ *     b = (val >> j) & 7;
+ *     a = (val + (val<<k)) >> 29;
+ *     return a^p_scramble[tab[b]];
+ * Note that *a* and tab[b] will be computed in parallel by most modern chips.
+ *
+ * i is the current state of the state machine.
+ * j and k are counters in the loops the states simulate.
+ */
+/*@\\\+00CE*/
+// var
+  t_32u alog;
+  t_32u blog;
+  t_32u addk;
+// begin
+  if (this->priv.diffbits == 0)  // optimized to be calculated only when needed
+    this->setlow();
+
+  if (salt == 1) {
+    this->state_engine.i = 1;
+    this->state_engine.j = 0;
+    this->state_engine.k = 0;
+    }
+
+  alog = ceil_log2(this->priv.alen);
+  blog = ceil_log2(this->priv.blen);
+  while (true) {
+    switch (this->state_engine.i) {
+      /*@/// case 1:   < HM_AB_1 */
+      case 1: {
+        // a = val>>30;  b=val&3
+        if (! assigned(this->cb.a_gen[HM_AB_1])) {
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+
+        this->gen_opt.hash_method = HM_AB_1;
+        this->gen_opt.a_shl = UB4BITS-1-this->priv.highbit;
+        this->gen_opt.a_shr = UB4BITS-alog;
+        this->gen_opt.b_shr = this->priv.lowbit;
+        this->gen_opt.b_mask = this->priv.blen-1;
+
+        ++this->state_engine.i;
+        return;
+        break;
+        }
+      /*@\\\+A33A*/
+      /*@/// case 2:   < HM_AB_2 */
+      case 2: {
+        // a = val&3;  b=val>>30
+        if (! assigned(this->cb.a_gen[HM_AB_2])) {
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+
+        this->gen_opt.hash_method = HM_AB_2;
+        this->gen_opt.a_shr = this->priv.lowbit;
+        this->gen_opt.a_mask = this->priv.alen-1;
+        this->gen_opt.b_shl = UB4BITS-1-this->priv.highbit;
+        this->gen_opt.b_shr = UB4BITS-blog;
+
+        ++this->state_engine.i;
+        return;
+        break;
+        }
+      /*@\\\+B2C4*/
+      /*@/// case 3:   < HM_AB_1 */
+      case 3: {
+        /*
+         * cases 3,4,5:
+         * for (k=lowbit; k<=highbit; ++k)
+         *   for (j=lowbit; j<=highbit; ++j)
+         *     b = (val>>j)&3;
+         *     a = (val<<k)>>30;
+         */
+        if (! assigned(this->cb.a_gen[HM_AB_1])) {
+          this->state_engine.i = 6;
+          continue;  // while true
+          }
+
+        this->state_engine.k = this->priv.lowbit;
+        this->state_engine.j = this->priv.lowbit;
+        ++this->state_engine.i;
+        break;
+        }
+      /*@\\\+C24C*/
+      /*@/// case 4:   < " */
+      case 4: {
+        if (this->state_engine.j >= this->priv.highbit) {
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+
+        this->gen_opt.hash_method = HM_AB_1;
+        this->gen_opt.a_shl = UB4BITS-1-this->state_engine.k;
+        this->gen_opt.a_shr = UB4BITS-alog;
+        this->gen_opt.b_shr = this->state_engine.j;
+        this->gen_opt.b_mask = this->priv.blen-1;
+
+        ++this->state_engine.j;
+        while (this->state_engine.j < this->priv.highbit) {
+          if (( (this->priv.diffbits >> (this->state_engine.j)) &
+                (this->priv.blen-1) ) > 2)
+            break;  // while
+          ++this->state_engine.j;
+          }  // while
+        return;
+        break;
+        }
+      /*@\\\+3598*/
+      /*@/// case 5:     " */
+      case 5: {
+        ++this->state_engine.k;
+        while (this->state_engine.k < this->priv.highbit) {
+          if (( (( this->priv.diffbits <<
+                   (UB4BITS-1-this->state_engine.k) ) >> alog) &
+                (this->priv.alen-1) ) > 0)
+            break;  // while
+          ++this->state_engine.k;
+          }
+        if (this->state_engine.k >= this->priv.highbit) {
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+        this->state_engine.j = this->priv.lowbit;
+        this->state_engine.i = 4;
+        break;
+        }
+      /*@\\\+8E15*/
+      /*@/// case 6:     HM_AB_3 */
+      case 6: {
+        /*
+         * cases 6,7,8:
+         * for (k=0; k<UB4BITS-alog; ++k)
+         *   for (j=0; j<UB4BITS-blog; ++j)
+         *     val = val+f(salt);
+         *     val ^= (val >> 16);
+         *     val += (val << 8);
+         *     val ^= (val >> 4);
+         *     b = (val >> j) & 3;
+         *     a = (val + (val << k)) >> 30;
+         */
+        if (! assigned(this->cb.a_gen[HM_AB_3])) {
+          this->state_engine.i = 9;
+          continue;  // while true
+          }
+
+        this->state_engine.k = 0;
+        this->state_engine.j = 0;
+        ++this->state_engine.i;
+        break;
+        }
+      /*@\\\+29CD*/
+      /*@/// case 7:   < " */
+      case 7: {
+        // Just do something that will surely work
+        addk = 0x9e3779b9*salt;
+
+        if (this->state_engine.j > UB4BITS-blog) {
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+
+        this->gen_opt.hash_method = HM_AB_3;
+        this->gen_opt.salt = addk;
+        this->gen_opt.do_xor_shr_16 =
+          (this->priv.highbit - this->priv.lowbit > 16-1);
+        this->gen_opt.do_add_shl_8 =
+          (this->priv.highbit - this->priv.lowbit > 8-1);
+        this->gen_opt.a_shl = this->state_engine.k;
+        this->gen_opt.a_shr = UB4BITS-alog;
+        this->gen_opt.b_shr = this->state_engine.j;
+        this->gen_opt.b_mask = this->priv.blen-1;
+
+        ++this->state_engine.j;
+        return;
+        break;
+        }
+      /*@\\\+B6A3*/
+      /*@/// case 8:     " */
+      case 8: {
+        ++this->state_engine.k;
+        if (this->state_engine.k > UB4BITS-alog) {
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+
+        this->state_engine.j = 0;
+        this->state_engine.i = 7;
+        break;
+        }
+      /*@\\\+0F20*/
+      /*@/// case 9:     HM_AB_X1 */
+      case 9: {
+        if (! assigned(this->cb.a_gen[HM_AB_X1])) {
+          this->state_engine.i = 11;
+          continue;  // while true
+          }
+
+        this->state_engine.k = 0;
+        this->state_engine.j = 0;
+        ++this->state_engine.i;
+        break;
+        }
+      /*@\\\+18C6*/
+      /*@/// case 10:  < " */
+      case 10: {
+        addk = (SEARCH_START+salt*SEARCH_OFFSET) | 1;
+
+        if (this->state_engine.j > (UB4BITS-alog)*(UB4BITS-blog)) {
+          // same count as method 6/7/8
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+
+        if (alog+blog+this->priv.lowbit > UB4BITS) {
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+
+        this->gen_opt.hash_method = HM_AB_X1;
+        this->gen_opt.salt = addk;
+        this->gen_opt.a_shr = UB4BITS-(alog+blog+this->priv.lowbit);
+        this->gen_opt.a_mask = this->priv.alen-1;
+        this->gen_opt.b_shr = UB4BITS-blog;
+        this->gen_opt.b_mask = this->priv.blen-1;
+
+        ++this->state_engine.j;
+        return;
+        break;
+        }
+      /*@\\\0000000201+730D*/
+      /*@/// case 11:    HM_AB_X2 */
+      case 11: {
+        if (! assigned(this->cb.a_gen[HM_AB_X2])) {
+          this->state_engine.i = 13;
+          continue;  // while true
+          }
+
+        this->state_engine.k = 0;
+        this->state_engine.j = 0;
+        ++this->state_engine.i;
+        break;
+        }
+      /*@\\\+6B48*/
+      /*@/// case 12:  < " */
+      case 12: {
+        addk = SEARCH_START+salt*SEARCH_OFFSET;
+
+        if (this->state_engine.j > (UB4BITS-alog)*(UB4BITS-blog)) {
+          // same count as method 6/7/8
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+
+        this->gen_opt.hash_method = HM_AB_X2;
+        this->gen_opt.salt = addk;
+        this->gen_opt.a_shr = (UB4BITS-alog-blog) & (UB4BITS-1);
+        this->gen_opt.a_mask = this->priv.alen-1;
+        this->gen_opt.b_shr = UB4BITS-blog;
+        this->gen_opt.b_mask = this->priv.blen-1;
+
+        ++this->state_engine.j;
+        return;
+        break;
+        }
+      /*@\\\000000010F+B86F*/
+      /*@/// case 13:    HM_AB_X3 */
+      case 13: {
+        if (! assigned(this->cb.a_gen[HM_AB_X3])) {
+          this->state_engine.i = 15;
+          continue;  // while true
+          }
+
+        this->state_engine.k = 0;
+        this->state_engine.j = 0;
+        ++this->state_engine.i;
+        break;
+        }
+      /*@\\\+8A5B*/
+      /*@/// case 14:  < " */
+      case 14: {
+        addk = SEARCH_START+salt*SEARCH_OFFSET;
+
+        if (this->state_engine.j > (UB4BITS-alog)*(UB4BITS-blog)) {
+          // same count as method 6/7/8
+          ++this->state_engine.i;
+          continue;  // while true
+          }
+
+        this->gen_opt.hash_method = HM_AB_X3;
+        this->gen_opt.salt = addk;
+        this->gen_opt.a_mask = this->priv.alen-1;
+        this->gen_opt.b_shr = UB4BITS-blog;
+
+        ++this->state_engine.j;
+        return;
+        break;
+        }
+      /*@\\\+FDD1*/
+      /*@/// case 15:    - */
+      case 15: {
+        if (assigned(this->cb.a_gen[HM_AB_3]))
+          this->state_engine.i = 6;
+        else if (assigned(this->cb.a_gen[HM_AB_X1]) ||
+                 assigned(this->cb.a_gen[HM_AB_X2]) ||
+                 assigned(this->cb.a_gen[HM_AB_X3]))
+          this->state_engine.i = 9;
+        else
+          this->state_engine.i = 1;
+        break;
+        }
+      /*@\\\0000000227+78B1*/
+      }
+    }
+  }
+/*@\\\0000002101+F10A*/
+/*@/// mycall tq_reason to_perfect_hash::inittab() */
+mycall tq_reason to_perfect_hash::inittab(
+  ) {
+/*@/// doc */
+/*
+ * Run a hash function on the key to get a and b
+ * Returns:
+ *   REASON_SUCCESS  found distinct (a,b) for all keys, put keys in tabb[]
+ *   REASON_FAILURE  didn't find distinct (a,b) for all keys
+ *   REASON_LIMIT    limit reached, early abort
+ *   REASON_DUPE     a real dupe found
+ */
+/*@\\\0000000801+00CE*/
+/*
+ * put keys in tabb according to tr_key->b_k
+ * check if the initial hash might work
+ */
+// var
+  tpr_key mykey;
+  tpr_key otherkey;
+  tpr_bstuff p_bstuff;
+  tf_calc_ab f_calc_ab;
+// begin
+  f_calc_ab = a_f_calc_ab[this->gen_opt.hash_method];
+  ZERO_MEM(*this->priv.p_tabb, sizeof(this->priv.p_tabb[0])*this->priv.blen);
+
+  // Two keys with the same (a,b) guarantee a collision
+  mykey = this->priv.keys;
+  while (mykey != nil) {
+    f_calc_ab(this->gen_opt, *mykey);
+    p_bstuff = &this->priv.p_tabb[mykey->b_k];
+    otherkey = p_bstuff->list_b;
+    while (otherkey != nil) {
+      if (mykey->a_k == otherkey->a_k) {
+        // this->checkdup(mykey, otherkey);
+        if (mykey->key_k == otherkey->key_k) {
+          return REASON_DUPE;
+          }
+
+        return REASON_FAILURE;
+        }
+      otherkey = otherkey->nextb_k;
+      }
+
+    if (p_bstuff->listlen_b > this->calc_opt.listlen_limit) {
+      return REASON_LIMIT;  // speedup
+      }
+
+    ++p_bstuff->listlen_b;
+    mykey->nextb_k = p_bstuff->list_b;
+    p_bstuff->list_b = mykey;
+    mykey = mykey->next_k;
+    }
+
+  return REASON_SUCCESS;
+  }
+/*@\\\0000000104+FA14*/
+/*@/// mycall t_bool to_perfect_hash::apply(...) */
+mycall t_bool to_perfect_hash::apply(
+  tr_hstuff *tabh,  // array
+  tr_qstuff *tabq,  // array
+  t_32u tail,
+  t_bool rollback  // FALSE applies augmenting path, TRUE rolls back
+  ) {
+// Try to apply an augmenting list
+// var
+  t_32u hash;
+  tpr_key mykey;
+  tpr_bstuff pb;
+  t_32u child;
+  t_32u parent;
+  t_32u stabb;  // p_scramble[tab[b]]
+  t_32u *my_p_scramble;  // array
+// begin
+  my_p_scramble = this->priv.p_scramble;  // local copy to speed up
+  // walk from child to parent
+  child = tail-1;
+  while (child != 0) {
+    parent = tabq[child].parent_q;  // find child's parent
+    pb     = tabq[parent].b_q;  // find parent's list of siblings
+
+    // erase old hash values
+    stabb = my_p_scramble[pb->val_b];
+    mykey = pb->list_b;
+    while (mykey != nil) {
+      hash = mykey->a_k ^ stabb;
+      if (mykey == tabh[hash].key_h) {
+        // erase hash for all of child's siblings
+        tabh[hash].key_h = nil;
+        }
+      mykey = mykey->nextb_k;
+      }  // while mykey != nil
+
+    // change pb->val_b, which will change the hashes of all parent siblings
+    if (rollback)
+      pb->val_b = tabq[child].oldval_q;
+    else
+      pb->val_b = tabq[child].newval_q;
+
+    // set new hash values
+    stabb = my_p_scramble[pb->val_b];
+    mykey = pb->list_b;
+    while (mykey != nil) {
+      hash = mykey->a_k ^ stabb;
+      if (rollback) {
+        if (parent == 0)
+          goto CONT;  // continue while, root never had a hash
+        }
+      else if (tabh[hash].key_h != nil) {
+        // very rare: roll back any changes
+        this->apply(tabh, tabq, tail, true);  // recursive call
+        return false;  // failure, collision
+        }
+      tabh[hash].key_h = mykey;
+CONT:
+      mykey = mykey->nextb_k;
+      }  // while mykey != nil
+    child = parent;
+    }  // while child != 0
+  return true;
+  }
+/*@\\\0000003F04+6934*/
+/*@/// mycall tq_reason to_perfect_hash::augment(...) */
+mycall tq_reason to_perfect_hash::augment(
+  tr_hstuff *tabh,  // array; which key is associated with which hash, indexed by hash
+  tr_qstuff *tabq,  // array; queue of *b* values, this is the spanning tree
+  tpr_bstuff item,  // &tabb[b] for the b to be mapped
+  t_32u highwater   // a value higher than any now in tabb[].water_b
+  ) {
+/*@/// doc */
+/*
+-------------------------------------------------------------------------------
+augment(): Add item to the mapping.
+
+Construct a spanning tree of *b*s with *item* as root, where each
+parent can have all its hashes changed (by some new val_b) with
+at most one collision, and each child is the b of that collision.
+
+I got this from Tarjan's "Data Structures and Network Algorithms".  The
+path from *item* to a *b* that can be remapped with no collision is
+an "augmenting path".  Change values of tab[b] along the path so that
+the unmapped key gets mapped and the unused hash value gets used.
+
+Assuming 1 key per b, if m out of n hash values are still unused,
+you should expect the transitive closure to cover n/m nodes before
+an unused node is found.  Sum(i=1..n)(n/i) is about nlogn, so expect
+this approach to take about nlogn time to map all single-key b's.
+-------------------------------------------------------------------------------
+*/
+/*@\\\+00CE*/
+/*@/// var */
+// var
+  t_32u q;                    // current position walking through the queue
+  t_32u tail;            // tail of the queue.  0 is the head of the queue.
+  t_32u limit;
+
+  tpr_bstuff myb;                                    // the b for this node
+  t_32u i;                                 // possible value for myb->val_b
+
+  tpr_bstuff childb;                           // the b that this i maps to
+  tpr_key mykey;                          // for walking through myb's keys
+
+  tpr_key childkey;
+  t_32u hash;
+
+  tpr_bstuff hitb;
+  tpr_qstuff my_qstuff;
+  t_32u my_scramble;
+  tr_bstuff *my_p_tabb;  // array
+  t_32u *my_p_scramble;  // array
+  t_32u my_highhash;
+  t_32u my_count_limit;
+  t_bool my_trans;
+  t_32u my_count;
+/*@\\\+40D1*/
+  tq_reason res;
+// begin
+  my_p_tabb = this->priv.p_tabb;  // local copy to speed up
+  my_p_scramble = this->priv.p_scramble;  // local copy to speed up
+  my_highhash = this->priv.highhash;  // local copy to speed up
+  my_count_limit = this->calc_opt.count_limit;  // local copy to speed up
+  my_trans = this->priv.trans;  // local copy to speed up
+
+  if (this->priv.blen < this->calc_opt.use_scramble)
+    limit = this->priv.slen;  // should be  != 0
+  else
+    limit = 255+1;
+
+  // initialize the root of the spanning tree
+  tabq[0].b_q = item;
+  tail = 1;
+
+  my_count = 0;
+  // construct the spanning tree by walking the queue, add children to tail
+  q = 0;
+  do {  // tail changes in loop
+    /*@/// loop q */
+    myb = tabq[q].b_q;  // the b for this node
+
+    for (i = 0; i < limit; ++i) {  // possible value for myb->val_b
+      ++my_count;
+      if (my_count > my_count_limit) {
+        res = REASON_LIMIT;
+        goto STOP;  // already too much work for this key, give up
+        }
+      /*@/// loop i */
+      my_scramble = my_p_scramble[i];
+      childb = nil;  // the b that this i maps to
+
+      mykey = myb->list_b;  // for walking through myb's keys
+      while (mykey != nil) {
+        /*@/// while mykey */
+        hash = mykey->a_k ^  my_scramble;
+
+        if (hash >= my_highhash) {
+          // break;  // while mykey, out of bounds
+          goto CONT;  // continue for i
+          // should trigger only for (near) minimal hash
+          }
+        childkey = tabh[hash].key_h;
+
+        if (childkey != nil) {
+          hitb = &my_p_tabb[childkey->b_k];
+
+          if (childb == nil) {
+            childb = hitb;  // remember this as childb
+            if (hitb->water_b == highwater) {  // hitb=childb
+              // break;  // while mykey, already explored
+              goto CONT;  // continue for i
+              }
+            }
+          else {
+            if (childb != hitb) {
+              // break;  // while mykey, hit at most one child b
+              goto CONT;  // continue for i
+              }
+            }
+          }
+        /*@\\\0000000115+DE38*/
+        mykey = mykey->nextb_k;
+        }  // while mykey
+
+      // if (mykey != nil)
+      //   continue;  // for i, myb with i has multiple collisions
+
+      // add childb to the queue of reachable things
+      my_qstuff = &tabq[tail];
+      my_qstuff->b_q = childb;
+      my_qstuff->oldval_q = myb->val_b;            // need this for rollback
+      my_qstuff->newval_q = i;     // how to make parent (myb) use this hash
+      my_qstuff->parent_q = q;
+      ++tail;
+
+      if (childb != nil)
+        childb->water_b = highwater;
+      else {
+        // found an *i* with no collisions?
+        // try to apply the augmenting path
+        if (this->apply(tabh, tabq, tail, false)) {
+          res = REASON_SUCCESS;
+          // EXIT;  // success, item was added to the perfect hash
+          goto STOP;  // success
+          }
+
+        --tail;  // don't know how to handle such a child!
+        }
+      /*@\\\0000000601+4DCB*/
+    CONT: {}
+      }  // for i
+    /*@\\\0000000901+4DD3*/
+    ++q;
+    } while ( (my_trans) &&  // without my_trans: no transitive closure, i.e. give up
+              (q < tail) );  // do..while q
+  res = REASON_FAILURE;  // while q exceeded, i.e. could not augment
+STOP:
+  this->gen_opt.count_for_limit += my_count;
+  // finalize statistics
+  if (my_count > this->gen_opt.max_count)
+    this->gen_opt.max_count = my_count;
+  return res;
+  }
+/*@\\\0000000801+D264*/
+/*@/// mycall tq_reason to_perfect_hash::perfect(...) */
+mycall tq_reason to_perfect_hash::perfect(
+  tr_hstuff *tabh,  // tar_hstuff
+  tr_qstuff *tabq  // tar_qstuff
+  ) {
+// find a mapping that makes this a perfect hash
+// var
+  t_32u maxkeys;  // maximum number of keys for any b
+  t_32u i, j;
+  tr_bstuff *my_p_tabb;  // array
+  tq_reason res;
+// begin
+  // clear any state from previous attempts
+  // assume this->priv.blen != 0
+  my_p_tabb = this->priv.p_tabb;  // local copy to speed up
+  ZERO_MEM(*tabh, sizeof(tabh[0])*this->priv.slen);  // BUG 1 in Jenkins' code
+  ZERO_MEM(*tabq, sizeof(tabq[0])*(this->priv.blen+1));
+
+  maxkeys = 0;
+  for (i = 0; i < this->priv.blen; ++i) {
+    if (my_p_tabb[i].listlen_b > maxkeys)
+      maxkeys = my_p_tabb[i].listlen_b;
+    }
+
+  TRACE("maxkeys="+num(maxkeys));
+
+  // In descending order by number of keys, map all *b*s
+  for (j = maxkeys; j >= 1; --j) {
+    for (i = 0; i < this->priv.blen; ++i) {
+      if (my_p_tabb[i].listlen_b == j) {
+        res = this->augment(tabh, tabq, &my_p_tabb[i], i+1);
+        if (res != REASON_SUCCESS) {
+          TRACE(
+            "fail to map group of size "+num(j)+" for tab size "+
+            num(this->priv.blen));
+          return res;
+          }
+        }
+      }
+    }
+
+  // Success!  We found a perfect hash of all keys into 0..nkeys-1.
+  return REASON_SUCCESS;
+  }
+/*@\\\+B48C*/
+/*@/// mycall void to_perfect_hash::finalize_perfect() */
+mycall void to_perfect_hash::finalize_perfect(
+  ) {
+// Optimized to get rid of tabb containing only 0 values
+// fill gen_opt
+// var
+  t_32u i;
+  t_bool zero;
+// begin
+  /*@/// kill tabb if all zero */
+  if (this->priv.blen > 0) {
+    zero = true;
+    for (i = 0; i < this->priv.blen; ++i) {
+      if (this->priv.p_tabb[i].val_b != 0) {
+        zero = false;
+        break;  // for i
+        }
+      }  // for i
+    if (zero) {
+      TRACE("tabb not needed");
+      if (this->priv.p_tabb != nil) {
+        free(this->priv.p_tabb);
+        this->priv.p_tabb = nil;
+        }
+      this->priv.blen = 0;
+      }
+    }
+  /*@\\\0000000A05+C82D*/
+  /*@/// fill gen_opt */
+  if (this->gen_opt.p_tabb != nil) {
+    free(this->gen_opt.p_tabb);
+    this->gen_opt.p_tabb = nil;
+    }
+  this->gen_opt.p_scramble = this->priv.p_scramble;
+  this->gen_opt.scramble_len = 0;
+  this->gen_opt.scramble_max = 0;
+  this->gen_opt.tabb_len = this->priv.blen;
+  this->gen_opt.tabb_max = 0;
+  if (this->priv.blen > 0) {
+    if ((this->priv.slen <= 255+1) ||
+        (this->priv.blen < this->calc_opt.use_scramble)) {
+      this->gen_opt.tabb_max = this->priv.slen-1;
+      this->gen_opt.p_tabb = (t_32u*)
+        calloc(this->priv.blen, sizeof(this->gen_opt.p_tabb[0]));
+      for (i = 0; i < this->priv.blen; ++i) {
+        this->gen_opt.p_tabb[i] =
+          this->priv.p_scramble[this->priv.p_tabb[i].val_b];
+        }
+      }
+    else {
+      this->gen_opt.tabb_max = 255;
+      this->gen_opt.scramble_len = 256;
+      this->gen_opt.scramble_max = this->priv.slen-1;
+      this->gen_opt.p_tabb = (t_32u*)
+        calloc(this->priv.blen, sizeof(this->gen_opt.p_tabb[0]));
+      for (i = 0; i < this->priv.blen; ++i)
+        this->gen_opt.p_tabb[i] = this->priv.p_tabb[i].val_b;
+      }
+    }
+  // this->gen_opt.hash_used_max => calc_max
+  this->calc_max();
+
+  this->gen_opt.hash_gen_max = this->priv.slen-1;
+  this->gen_opt.check_value = true;
+  this->gen_opt.check_range = false;
+  /*@\\\+CB63*/
+  }
+/*@\\\0000000901+AAC9*/
+/*@/// mycall t_bool to_perfect_hash::findhash() */
+mycall t_bool to_perfect_hash::findhash(
+  ) {
+/*@/// doc */
+/*
+  ** Try to find a perfect hash function.
+  ** Return the successful initializer for the initial hash.
+  ** Return 0 if no perfect hash could be found.
+*/
+/*@\\\0000000103+00CE*/
+// var
+  t_32u bad_initkey;  // how many times did initkey() fail?
+  t_32u bad_perfect;  // how many times did perfect() fail?
+  t_32u salt;  // trial initializer for initial hash
+  t_32u maxalen;
+  tr_hstuff *p_tabh;  // array; table of keys indexed by hash value
+  tr_qstuff *p_tabq;  // array; table of stuff indexed by queue value,
+                      // used by augment()
+  tq_reason rslinit;
+  t_bool res;
+// begin
+  this->priv.diffbits = 0;
+
+  if (! assigned(this->cb.a_gen[HM_AB_1]) &&
+      ! assigned(this->cb.a_gen[HM_AB_2]) &&
+      ! assigned(this->cb.a_gen[HM_AB_3]) &&
+      ! assigned(this->cb.a_gen[HM_AB_X1]) &&
+      ! assigned(this->cb.a_gen[HM_AB_X2]) &&
+      ! assigned(this->cb.a_gen[HM_AB_X3])) {
+    return false;
+    }
+
+  this->gen_opt.count_for_limit = 0;
+  this->gen_opt.max_count = 0;
+
+  p_tabh = nil;
+  p_tabq = nil;
+  // try
+
+    // guess initial values for slen, alen and blen
+    this->initalen();
+
+    this->scrambleinit();
+
+    if (this->calc_opt.minimal)
+      maxalen = this->priv.slen / 2;
+    else
+      maxalen = this->priv.slen;
+
+    if (this->calc_opt.minimal)
+      this->priv.highhash = this->priv.nkeys;
+    else {
+      // this->priv.highhash = this->priv.slen;
+      this->priv.highhash = min_32(
+        this->priv.slen,
+        (t_32u)(this->priv.nkeys * this->calc_opt.near_minimal_factor));
+        // round
+      }
+    this->priv.trans =
+      (! this->calc_opt.fast) ||
+      (this->calc_opt.minimal) ||
+      (this->calc_opt.near_minimal_factor < 2.0);
+
+    // allocate working memory
+    /*@/// realloc this->priv.p_tabb */
+    if (this->priv.p_tabb != nil) {
+      free(this->priv.p_tabb);
+      this->priv.p_tabb = nil;
+      }
+    this->priv.p_tabb = (tr_bstuff*)calloc(this->priv.blen, sizeof(tr_bstuff));
+    /*@\\\0000000527+F0D4*/
+    p_tabq = (tr_qstuff*)calloc(this->priv.blen+1, sizeof(tr_qstuff));
+    p_tabh = (tr_hstuff*)calloc(this->priv.slen, sizeof(tr_hstuff));
+      // BUG 1 in Jenkins' code
+
+    // Actually find the perfect hash
+    bad_initkey = 0;
+    bad_perfect = 0;
+    salt = 1;
+    TRACE("Using "+num(this->priv.alen)+" / "+num(this->priv.blen));
+    while (true) {
+      /*@/// Try to find distinct (A,B) for all keys */
+      // Try to find distinct (A,B) for all keys
+      TRACE_ODO(num(salt));
+
+      // the initial hash of the keys
+      this->inithex(salt);
+      rslinit = this->inittab();
+      switch (rslinit) {
+        /*@/// case REASON_FAILURE,REASON_LIMIT: didn't find distinct (a,b) */
+        case REASON_FAILURE:
+        case REASON_LIMIT: {
+          // didn't find distinct (a,b)
+          ++bad_initkey;
+          if (bad_initkey >= this->calc_opt.retry_initkey) {
+            // Try to put more bits in (A,B) to make distinct (A,B) more likely
+            if (this->priv.alen < maxalen) {
+              this->priv.alen = this->priv.alen*2;
+              }
+            else if (this->priv.blen < this->priv.slen) {
+              this->priv.blen = this->priv.blen*2;
+              /*@/// realloc p_tabq */
+              if (p_tabq != nil) {
+                free(p_tabq);
+                p_tabq = nil;
+                }
+              p_tabq = (tr_qstuff*)calloc(this->priv.blen+1, sizeof(tr_qstuff));
+              /*@\\\0000000530+3869*/
+              /*@/// realloc this->priv.p_tabb */
+              if (this->priv.p_tabb != nil) {
+                free(this->priv.p_tabb);
+                this->priv.p_tabb = nil;
+                }
+              this->priv.p_tabb =
+                (tpr_bstuff)calloc(this->priv.blen, sizeof(tr_bstuff));
+              /*@\\\0000000539+F0D4*/
+              scrambleinit();
+              }
+            else {
+              TRACE("fatal error: Cannot perfect hash: cannot find distinct (A,B)");
+              res = false;
+              goto EXIT;
+              }
+            TRACE(
+              num(salt)+": enlarging table to "+num(this->priv.alen)+" / "+
+              num(this->priv.blen));
+            bad_initkey = 0;
+            bad_perfect = 0;
+            this->state_engine.i = 1;  // try simpler versions first
+            }
+          goto CONT;  // two keys have same (a,b) pair
+          break;
+          }
+        /*@\\\0000001A3D+2144*/
+        /*@/// case REASON_DUPE: a real dupe found */
+        case REASON_DUPE: {
+          res = false;
+          goto EXIT;
+          break;
+          }
+        /*@\\\0000000401+4D42*/
+        case REASON_SUCCESS: break;
+        }
+
+      TRACE(num(salt)+": found distinct (A,B) ["+num(this->state_engine.i)+"]");
+
+      // Given distinct (A,B) for all keys, build a perfect hash
+      rslinit = this->perfect(p_tabh, p_tabq);
+      if (rslinit != REASON_SUCCESS) {
+        /*@/// no success */
+        if (rslinit == REASON_LIMIT) {
+          TRACE("max count reached");
+          // bad_perfect += 10;
+          }
+        if ((this->calc_opt.work_limit != 0) &&
+           (this->gen_opt.count_for_limit > this->calc_opt.work_limit)) {
+          TRACE("max work reached");
+          bad_perfect = this->calc_opt.retry_perfect;
+          this->gen_opt.count_for_limit = 0;
+          }
+
+        ++bad_perfect;
+        if (bad_perfect >= this->calc_opt.retry_perfect) {
+          if ((this->priv.blen < this->priv.slen)) {
+            this->priv.blen = this->priv.blen * 2;
+            /*@/// realloc p_tabq */
+            if (p_tabq != nil) {
+              free(p_tabq);
+              p_tabq = nil;
+              }
+            p_tabq = (tr_qstuff*)calloc(this->priv.blen+1, sizeof(tr_qstuff));
+            /*@\\\0000000530+3869*/
+            /*@/// realloc this->priv.p_tabb */
+            if (this->priv.p_tabb != nil) {
+              free(this->priv.p_tabb);
+              this->priv.p_tabb = nil;
+              }
+            this->priv.p_tabb =
+              (tpr_bstuff)calloc(this->priv.blen, sizeof(tr_bstuff));
+            /*@\\\0000000539+F0D4*/
+            scrambleinit();
+            // --salt;  // we know this salt got distinct (A,B)
+            this->state_engine.i = 1;  // better retry with simpler versions
+            TRACE(
+              num(salt)+": enlarging table to "+num(this->priv.alen)+" / "+
+              num(this->priv.blen));
+            }
+          else {
+            TRACE("fatal error: Cannot perfect hash: cannot build tab[]");
+            res = false;
+            goto EXIT;
+            }
+          bad_perfect = 0;
+          }
+        /*@\\\0000000314+E62E*/
+        goto CONT;  // continue while true
+        }
+
+      // BREAK;  // while true
+      goto DONE;  // break while true
+      /*@\\\0000001201+A291*/
+CONT:
+      ++salt;
+      }  // while true
+
+DONE:
+    TRACE(num(salt)+": built perfect hash table of size "+num(this->priv.blen));
+    this->finalize_perfect();
+    res = true;
+
+EXIT: {}
+  // finally
+    // free working memory
+    if (p_tabh != nil)
+      free(p_tabh);
+    if (p_tabq != nil)
+      free(p_tabq);
+
+  return res;
+  }
+/*@\\\0000001229+F101*/
+
+/*@/// mycall void to_perfect_hash::calc_max() */
+mycall void to_perfect_hash::calc_max(
+  ) {
+// var
+  t_32u max;
+  t_32u v;
+  tpr_key mykey;
+  t_32u key_u;
+  t_32u min_u;
+  t_32u max_u;
+  t_32s key_s;
+  t_32s min_s;
+  t_32s max_s;
+  t_32u bits;
+  t_bool use_signed;
+  tf_calc_ab f_calc_ab;
+// begin
+  f_calc_ab = a_f_calc_ab[this->gen_opt.hash_method];
+  max = 0;
+  mykey = this->priv.keys;
+  while (mykey != nil) {
+    f_calc_ab(this->gen_opt, *mykey);
+    v = calc_hash_tab(this->gen_opt, *mykey);
+    if (v > max)
+      max = v;
+    mykey = mykey->next_k;
+    }
+  this->gen_opt.hash_used_max = max;
+  this->gen_opt.is_minimal = (max+1 == this->priv.nkeys);
+
+  this->priv.p_inv_hash = (tr_hstuff*)calloc(max+1, sizeof(tr_hstuff));
+  mykey = this->priv.keys;
+  while (mykey != nil) {
+    v = calc_hash_tab(this->gen_opt, *mykey);
+    this->priv.p_inv_hash[v].key_h = mykey;
+    mykey = mykey->next_k;
+    }
+
+  mykey = this->priv.keys;
+  key_u = mykey->key_k;
+  min_u = key_u;
+  max_u = key_u;
+  key_s = (t_32s)(mykey->key_k);
+  min_s = key_s;
+  max_s = key_s;
+  while (mykey != nil) {
+    key_u = mykey->key_k;
+    if (min_u > key_u)
+      min_u = key_u;
+    if (max_u < key_u)
+      max_u = key_u;
+    key_s = (t_32s)(mykey->key_k);
+    if (min_s > key_s)
+      min_s = key_s;
+    if (max_s < key_s)
+      max_s = key_s;
+    mykey = mykey->next_k;
+    }
+
+  if (max_u <= 0xff) {
+    bits = 8;
+    use_signed = false;
+    }
+  else if ((-0x80 <= min_s) && (max_s <= 0x7f)) {
+    bits = 8;
+    use_signed = true;
+    }
+  else if (max_u <= 0xffff) {
+    bits = 16;
+    use_signed = false;
+    }
+  else if ((-0x8000 <= min_s) && (max_s <= 0x7fff)) {
+    bits = 16;
+    use_signed = true;
+    }
+  else {
+    bits = 32;
+    use_signed = false;
+    }
+
+  this->gen_opt.use_signed = use_signed;
+  this->gen_opt.bits = bits;
+  if (use_signed) {
+    this->gen_opt.low = (t_32u)(min_s);
+    this->gen_opt.high = (t_32u)(max_s);
+    }
+  else {
+    this->gen_opt.low = min_u;
+    this->gen_opt.high = max_u;
+    }
+  }
+/*@\\\0000000E01+5E07*/
+/*@\\\0000000E01+0E34*/
+/*@\\\0001000011*/
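
Note on calc_max() above: its last part picks the narrowest operand width
that holds every key, trying the unsigned reading before the signed one at
each width. The standalone sketch below restates only that selection. The
names pick_width and WidthChoice are hypothetical, chosen for illustration,
and are not part of hashlib.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration type; not part of hashlib.
struct WidthChoice {
  unsigned bits;    // 8, 16 or 32
  bool use_signed;  // true if the keys only fit when read as signed
};

// Mirrors the if/else chain at the end of calc_max().
// Assumes a non-empty key set, as calc_max() itself does.
WidthChoice pick_width(const std::vector<uint32_t> &keys) {
  assert(!keys.empty());
  uint32_t max_u = keys.front();
  int32_t min_s = static_cast<int32_t>(keys.front());
  int32_t max_s = min_s;
  for (uint32_t k : keys) {
    const int32_t s = static_cast<int32_t>(k);
    if (k > max_u) max_u = k;
    if (s < min_s) min_s = s;
    if (s > max_s) max_s = s;
  }
  if (max_u <= 0xff) return {8, false};
  if (-0x80 <= min_s && max_s <= 0x7f) return {8, true};
  if (max_u <= 0xffff) return {16, false};
  if (-0x8000 <= min_s && max_s <= 0x7fff) return {16, true};
  return {32, false};
}
```

For example, the keys -5 and 5 do not fit in 8 unsigned bits but do fit in
8 signed bits, so the signed reading is chosen for them.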

Property changes on: lib/CodeGen/SelectionDAG/hashlib.cpp
___________________________________________________________________
Added: svn:author
   + jneumann
Added: svn:keywords
   + Date Revision Author Id URL
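
Note on gen_rev() in gen_hash.inc below: it emits SUB, ROTR and MUL nodes
for the formula given in its comment, ror_32(x - salt, a_shr) * a_mask.
The scalar model below may help when checking the emitted code by hand.
The parameter names follow tr_gen_opt; the functions ror32 and rev_hash are
hypothetical and exist only in this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32 bit value right by n bits, 0 <= n < 32.
inline uint32_t ror32(uint32_t x, unsigned n) {
  assert(n < 32);
  return n == 0 ? x : (x >> n) | (x << (32 - n));
}

// Scalar model of the node sequence built by gen_rev: SUB, ROTR, MUL.
// Arithmetic is modulo 2^32, as for the i32 DAG nodes.
inline uint32_t rev_hash(uint32_t x, uint32_t salt, unsigned a_shr,
                         uint32_t a_mask) {
  return ror32(x - salt, a_shr) * a_mask;
}
```

With the keys 100, 104, 108, 112 and salt = 100, a_shr = 2, a_mask = 1 the
hash values are 0, 1, 2, 3. A non-key such as 101 has its low bits rotated
to the top and hashes to 0x40000000, far above the largest used value, so
presumably a plain unsigned bound check is enough to route it to the
default block.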

Index: lib/CodeGen/SelectionDAG/gen_hash.inc
===================================================================
--- lib/CodeGen/SelectionDAG/gen_hash.inc	(revision 0)
+++ lib/CodeGen/SelectionDAG/gen_hash.inc	(revision 0)
@@ -0,0 +1,1039 @@
+//===-- gen_hash.inc (SelectionDAGBuilder.cpp) - Selection-DAG building ---===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This implements switch lowering by hashing using the library hashlib.
+//
+//===----------------------------------------------------------------------===//
+
+
+// TODO: Double dispatch raises error (MVT::Other needed), see visitJumpTable
+//       Hence double dispatch is currently commented out
+// TODO: (?) Replace double dispatch jump table by bit test for #dest=2
+// TODO: (?) Double dispatch using 1/2/4 bit table
+// TODO: Lots of parameters steering double dispatch and table value sizes
+// TODO: For double dispatch: Re-issue as new switch statement instead of jmp[],
+//       see also rev.199025, no endless loop
+//       Could this be a job for SelectionDAGBuilder::Clusterify?
+//       See also lib/Transforms/Utils/SimplifyCFG.cpp, class SwitchLookupTable
+// TODO: Comments on the table accesses in *.s files; how?
+// TODO: Pre-hash for 64 bit input values / labels; allowed?
+// TODO: Treat ranges (pushback) => BigRanges
+//       I don't know where to put the code
+//       (they currently land after jmp [] and are optimized away)
+//       Currently deactivated by switch, i.e. all values are hashed
+// TODO: Compare # large ranges with # cases matched by hashing
+// TODO: JTable/DTable need only have hash_used_max+1 elements
+//       This is a reason to keep hashing minimal
+// TODO: (?) Before ind. jump: Replace
+//       if (VTable[h]!=x)  goto default  // cond. jump
+//    => if (VTable[h]!=x)  h=any_hash_value_which_maps_to_default  // cmov
+// TODO: (?) Before ind. jump: Replace
+//       if (h>max_h)  goto default  // cond. jump
+//    => if (h>max_h)  h=any_hash_value_which_maps_to_default  // cmov
+// TODO: Instead of deciding whether to hash at all before hashlib runs
+//       it might be better to let it run and decide whether the result
+//       is good enough by using some cost function
+// TODO: Better decision if MUL is slow (reversible, gen_mul, gen_ab_x*)
+// Check: UMUL_LOHI implied by MUL?
+// TODO: Prefer gen_ab_3 or gen_ab_x3?
+// TODO: Even if MUL disallowed still use reversible hashing (without mul)
+// TODO: Even if JT disallowed still use hashing (few jump targets)
+//
+// What can be done to prevent AVs in case of heap overflow?
+
+static void gen_rev(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_rev: "
+  //  << "ror_32(x+" << (t_32u)(-Opt.salt) << ", " << Opt.a_shr
+  //  << ") * " << Opt.a_mask
+  //  << "\n");
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue V = *HCtx->Source;
+
+  // ror_32(x - gen_opt.salt, gen_opt.a_shr) * gen_opt.a_mask;
+  if (Opt.salt != 0) {
+    V = SDB->DAG.getNode(
+      ISD::SUB, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.salt, VT));
+  }
+
+  if (Opt.a_shr != 0) {
+    V = SDB->DAG.getNode(
+      ISD::ROTR, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.a_shr, VT));
+  }
+
+  if (Opt.a_mask != 1) {
+    V = SDB->DAG.getNode(
+      ISD::MUL, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.a_mask, VT));
+  }
+
+  HCtx->Target = V;
+}
+
+static void gen_and(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_and: "
+  //  << "x & " << Opt.a_mask
+  //  << "\n");
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue V = *HCtx->Source;
+
+  // x & gen_opt.a_mask;
+  if (Opt.a_mask != (t_32u)(-1)) {
+    V = SDB->DAG.getNode(
+      ISD::AND, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.a_mask, VT));
+  }
+
+  HCtx->Target = V;
+}
+static void gen_shr(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_shr: "
+  //  << "x >> " << Opt.a_shr
+  //  << "\n");
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue V = *HCtx->Source;
+
+  // x >> gen_opt.a_shr;
+  if (Opt.a_shr != 0) {
+    V = SDB->DAG.getNode(
+      ISD::SRL, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.a_shr, VT));
+  }
+
+  HCtx->Target = V;
+}
+static void gen_rol(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_rol: "
+  //  << "rol_32(x, " << Opt.salt << ") & " << Opt.a_mask
+  //  << "\n");
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue V = *HCtx->Source;
+
+  // rol_32(x, Opt.salt) & Opt.a_mask;
+  if (Opt.salt != 0) {
+    V = SDB->DAG.getNode(
+      ISD::ROTL, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.salt, VT));
+  }
+
+  if (Opt.a_mask != (t_32u)(-1)) {
+    V = SDB->DAG.getNode(
+      ISD::AND, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.a_mask, VT));
+  }
+
+  HCtx->Target = V;
+}
+static void gen_rol_xor(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_rol_xor: "
+  //  << "(rol_32(x, " << Opt.salt << ") ^ x) & " << Opt.a_mask
+  //  << "\n");
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue V = *HCtx->Source;
+
+  // (rol_32(x, Opt.salt) ^ x) & Opt.a_mask;
+  if (Opt.salt != 0) {
+    V = SDB->DAG.getNode(
+      ISD::ROTL, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.salt, VT));
+  }
+
+  V = SDB->DAG.getNode(
+    ISD::XOR, SDB->getCurSDLoc(), VT,
+    V,
+    *HCtx->Source);
+
+  if (Opt.a_mask != (t_32u)(-1)) {
+    V = SDB->DAG.getNode(
+      ISD::AND, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.a_mask, VT));
+  }
+
+  HCtx->Target = V;
+}
+static void gen_rol_add(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_rol_add: "
+  //  << "(rol_32(x, " << Opt.salt << ") + x) & " << Opt.a_mask
+  //  << "\n");
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue V = *HCtx->Source;
+
+  // (rol_32(x, Opt.salt) + x) & Opt.a_mask;
+  if (Opt.salt != 0) {
+    V = SDB->DAG.getNode(
+      ISD::ROTL, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.salt, VT));
+  }
+
+  V = SDB->DAG.getNode(
+    ISD::ADD, SDB->getCurSDLoc(), VT,
+    V,
+    *HCtx->Source);
+
+  if (Opt.a_mask != (t_32u)(-1)) {
+    V = SDB->DAG.getNode(
+      ISD::AND, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.a_mask, VT));
+  }
+
+  HCtx->Target = V;
+}
+static void gen_rol_sub(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_rol_sub: "
+  //  << "(rol_32(x, " << Opt.salt << ") - x) & " << Opt.a_mask
+  //  << "\n");
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue V = *HCtx->Source;
+
+  // (rol_32(x, Opt.salt) - x) & Opt.a_mask;
+  if (Opt.salt != 0) {
+    V = SDB->DAG.getNode(
+      ISD::ROTL, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.salt, VT));
+  }
+
+  V = SDB->DAG.getNode(
+    ISD::SUB, SDB->getCurSDLoc(), VT,
+    V,
+    *HCtx->Source);
+
+  if (Opt.a_mask != (t_32u)(-1)) {
+    V = SDB->DAG.getNode(
+      ISD::AND, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.a_mask, VT));
+  }
+
+  HCtx->Target = V;
+}
+static void gen_mul(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_mul: "
+  //  << "(t_32u)(x * " << Opt.salt << ") >> " << Opt.a_shr
+  //  << "\n");
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue V = *HCtx->Source;
+
+  // (t_32u)(x * gen_opt.salt) >> gen_opt.a_shr;
+  if (Opt.salt != 1) {
+    V = SDB->DAG.getNode(
+      ISD::MUL, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.salt, VT));
+  }
+
+  if (Opt.a_shr != 0) {
+    V = SDB->DAG.getNode(
+      ISD::SRL, SDB->getCurSDLoc(), VT,
+      V,
+      SDB->DAG.getConstant(Opt.a_shr, VT));
+  }
+
+  HCtx->Target = V;
+}
+
+static void calc_tab(const tr_gen_opt &Opt, t_pointer GenStuff,
+                     SDValue &a, SDValue &b) {
+  //DEBUG(dbgs() << "calc_tab"
+  //  << "\n");
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  if (Opt.tabb_len == 0) {
+    // a
+    // do not calculate b
+    HCtx->Target = a;
+  } else if (Opt.tabb_len == 1) {
+    // a ^ Opt.p_tabb[0]
+    // do not calculate b, tabb[0]=const!
+    HCtx->Target = SDB->DAG.getNode(
+      ISD::XOR, SDB->getCurSDLoc(), VT,
+      a,
+      SDB->DAG.getConstant(Opt.p_tabb[0], VT));
+  } else {
+    // a ^ Opt.p_tabb[b] / a ^ Opt.p_scramble[Opt.p_tabb[b]]
+    // ConvVal = Opt.p_tabb[b]
+    const TargetLowering *TLI = SDB->TM.getTargetLowering();
+    EVT PTy = TLI->getPointerTy();
+    SDValue HashPtr = SDB->DAG.getZExtOrTrunc(b, SDB->getCurSDLoc(), PTy);
+
+    IntegerType *ValType = IntegerType::get(*SDB->Context,HCtx->BBits);
+    // DEBUG(dbgs() << "ValBType=" << *ValType << "\n");
+
+    SDValue VTab = SDB->DAG.getConstantPool(HCtx->BTable, PTy,
+      SDB->TD->getPrefTypeAlignment(ValType));
+    unsigned Alignment = cast<ConstantPoolSDNode>(VTab)->getAlignment();
+    unsigned EltSize = (unsigned)SDB->TD->getTypeAllocSize(ValType);
+    SDValue VIdx = SDB->DAG.getNode(ISD::MUL, SDB->getCurSDLoc(), PTy,
+                                    HashPtr,
+                                    SDB->DAG.getConstant(EltSize, PTy));
+    SDValue VOfs = SDB->DAG.getNode(ISD::ADD, SDB->getCurSDLoc(), PTy,
+                                    VIdx,
+                                    VTab);
+
+    SDValue CmpVal = SDB->DAG.getLoad(
+      MVT::getIntegerVT(HCtx->BBits), SDB->getCurSDLoc(),
+      SDB->DAG.getEntryNode(),
+      VOfs,
+      MachinePointerInfo::getConstantPool(), false,
+      false, false, Alignment);
+
+    SDValue ConvVal =
+      SDB->DAG.getZExtOrTrunc(CmpVal, SDB->getCurSDLoc(), MVT::i32);
+
+    if (Opt.scramble_len != 0) {
+      // ConvVal = Opt.p_scramble[ConvVal]
+      IntegerType *ValType = IntegerType::get(*SDB->Context,HCtx->SBits);
+      // DEBUG(dbgs() << "ValSType=" << *ValType << "\n");
+
+      VTab = SDB->DAG.getConstantPool(HCtx->STable, PTy,
+        SDB->TD->getPrefTypeAlignment(ValType));
+      Alignment = cast<ConstantPoolSDNode>(VTab)->getAlignment();
+      EltSize = (unsigned)SDB->TD->getTypeAllocSize(ValType);
+      ConvVal = SDB->DAG.getZExtOrTrunc(ConvVal, SDB->getCurSDLoc(), PTy);
+      VIdx = SDB->DAG.getNode(ISD::MUL, SDB->getCurSDLoc(), PTy,
+                                 ConvVal,
+                                 SDB->DAG.getConstant(EltSize, PTy));
+      VOfs = SDB->DAG.getNode(ISD::ADD, SDB->getCurSDLoc(), PTy,
+                                 VIdx,
+                                 VTab);
+
+      CmpVal = SDB->DAG.getLoad(
+        MVT::getIntegerVT(HCtx->SBits), SDB->getCurSDLoc(),
+        SDB->DAG.getEntryNode(),
+        VOfs,
+        MachinePointerInfo::getConstantPool(), false,
+        false, false, Alignment);
+
+      ConvVal = SDB->DAG.getZExtOrTrunc(CmpVal, SDB->getCurSDLoc(), MVT::i32);
+    }
+
+    // Target = a ^ ConvVal
+    HCtx->Target = SDB->DAG.getNode(
+      ISD::XOR, SDB->getCurSDLoc(), VT,
+      a,
+      ConvVal);
+  }
+}
+static void gen_ab_1(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_ab_1"
+  //  << "\n");
+
+  // a = (x << Opt.a_shl) >> Opt.a_shr;
+  // b = (x >> Opt.b_shr) & Opt.b_mask;
+  // tab(a,b);
+
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue a = *HCtx->Source;
+  if (Opt.a_shl != 0) {
+    a = SDB->DAG.getNode(
+      ISD::SHL, SDB->getCurSDLoc(), VT,
+      a,
+      SDB->DAG.getConstant(Opt.a_shl, VT));
+  }
+  if (Opt.a_shr != 0) {
+    a = SDB->DAG.getNode(
+      ISD::SRL, SDB->getCurSDLoc(), VT,
+      a,
+      SDB->DAG.getConstant(Opt.a_shr, VT));
+  }
+
+  SDValue b = *HCtx->Source;
+  if (Opt.b_shr != 0) {
+    b = SDB->DAG.getNode(
+      ISD::SRL, SDB->getCurSDLoc(), VT,
+      b,
+      SDB->DAG.getConstant(Opt.b_shr, VT));
+  }
+  if (Opt.b_mask != (t_32u)(-1)) {
+    b = SDB->DAG.getNode(
+      ISD::AND, SDB->getCurSDLoc(), VT,
+      b,
+      SDB->DAG.getConstant(Opt.b_mask, VT));
+  }
+
+  calc_tab(Opt, GenStuff, a, b);
+}
+static void gen_ab_2(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_ab_2"
+  //  << "\n");
+
+  // a = (x >> Opt.a_shr) & Opt.a_mask;
+  // b = (x << Opt.b_shl) >> Opt.b_shr;
+  // tab(a,b);
+
+
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue a = *HCtx->Source;
+  if (Opt.a_shr != 0) {
+    a = SDB->DAG.getNode(
+      ISD::SRL, SDB->getCurSDLoc(), VT,
+      a,
+      SDB->DAG.getConstant(Opt.a_shr, VT));
+
+  }
+  if (Opt.a_mask != (t_32u)(-1)) {
+    a = SDB->DAG.getNode(
+      ISD::AND, SDB->getCurSDLoc(), VT,
+      a,
+      SDB->DAG.getConstant(Opt.a_mask, VT));
+  }
+
+  SDValue b = *HCtx->Source;
+  if (Opt.b_shl != 0) {
+    b = SDB->DAG.getNode(
+      ISD::SHL, SDB->getCurSDLoc(), VT,
+      b,
+      SDB->DAG.getConstant(Opt.b_shl, VT));
+  }
+  if (Opt.b_shr != 0) {
+    b = SDB->DAG.getNode(
+      ISD::SRL, SDB->getCurSDLoc(), VT,
+      b,
+      SDB->DAG.getConstant(Opt.b_shr, VT));
+  }
+
+  calc_tab(Opt, GenStuff, a, b);
+}
+static void gen_ab_3(const tr_gen_opt &Opt, t_pointer GenStuff) {
+  //DEBUG(dbgs() << "gen_ab_3"
+  //  << "\n");
+
+  // x = x + Opt.salt;
+  // if (Opt.do_xor_shr_16)
+  //   x = x ^ (x >> 16);
+  // if (Opt.do_add_shl_8)
+  //   x = x + (x << 8);
+  // x = x ^ (x >> 4);
+  // if (Opt.a_shl == 0)
+  //   a = x >> Opt.a_shr;
+  // else
+  //   a = ((x << Opt.a_shl) + x) >> Opt.a_shr;
+  // b = (x >> Opt.b_shr) & Opt.b_mask;
+  // tab(a,b);
+
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue a,b,t;
+  SDValue x = *HCtx->Source;
+
+  // x = x + Opt.salt;
+  if (Opt.salt != 0) {
+    x = SDB->DAG.getNode(
+      ISD::ADD, SDB->getCurSDLoc(), VT,
+      x,
+      SDB->DAG.getConstant(Opt.salt, VT));
+  }
+
+  if (Opt.do_xor_shr_16) {
+    // x = x ^ (x >> 16);
+    t = SDB->DAG.getNode(
+      ISD::SRL, SDB->getCurSDLoc(), VT,
+      x,
+      SDB->DAG.getConstant(16, VT));
+    x = SDB->DAG.getNode(
+      ISD::XOR, SDB->getCurSDLoc(), VT,
+      t,
+      x);
+  }
+
+  if (Opt.do_add_shl_8) {
+    // x = x + (x << 8);
+    t = SDB->DAG.getNode(
+      ISD::SHL, SDB->getCurSDLoc(), VT,
+      x,
+      SDB->DAG.getConstant(8, VT));
+    x = SDB->DAG.getNode(
+      ISD::ADD, SDB->getCurSDLoc(), VT,
+      t,
+      x);
+  }
+
+  // x = x ^ (x >> 4);
+  t = SDB->DAG.getNode(
+    ISD::SRL, SDB->getCurSDLoc(), VT,
+    x,
+    SDB->DAG.getConstant(4, VT));
+  x = SDB->DAG.getNode(
+    ISD::XOR, SDB->getCurSDLoc(), VT,
+    t,
+    x);
+
+  if (Opt.a_shl == 0) {
+    // a = x >> Opt.a_shr;
+    a = SDB->DAG.getNode(
+      ISD::SRL, SDB->getCurSDLoc(), VT,
+      x,
+      SDB->DAG.getConstant(Opt.a_shr, VT));
+  } else {
+    // a = ((x << Opt.a_shl) + x) >> Opt.a_shr;
+    a = SDB->DAG.getNode(
+      ISD::SHL, SDB->getCurSDLoc(), VT,
+      x,
+      SDB->DAG.getConstant(Opt.a_shl, VT));
+    a = SDB->DAG.getNode(
+      ISD::ADD, SDB->getCurSDLoc(), VT,
+      a,
+      x);
+    a = SDB->DAG.getNode(
+      ISD::SRL, SDB->getCurSDLoc(), VT,
+      a,
+      SDB->DAG.getConstant(Opt.a_shr, VT));
+  }
+
+  // b = (x >> Opt.b_shr) & Opt.b_mask;
+  b = SDB->DAG.getNode(
+    ISD::SRL, SDB->getCurSDLoc(), VT,
+    x,
+    SDB->DAG.getConstant(Opt.b_shr, VT));
+  b = SDB->DAG.getNode(
+    ISD::AND, SDB->getCurSDLoc(), VT,
+    b,
+    SDB->DAG.getConstant(Opt.b_mask, VT));
+
+  calc_tab(Opt, GenStuff, a, b);
+}
+static void gen_ab_x1(const tr_gen_opt &Opt, t_pointer GenStuff) {
+// currently not used
+  //DEBUG(dbgs() << "gen_ab_x1"
+  //  << "\n");
+
+  // unsigned t = x * Opt.salt;
+  // a = (t >> Opt.a_shr) & Opt.a_mask;
+  // b = t >> Opt.b_shr;
+  // tab(a,b);
+
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue x = *HCtx->Source;
+
+  SDValue Mul = SDB->DAG.getNode(ISD::MUL, SDB->getCurSDLoc(), VT,
+                                 x,
+                                 SDB->DAG.getConstant(Opt.salt, VT));
+
+  SDValue a = SDB->DAG.getNode(
+    ISD::SRL, SDB->getCurSDLoc(), VT,
+    Mul,
+    SDB->DAG.getConstant(Opt.a_shr, VT));
+  a = SDB->DAG.getNode(
+    ISD::AND, SDB->getCurSDLoc(), VT,
+    a,
+    SDB->DAG.getConstant(Opt.a_mask, VT));
+
+  SDValue b = SDB->DAG.getNode(
+    ISD::SRL, SDB->getCurSDLoc(), VT,
+    Mul,
+    SDB->DAG.getConstant(Opt.b_shr, VT));
+
+  calc_tab(Opt, GenStuff, a, b);
+}
+static void gen_ab_x3(const tr_gen_opt &Opt, t_pointer GenStuff) {
+// currently not used
+  //DEBUG(dbgs() << "gen_ab_x3"
+  //  << "\n");
+
+  // typedef unsigned long long LL;
+  // LL t = (LL)x * (LL)Opt.salt;
+  // a = (unsigned)(t >> 32) & Opt.a_mask;
+  // b = (unsigned)t >> Opt.b_shr;
+  // tab(a,b);
+
+
+  HashContextType *HCtx = (HashContextType *)(GenStuff);
+
+  EVT VT = MVT::i32;  // HCtx->Source->getValueType();
+  const SelectionDAGBuilder *SDB = HCtx->SDB;
+
+  SDValue x = *HCtx->Source;
+
+#if 0
+  SDValue Factor = SDB->DAG.getConstant(Opt.salt, VT);
+  Factor = SDB->DAG.getZExtOrTrunc(Factor, SDB->getCurSDLoc(), MVT::i64);
+  SDValue ExpandedX = SDB->DAG.getZExtOrTrunc(x, SDB->getCurSDLoc(), MVT::i64);
+  SDValue Mul = SDB->DAG.getNode(ISD::MUL, SDB->getCurSDLoc(), MVT::i64,
+                                 ExpandedX,
+                                 Factor);
+
+  SDValue a = SDB->DAG.getNode(
+    ISD::SRL, SDB->getCurSDLoc(), MVT::i64,
+    Mul,
+    SDB->DAG.getConstant(32, MVT::i64));
+  a = SDB->DAG.getZExtOrTrunc(a, SDB->getCurSDLoc(), VT);
+  a = SDB->DAG.getNode(
+    ISD::AND, SDB->getCurSDLoc(), VT,
+    a,
+    SDB->DAG.getConstant(Opt.a_mask, VT));
+
+  SDValue b = SDB->DAG.getZExtOrTrunc(Mul, SDB->getCurSDLoc(), VT);
+  b = SDB->DAG.getNode(
+    ISD::SRL, SDB->getCurSDLoc(), VT,
+    b,
+    SDB->DAG.getConstant(Opt.b_shr, VT));
+#else
+  SDVTList VTs = SDB->DAG.getVTList(VT, VT);
+  SDValue Mul = SDB->DAG.getNode(ISD::UMUL_LOHI, SDB->getCurSDLoc(), VTs,
+                                 x,
+                                 SDB->DAG.getConstant(Opt.salt, VT));
+
+  SDValue a = SDB->DAG.getNode(
+    ISD::AND, SDB->getCurSDLoc(), VT,
+    Mul.getValue(1),
+    SDB->DAG.getConstant(Opt.a_mask, VT));
+
+  SDValue b = SDB->DAG.getNode(
+    ISD::SRL, SDB->getCurSDLoc(), VT,
+    Mul.getValue(0),
+    SDB->DAG.getConstant(Opt.b_shr, VT));
+#endif
+
+  calc_tab(Opt, GenStuff, a, b);
+}
+
+bool SelectionDAGBuilder::handleHashSwitchCase(CaseRec &CR,
+                                               CaseRecVector &WorkList,
+                                               const Value *SV,
+                                               MachineBasicBlock *Default,
+                                               MachineBasicBlock *SwitchBB) {
+
+  // first checks
+  const TargetLowering *TLI = TM.getTargetLowering();
+  EVT PTy = TLI->getPointerTy();
+
+  if (! areJTsAllowed(*TLI))
+    return false;
+
+  if (!TLI->isOperationLegal(ISD::ROTL, PTy))
+    return false;
+
+  if (!TLI->isOperationLegal(ISD::ROTR, PTy))
+    return false;
+
+  if (!TLI->isOperationLegal(ISD::SHL, PTy))
+    return false;
+
+  if (!TLI->isOperationLegal(ISD::SRL, PTy))
+    return false;
+
+  Case& FrontCase = *CR.Range.first;
+  Case& BackCase  = *(CR.Range.second-1);
+
+  const APInt &First = cast<ConstantInt>(FrontCase.Low)->getValue();
+  const APInt &Last  = cast<ConstantInt>(BackCase.High)->getValue();
+
+  if (First.getBitWidth() > 32) {
+    DEBUG(dbgs()
+      << "handleHashSwitchCase: Too many bits, NeededBits="
+      << First.getBitWidth()
+      << "\n");
+    return false;
+  }
+
+  APInt TSize(First.getBitWidth(), 0);  // sum of all ranges
+  typedef std::map<const MachineBasicBlock *, uint64_t> MBBMap;
+  MBBMap DestMap;  // map of all used jump targets
+  std::vector<MachineBasicBlock *> DestVector;
+  std::pair<MBBMap::iterator,bool> ret;
+  uint64_t NrIfs = 0;  // # of ifs needed for a decision tree
+  uint64_t NrRanges = 0;  // # of ranges of single values
+  uint64_t MaxRange = 0;  // largest range
+  uint64_t DestIndex = 0;  // # of different jump targets
+  for (CaseItr I = CR.Range.first, E = CR.Range.second; I != E; ++I) {
+    uint64_t Size = I->size().getLimitedValue(UINT64_MAX);
+    if (Size <= SwitchHashMaxRange) {
+      ret = DestMap.insert(MBBMap::value_type(I->BB,DestIndex));
+      if (ret.second) {
+        ++DestIndex;
+        DestVector.push_back(I->BB);
+      }
+      TSize += I->size();
+      ++NrRanges;
+      if (MaxRange < Size)
+        MaxRange = Size;
+      if (Size <= 1)
+        NrIfs += 1;
+      else
+        NrIfs += 2;
+    }
+  }
+
+  DEBUG(dbgs()
+      // << "# SwitchHashMinIf=" << SwitchHashMinIf
+      << "# ranges=" << NrRanges
+      << ", MaxRange=" << MaxRange
+      << ", TSize=" << TSize
+      << ", # ifs=" << NrIfs
+      << ", # DestMap=" << DestMap.size()  // =DestIndex
+      << "\n");
+
+  if (TSize.ult(TLI->getMinimumJumpTableEntries())) {
+    DEBUG(dbgs()
+      << "handleHashSwitchCase: Not enough entries\n");
+    return false;
+  }
+
+  uint64_t IntTSize = TSize.getLimitedValue(UINT64_MAX/100);
+
+  if (IntTSize > SwitchHashMaxCases) {
+    DEBUG(dbgs()
+      << "Too many cases, IntTSize=" << IntTSize
+      << "\n");
+    return false;
+  }
+
+  if (IntTSize * SwitchHashMinUsage > NrIfs * 100) {
+    DEBUG(dbgs()
+      << "Hash not dense enough, IntTSize=" << IntTSize << ", NrIfs=" << NrIfs
+      << "\n");
+    return false;
+  }
+
+
+  to_perfect_hash *PerfHash = new to_perfect_hash;
+
+  // PerfHash->calc_opt.* =
+  if (SwitchHashMulTries != 0)
+    PerfHash->calc_opt.nr_mul_checks = SwitchHashMulTries;
+  // listlen_limit
+  // retry_initkey
+  // retry_perfect
+  PerfHash->calc_opt.use_scramble = 512;
+  // count_limit;
+  // work_limit
+  // min_load_factor
+  // near_minimal_factor
+  // keyspace_factor
+  // minimal
+  // fast
+
+  // PerfHash->cb.* =
+  // see visitSwitch
+  //   bit tests                   0x00000001
+  //   small ranges                0x00000002
+  //   jump table                  0x00000004
+  if ( (SwitchMethods & 0x00000010) &&
+       TLI->isOperationLegal(ISD::MUL, PTy) )
+    PerfHash->cb.a_gen[HM_REVERSIBLE] = gen_rev;
+  if (SwitchMethods &   0x00000100)
+    PerfHash->cb.a_gen[HM_SIMPLE_AND] = gen_and;
+  if (SwitchMethods &   0x00000200)
+    PerfHash->cb.a_gen[HM_SIMPLE_SHR] = gen_shr;
+  if (SwitchMethods &   0x00000400)
+    PerfHash->cb.a_gen[HM_SIMPLE_ROL] = gen_rol;
+  if (SwitchMethods &   0x00000800)
+    PerfHash->cb.a_gen[HM_SIMPLE_ROL_XOR] = gen_rol_xor;
+  if (SwitchMethods &   0x00001000)
+    PerfHash->cb.a_gen[HM_SIMPLE_ROL_ADD] = gen_rol_add;
+  if (SwitchMethods &   0x00002000)
+    PerfHash->cb.a_gen[HM_SIMPLE_ROL_SUB] = gen_rol_sub;
+  if ( (SwitchMethods & 0x00004000) &&
+       TLI->isOperationLegal(ISD::MUL, PTy) )
+    PerfHash->cb.a_gen[HM_SIMPLE_MUL] = gen_mul;
+  if (SwitchMethods &   0x00010000)
+    PerfHash->cb.a_gen[HM_AB_1] = gen_ab_1;
+  if (SwitchMethods &   0x00020000)
+    PerfHash->cb.a_gen[HM_AB_2] = gen_ab_2;
+  if (SwitchMethods &   0x00040000)
+    PerfHash->cb.a_gen[HM_AB_3] = gen_ab_3;
+  if ( (SwitchMethods & 0x00100000) &&
+       TLI->isOperationLegal(ISD::MUL, PTy) )
+    PerfHash->cb.a_gen[HM_AB_X1] = gen_ab_x1;
+  if ( (SwitchMethods & 0x00400000) &&
+       TLI->isOperationLegal(ISD::UMUL_LOHI, PTy) )
+    PerfHash->cb.a_gen[HM_AB_X3] = gen_ab_x3;
+
+  // PerfHash->add_key(...)
+  bool OK = true;
+  assert(BigRanges.empty());
+  // BigRanges.clear();
+  for (CaseItr I = CR.Range.first, E = CR.Range.second; I != E; ++I) {
+    uint64_t Size = I->size().getLimitedValue(UINT64_MAX);
+    if (Size > SwitchHashMaxRange) {
+      // FIXME: activate line below when big range handling works
+      // BigRanges.push_back(*I);
+      OK = false;
+    } else {
+      APInt N = cast<ConstantInt>(I->Low)->getValue(),
+            N2 = cast<ConstantInt>(I->High)->getValue();
+      for (; N.sle(N2); ++N)
+        PerfHash->add_key((uint32_t)(*N.getRawData()), (t_pointer)this);
+    }
+  }
+
+  // other checks
+  if (! OK) {
+    DEBUG(dbgs()
+      << "handleHashSwitchCase: Too large ranges detected\n");
+    delete(PerfHash);
+    BigRanges.clear();
+    return false;
+  }
+
+  if (! PerfHash->generate_hash()) {
+    DEBUG(dbgs()
+      << "handleHashSwitchCase: No perfect hash found\n");
+    delete(PerfHash);
+    BigRanges.clear();
+    return false;
+  }
+
+  if ( (NrIfs < SwitchHashMinIf) &&
+       ( (PerfHash->gen_opt.hash_method != HM_REVERSIBLE) ||
+         (PerfHash->gen_opt.a_mask != 1) ) ) {
+    DEBUG(dbgs()
+      << "Hash not enough ifs, NrIfs=" << NrIfs
+      << "\n");
+    delete(PerfHash);
+    BigRanges.clear();
+    return false;
+  }
+
+  DEBUG(dbgs()
+    << "handleHashSwitchCase: Perfect hash found\n");
+  // generate code
+  // Get the MachineFunction which holds the current MBB.  This is used when
+  // inserting any additional MBBs necessary to represent the switch.
+  MachineFunction *CurMF = FuncInfo.MF;
+
+  // Figure out which block is immediately after the current one.
+  MachineFunction::iterator BBI = CR.CaseBB;
+  ++BBI;
+
+  const BasicBlock *LLVMBB = CR.CaseBB->getBasicBlock();
+
+  // Create a new basic block to hold the code for loading the address
+  // of the jump table, and jumping to it.  Update successor information;
+  // we will either branch to the default case for the switch, or the jump
+  // table.
+  MachineBasicBlock *JumpTableBB = CurMF->CreateMachineBasicBlock(LLVMBB);
+  CurMF->insert(BBI, JumpTableBB);
+
+  addSuccessorWithWeight(CR.CaseBB, Default);
+  addSuccessorWithWeight(CR.CaseBB, JumpTableBB);
+
+  // Build a vector of destination BBs, corresponding to each target
+  // of the jump table. If the value of the jump table slot corresponds to
+  // a case statement, push the case's BB onto the vector, otherwise, push
+  // the default BB.
+
+  if (! PerfHash->gen_opt.is_minimal) {
+    ret = DestMap.insert(MBBMap::value_type(Default,DestIndex));
+    if (ret.second) {
+      ++DestIndex;
+      DestVector.push_back(Default);
+    }
+  }
+
+  uint32_t HashRange;
+  if (PerfHash->gen_opt.check_range)
+    HashRange = PerfHash->gen_opt.hash_used_max+1;
+  else
+    HashRange = PerfHash->gen_opt.hash_gen_max+1;
+  // dbgs() << "HashRange=" << HashRange << "\n";
+
+  bool DoubleDispatch =
+    SwitchDoubleDispatch &&
+    (DestMap.size() <= 65536) &&  // use bytes/words
+    (DestMap.size() > 1) &&       // to get this special case running
+    (HashRange >= 8) &&              // TODO: 8 => parameter
+    (HashRange > DestMap.size()*2);  // TODO: 2 => parameter
+  uint32_t BB_range;
+  int DBits = 0;
+  std::vector<MachineBasicBlock*> DestBBs;
+  Constant *DTable = 0;
+  if (DoubleDispatch) {
+    // double dispatch
+    if (DestMap.size() <= 256)
+      DBits = 8;
+    else
+      DBits = 16;
+    BB_range = DestMap.size();
+    {for (uint32_t H = 0; H < BB_range; ++H) {
+      DestBBs.push_back(DestVector[H]);
+    }}
+
+    IntegerType *ValType = IntegerType::get(*Context, DBits);
+    Constant **DestIndex = new Constant *[HashRange];
+
+    // Fill with dummies
+    {
+      uint32_t index = DestMap.find(Default)->second;
+      for (uint32_t H = 0; H < HashRange; ++H)
+        DestIndex[H] = ConstantInt::get(ValType, index, false);
+    }
+    // And now the real entries
+    {for (CaseItr I = CR.Range.first, E = CR.Range.second; I != E; ++I) {
+      uint32_t index = DestMap.find(I->BB)->second;
+      for (APInt N = cast<ConstantInt>(I->Low)->getValue(),
+                 N2 = cast<ConstantInt>(I->High)->getValue();
+          N.sle(N2); ++N) {
+        uint32_t H = PerfHash->calc_hash((uint32_t)(*N.getRawData()));
+        assert((H < HashRange) && "hash out of range");
+        DestIndex[H] = ConstantInt::get(ValType, index, false);
+      }
+    }}
+
+    // Create a ConstantArray of the constants.
+    DTable = ConstantArray::get(
+      ArrayType::get(ValType, HashRange),
+          ArrayRef<Constant *>(DestIndex, HashRange));
+  } else {
+    // normal dispatch
+    BB_range = HashRange;
+    // Fill with dummies
+    {for (uint32_t H = 0; H < BB_range; ++H) {
+      DestBBs.push_back(Default);
+    }}
+    // And now the real entries
+    {for (CaseItr I = CR.Range.first, E = CR.Range.second; I != E; ++I) {
+      for (APInt N = cast<ConstantInt>(I->Low)->getValue(),
+                 N2 = cast<ConstantInt>(I->High)->getValue();
+          N.sle(N2); ++N) {
+        uint32_t H = PerfHash->calc_hash((uint32_t)(*N.getRawData()));
+        assert((H < HashRange) && "hash out of range");
+        DestBBs[H] = I->BB;
+        // dbgs() << I->BB << "\n";
+      }
+    }}
+  }
+
+  // Calculate weight for each unique destination in CR.
+  DenseMap<MachineBasicBlock*, uint32_t> DestWeights;
+  if (FuncInfo.BPI) {
+    for (CaseItr I = CR.Range.first, E = CR.Range.second; I != E; ++I) {
+      DenseMap<MachineBasicBlock*, uint32_t>::iterator Itr =
+          DestWeights.find(I->BB);
+      if (Itr != DestWeights.end())
+        Itr->second += I->ExtraWeight;
+      else
+        DestWeights[I->BB] = I->ExtraWeight;
+    }
+  }
+
+  // Update successor info. Add one edge to each unique successor.
+  BitVector SuccsHandled(CR.CaseBB->getParent()->getNumBlockIDs());
+  for (std::vector<MachineBasicBlock*>::iterator I = DestBBs.begin(),
+         E = DestBBs.end(); I != E; ++I) {
+    if (!SuccsHandled[(*I)->getNumber()]) {
+      SuccsHandled[(*I)->getNumber()] = true;
+      DenseMap<MachineBasicBlock*, uint32_t>::iterator Itr =
+          DestWeights.find(*I);
+      addSuccessorWithWeight(JumpTableBB, *I,
+                             Itr != DestWeights.end() ? Itr->second : 0);
+    }
+  }
+
+  // Create a jump table index for this jump table.
+  unsigned JTEncoding = TLI->getJumpTableEncoding();
+  unsigned JTI = CurMF->getOrCreateJumpTableInfo(JTEncoding)
+                       ->createJumpTableIndex(DestBBs);
+
+  // Set the jump table information so that we can codegen it as a second
+  // MachineBasicBlock
+  JumpTable JT(-1U, JTI, JumpTableBB, Default, DTable, DBits);
+  JT.JSize = DestMap.size();   // not necessarily correct but sufficient
+  JT.SingleBB = FrontCase.BB;
+
+  PerfHash->ref_count = 1;  // prepare and allow for copying JTH struct
+  JumpTableHeader JTH(First, Last, SV, CR.CaseBB, (CR.CaseBB == SwitchBB),
+                      PerfHash);
+  if (CR.CaseBB == SwitchBB)
+    visitJumpTableHeader(JT, JTH, SwitchBB);
+
+  JTCases.push_back(JumpTableBlock(JTH, JT));
+
+  // FIXME: This does not put the code into the right location (after ind. jump)
+  if (! BigRanges.empty()) {
+    BigRange = CaseRange(BigRanges.begin(), BigRanges.end());
+    MachineFunction *CurMF = FuncInfo.MF;
+    // const BasicBlock *LLVMBB = CR.CaseBB->getBasicBlock();
+    const BasicBlock *LLVMBB = Default->getBasicBlock();
+    // const BasicBlock *LLVMBB = SwitchBB->getBasicBlock();
+    MachineBasicBlock *BigBB = CurMF->CreateMachineBasicBlock(LLVMBB);
+    CurMF->insert(BBI, BigBB);
+    WorkList.push_back(CaseRec(BigBB, CR.LT, CR.GE, BigRange));
+  }
+
+  // delete(PerfHash) not here since JTH contains the reference
+
+  SplitMiddle = true;
+  return true;
+}
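Note for readers of the patch: the core idea of handleHashSwitchCase above (a table indexed by a perfect hash of the case value, unused slots pointing at the default block, plus the check_value comparison against a table of stored keys) can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions, not the code hashlib generates: the shift-and-mask hash, the brute-force search over shift amounts, and the names HashedSwitch, buildHashedSwitch and dispatch are all made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the tables the lowering would emit.
struct HashedSwitch {
  unsigned Shift;               // shift amount found by the search
  uint32_t Mask;                // table size - 1
  std::vector<uint32_t> Keys;   // value table: key stored in each slot
  std::vector<int> Dest;        // jump table: destination per slot, -1 = default
};

// Assumed hash for this sketch only: shift right, then mask.
static uint32_t hashKey(uint32_t Key, unsigned Shift, uint32_t Mask) {
  return (Key >> Shift) & Mask;
}

// Tries every shift amount; returns true once no two keys share a slot.
static bool buildHashedSwitch(const std::vector<uint32_t> &CaseKeys,
                              const std::vector<int> &CaseDest,
                              unsigned Bits, HashedSwitch &HS) {
  if (CaseKeys.empty() || CaseKeys.size() != CaseDest.size() ||
      Bits == 0 || Bits > 16 || CaseKeys.size() > (size_t(1) << Bits))
    return false;
  const uint32_t Size = uint32_t(1) << Bits;
  for (unsigned Shift = 0; Shift < 32; ++Shift) {
    std::vector<bool> Used(Size, false);
    // Unused slots keep CaseKeys[0] as filler key and -1 as destination.
    std::vector<uint32_t> Keys(Size, CaseKeys[0]);
    std::vector<int> Dest(Size, -1);
    bool Perfect = true;
    for (size_t I = 0; I < CaseKeys.size(); ++I) {
      uint32_t H = hashKey(CaseKeys[I], Shift, Size - 1);
      if (Used[H]) { Perfect = false; break; }  // collision: try next shift
      Used[H] = true;
      Keys[H] = CaseKeys[I];
      Dest[H] = CaseDest[I];
    }
    if (Perfect) {
      HS.Shift = Shift;
      HS.Mask = Size - 1;
      HS.Keys = Keys;
      HS.Dest = Dest;
      return true;
    }
  }
  return false;
}

// Hash the value, then compare against the stored key (the "check_value"
// idea); any mismatch goes to the default destination -1.
static int dispatch(const HashedSwitch &HS, uint32_t Val) {
  assert(HS.Keys.size() == HS.Mask + 1 && "table must be built first");
  uint32_t H = hashKey(Val, HS.Shift, HS.Mask);
  return HS.Keys[H] == Val ? HS.Dest[H] : -1;
}
```

With the keys 0x100, 0x200, 0x300, 0x500 and Bits = 3 the search settles on a shift of 8. Values such as 0x301 or 0x400 still hash into the table but fail the key comparison and end up at the default.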

Property changes on: lib/CodeGen/SelectionDAG/gen_hash.inc
___________________________________________________________________
Added: svn:author
   + jneumann
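The bit assignments of -switch-methods are spread over visitSwitch and gen_hash.inc. As a reading aid, the sketch below collects the bits that appear in this patch into one table and lists what a given mask enables. The struct and function names are assumptions for this example; only the bit values and the HM_* names are taken from the patch. Note that the patch additionally requires ISD::MUL (or ISD::UMUL_LOHI for HM_AB_X3) to be legal for some methods, which the sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper type: one entry per -switch-methods bit.
struct MethodBit { uint32_t Bit; const char *Name; };

// Bit values as used in visitSwitch and handleHashSwitchCase.
static const MethodBit MethodBits[] = {
  {0x00000001, "bit tests"},
  {0x00000002, "small ranges"},
  {0x00000004, "jump table"},
  {0x00000010, "HM_REVERSIBLE"},
  {0x00000100, "HM_SIMPLE_AND"},
  {0x00000200, "HM_SIMPLE_SHR"},
  {0x00000400, "HM_SIMPLE_ROL"},
  {0x00000800, "HM_SIMPLE_ROL_XOR"},
  {0x00001000, "HM_SIMPLE_ROL_ADD"},
  {0x00002000, "HM_SIMPLE_ROL_SUB"},
  {0x00004000, "HM_SIMPLE_MUL"},
  {0x00010000, "HM_AB_1"},
  {0x00020000, "HM_AB_2"},
  {0x00040000, "HM_AB_3"},
  {0x00100000, "HM_AB_X1"},
  {0x00400000, "HM_AB_X3"},
};

// Returns the names of all methods whose bit is set in Mask, in table order.
static std::vector<std::string> enabledMethods(uint32_t Mask) {
  std::vector<std::string> Names;
  const size_t N = sizeof(MethodBits) / sizeof(MethodBits[0]);
  for (size_t I = 0; I < N; ++I) {
    // Each table entry must describe exactly one bit.
    assert((MethodBits[I].Bit & (MethodBits[I].Bit - 1)) == 0 && "one bit each");
    if (Mask & MethodBits[I].Bit)
      Names.push_back(MethodBits[I].Name);
  }
  return Names;
}
```

For the default 0xfffffffb this lists everything except the classic jump table; 0x0000000f gives the three classic methods.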

Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h	(revision 200612)
+++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h	(working copy)
@@ -23,6 +23,7 @@
 #include "llvm/Support/CallSite.h"
 #include "llvm/Support/ErrorHandling.h"
 #include <vector>
+#include "hashlib.hpp"
 
 namespace llvm {
 
@@ -235,7 +236,9 @@
 
   struct JumpTable {
     JumpTable(unsigned R, unsigned J, MachineBasicBlock *M,
-              MachineBasicBlock *D): Reg(R), JTI(J), MBB(M), Default(D) {}
+              MachineBasicBlock *D, Constant *DT = 0, int DB = 0):
+      Reg(R), JTI(J), MBB(M), Default(D),
+      JSize(0), SingleBB(0), DTable(DT), DBits(DB) {}
 
     /// Reg - the virtual register containing the index of the jump table entry
     //. to jump to.
@@ -247,16 +250,48 @@
     /// Default - the MBB of the default bb, which is a successor of the range
     /// check MBB.  This is when updating PHI nodes in successors.
     MachineBasicBlock *Default;
+
+    uint64_t JSize;  // jump table size, for special case of only one entry
+    MachineBasicBlock *SingleBB;  // for one entry; the one and only jump target
+    Constant *DTable;  // for double-dispatch
+    int DBits;  // for double-dispatch
   };
+  CaseVector BigRanges;
+  bool SplitMiddle;
+  CaseRange BigRange;
+
   struct JumpTableHeader {
+    // See also SelectionDAGBuilder::handleHashSwitchCase
     JumpTableHeader(APInt F, APInt L, const Value *SV, MachineBasicBlock *H,
-                    bool E = false):
-      First(F), Last(L), SValue(SV), HeaderBB(H), Emitted(E) {}
+                    bool E = false, to_perfect_hash *P = 0):
+      First(F), Last(L), SValue(SV), HeaderBB(H), Emitted(E), PerfHash(P) {
+      // cout << "JumpTableHeader\n";
+      }
+    ~JumpTableHeader() {
+      // cout << "~JumpTableHeader\n";
+      if (PerfHash) {
+        PerfHash->ref_count--;
+        // cout << "PerfHash--\n";
+        if (PerfHash->ref_count==0) {
+          // cout << "~PerfHash\n";
+          delete(PerfHash);
+          }
+        }
+      }
+    JumpTableHeader(const JumpTableHeader &X):
+      First(X.First), Last(X.Last), SValue(X.SValue), HeaderBB(X.HeaderBB),
+      Emitted(X.Emitted), PerfHash(X.PerfHash) {
+      if (X.PerfHash) {
+        // cout << "PerfHash++\n";
+        PerfHash->ref_count++;
+        }
+      }
     APInt First;
     APInt Last;
     const Value *SValue;
     MachineBasicBlock *HeaderBB;
     bool Emitted;
+    to_perfect_hash *PerfHash;
   };
   typedef std::pair<JumpTableHeader, JumpTable> JumpTableBlock;
 
@@ -486,6 +521,7 @@
   };
 
 private:
+public:  // public because of calc_tab, make it friend?
   const TargetMachine &TM;
 public:
   /// Lowest valid SDNodeOrder. The special case 0 is reserved for scheduling
@@ -541,6 +577,7 @@
     : CurInst(NULL), SDNodeOrder(LowestSDNodeOrder), TM(dag.getTarget()),
       DAG(dag), FuncInfo(funcinfo), OptLevel(ol),
       HasTailCall(false) {
+    SplitMiddle = false;
   }
 
   void init(GCFunctionInfo *gfi, AliasAnalysis &aa,
@@ -655,6 +692,11 @@
                           const Value* SV,
                           MachineBasicBlock* Default,
                           MachineBasicBlock *SwitchBB);
+  bool handleHashSwitchCase(CaseRec& CR,
+                            CaseRecVector& WorkList,
+                            const Value* SV,
+                            MachineBasicBlock* Default,
+                            MachineBasicBlock *SwitchBB);
   bool handleBTSplitSwitchCase(CaseRec& CR,
                                CaseRecVector& WorkList,
                                const Value* SV,
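JumpTableHeader shares one to_perfect_hash object between its copies via ref_count. A minimal sketch of that ownership scheme with member-wise copying follows. Payload and Holder are hypothetical stand-ins for to_perfect_hash and JumpTableHeader, and the copy assignment operator is an addition that the patch does not have. As in handleHashSwitchCase, the creator sets ref_count to 1 before handing the pointer over.

```cpp
#include <cassert>

// Hypothetical stand-in for to_perfect_hash: only the counter matters here.
struct Payload {
  int ref_count;
  static int live;              // number of Payload objects currently alive
  Payload() : ref_count(0) { ++live; }
  ~Payload() { --live; }
};
int Payload::live = 0;

// Hypothetical stand-in for JumpTableHeader.
struct Holder {
  int Tag;
  Payload *P;

  // Does not touch ref_count; the creator has already set it to 1.
  explicit Holder(int T, Payload *Ptr = 0) : Tag(T), P(Ptr) {}

  // Copy each member, then account for the new reference.
  Holder(const Holder &X) : Tag(X.Tag), P(X.P) {
    if (P)
      ++P->ref_count;
  }

  Holder &operator=(const Holder &X) {
    if (X.P)
      ++X.P->ref_count;         // increment first: safe for self-assignment
    release();
    Tag = X.Tag;
    P = X.P;
    return *this;
  }

  ~Holder() { release(); }

private:
  // Drop one reference and delete the payload when it was the last one.
  void release() {
    if (P) {
      assert(P->ref_count > 0 && "reference count underflow");
      if (--P->ref_count == 0)
        delete P;
    }
    P = 0;
  }
};
```

The payload is deleted exactly once, when the last Holder referring to it goes away, regardless of how many copies a container such as JTCases makes in between.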
Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	(revision 200612)
+++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	(working copy)
@@ -12,6 +12,7 @@
 //===----------------------------------------------------------------------===//
 
 #define DEBUG_TYPE "isel"
+#include "hashlib.cpp"
 #include "SelectionDAGBuilder.h"
 #include "SDNodeDbgValue.h"
 #include "llvm/ADT/BitVector.h"
@@ -66,6 +67,8 @@
 /// some float libcalls (6, 8 or 12 bits).
 static unsigned LimitFloatPrecision;
 
+// Options, sample usage: clang ... -mllvm -switch-split-middle=true ...
+
 static cl::opt<unsigned, true>
 LimitFPPrecision("limit-float-precision",
                  cl::desc("Generate low-precision inline sequences "
@@ -73,6 +76,58 @@
                  cl::location(LimitFloatPrecision),
                  cl::init(0));
 
+static cl::opt<unsigned>
+SwitchMethods("switch-methods",
+  cl::desc("Switch via perfect hashing: Bit field of allowed methods"),
+  // cl::init((unsigned)(0xffffffff)));  // all methods
+  cl::init((unsigned)(0xfffffffb)));  // all methods but classic jump table
+  // cl::init((unsigned)(0x0000000f)));  // all classic methods
+
+static cl::opt<unsigned>
+SwitchHashMinUsage("switch-hash-min-usage",
+  cl::desc("Switch via hashing: Minimum percentage usage of jump table"),
+  cl::init(30));  // counting replaced ifs (1 for single values, 2 for ranges)
+
+static cl::opt<unsigned>
+SwitchHashMaxRange("switch-hash-max-range",
+  cl::desc("Switch via hashing: max # cases in one range"),
+  cl::init((unsigned)(-1)));  // accept all ranges
+
+static cl::opt<unsigned>
+SwitchHashMinIf("switch-hash-min-if",
+  cl::desc("Switch via hashing: Minimum # of ifs to be replaced"),
+  cl::init(6));  // not for reversible hashing with factor=2**n
+
+static cl::opt<unsigned>
+SwitchHashMulTries("switch-hash-mul-tries",
+  cl::desc("Switch via hashing: # tries for multiplication"),
+  cl::init(0));  // use hashlib's default
+
+static cl::opt<unsigned>
+SwitchHashMaxCases("switch-hash-max-cases",
+  cl::desc("Switch via hashing: max # cases total"),
+  cl::init(700000));
+
+static cl::opt<bool>
+SwitchDoubleDispatch("switch-double-dispatch",
+  cl::desc("Allow switch jump table via double dispatch?"),
+  cl::init(false));  // currently defective - deactivated
+
+static cl::opt<bool>
+SwitchSplitMiddle("switch-split-middle",
+  cl::desc("Switch tree split: Split always in middle?"),
+  cl::init(false));
+
+static cl::opt<bool>
+SwitchSplitClassic("switch-split-classic",
+  cl::desc("Switch tree split: Split classically?"),
+  cl::init(false));  // see http://llvm.org/bugs/show_bug.cgi?id=18347
+
+static cl::opt<std::string>
+SwitchCaseLog("switch-case-log",
+  cl::desc("Switch: Name of a file to log switch statements"),
+  cl::init(""));
+
 // Limit the width of DAG chains. This is important in general to prevent
 // prevent DAG-based analysis from blowing up. For example, alias analysis and
 // load clustering may not complete in reasonable time. It is difficult to
@@ -1737,70 +1792,304 @@
 /// visitJumpTable - Emit JumpTable node in the current MBB
 void SelectionDAGBuilder::visitJumpTable(JumpTable &JT) {
   // Emit the code for the jump table
+
   assert(JT.Reg != -1U && "Should lower JT Header first!");
-  EVT PTy = TM.getTargetLowering()->getPointerTy();
-  SDValue Index = DAG.getCopyFromReg(getControlRoot(), getCurSDLoc(),
-                                     JT.Reg, PTy);
-  SDValue Table = DAG.getJumpTable(JT.JTI, PTy);
-  SDValue BrJumpTable = DAG.getNode(ISD::BR_JT, getCurSDLoc(),
-                                    MVT::Other, Index.getValue(1),
-                                    Table, Index);
-  DAG.setRoot(BrJumpTable);
+
+  if (JT.JSize == 1) {
+    // Just emit an unconditional jump if only one entry is present
+    SDValue BrJump = DAG.getNode(ISD::BR, getCurSDLoc(),
+                                 MVT::Other, getControlRoot(),
+                                 DAG.getBasicBlock(JT.SingleBB));
+    DAG.setRoot(BrJump);
+  }
+  else {
+    const TargetLowering *TLI = TM.getTargetLowering();
+    EVT PTy = TLI->getPointerTy();
+    SDValue Index = DAG.getCopyFromReg(getControlRoot(), getCurSDLoc(),
+                                       JT.Reg, PTy);
+
+    if (JT.DTable) {
+      DEBUG(dbgs() << "DD " << JT.DBits << "\n");
+      /// Index = DTable[Index]
+      IntegerType *ValType = IntegerType::get(*Context, JT.DBits);
+      SDValue VTab = DAG.getConstantPool(JT.DTable, PTy,
+                                         TD->getPrefTypeAlignment(ValType));
+      unsigned Alignment = cast<ConstantPoolSDNode>(VTab)->getAlignment();
+      unsigned EltSize = (unsigned)TD->getTypeAllocSize(ValType);
+      SDValue VIdx = DAG.getNode(ISD::MUL, getCurSDLoc(), PTy,
+                                 Index,
+                                 DAG.getConstant(EltSize, PTy));
+      SDValue VOfs = DAG.getNode(ISD::ADD, getCurSDLoc(), PTy,
+                                 VIdx,
+                                 VTab);
+      // FIXME: This getLoad fails with
+      // Assertion `Chain.getValueType() == MVT::Other && "Invalid chain type"' failed
+      Index = DAG.getLoad(MVT::getIntegerVT(JT.DBits), getCurSDLoc(),
+                          DAG.getEntryNode(),  // Link type must be MVT::Other
+                          VOfs,
+                          MachinePointerInfo::getConstantPool(), false,
+                          false, false, Alignment);
+      Index = DAG.getZExtOrTrunc(Index, getCurSDLoc(), PTy);
+    }
+
+    SDValue Table = DAG.getJumpTable(JT.JTI, PTy);
+    SDValue BrJumpTable = DAG.getNode(ISD::BR_JT, getCurSDLoc(),
+                                      MVT::Other, Index.getValue(1),
+                                      Table, Index);
+    DAG.setRoot(BrJumpTable);
+  }
 }
 
+
+typedef struct {
+  SelectionDAGBuilder *SDB;
+  Constant *BTable;
+  int BBits;
+  Constant *STable;
+  int SBits;
+  SDValue *Source;
+  SDValue Target;
+  } HashContextType;
+
 /// visitJumpTableHeader - This function emits necessary code to produce index
 /// in the JumpTable from switch case.
 void SelectionDAGBuilder::visitJumpTableHeader(JumpTable &JT,
                                                JumpTableHeader &JTH,
                                                MachineBasicBlock *SwitchBB) {
-  // Subtract the lowest switch case value from the value being switched on and
-  // conditional branch to default mbb if the result is greater than the
-  // difference between smallest and largest cases.
-  SDValue SwitchOp = getValue(JTH.SValue);
-  EVT VT = SwitchOp.getValueType();
-  SDValue Sub = DAG.getNode(ISD::SUB, getCurSDLoc(), VT, SwitchOp,
-                            DAG.getConstant(JTH.First, VT));
+  to_perfect_hash *PerfHash;
 
-  // The SDNode we just created, which holds the value being switched on minus
-  // the smallest case value, needs to be copied to a virtual register so it
-  // can be used as an index into the jump table in a subsequent basic block.
-  // This value may be smaller or larger than the target's pointer type, and
-  // therefore require extension or truncating.
-  const TargetLowering *TLI = TM.getTargetLowering();
-  SwitchOp = DAG.getZExtOrTrunc(Sub, getCurSDLoc(), TLI->getPointerTy());
+  PerfHash = JTH.PerfHash;
+  if (PerfHash) {
+    // hashed jump table
+    SDValue Val = getValue(JTH.SValue);
+    EVT VT = Val.getValueType();
 
-  unsigned JumpTableReg = FuncInfo.CreateReg(TLI->getPointerTy());
-  SDValue CopyTo = DAG.getCopyToReg(getControlRoot(), getCurSDLoc(),
-                                    JumpTableReg, SwitchOp);
-  JT.Reg = JumpTableReg;
+    // if () getSExtOrTrunc
+    const TargetLowering *TLI = TM.getTargetLowering();
+    SDValue CastOp = DAG.getZExtOrTrunc(Val, getCurSDLoc(), MVT::i32);
 
-  // Emit the range check for the jump table, and branch to the default block
-  // for the switch statement if the value being switched on exceeds the largest
-  // case in the switch.
-  SDValue CMP = DAG.getSetCC(getCurSDLoc(),
-                             TLI->getSetCCResultType(*DAG.getContext(),
-                                                     Sub.getValueType()),
-                             Sub,
-                             DAG.getConstant(JTH.Last - JTH.First,VT),
-                             ISD::SETUGT);
 
-  // Set NextBlock to be the MBB immediately after the current one, if any.
-  // This is used to avoid emitting unnecessary branches to the next block.
-  MachineBasicBlock *NextBlock = 0;
-  MachineFunction::iterator BBI = SwitchBB;
+    HashContextType HashContext;
+    HashContext.BTable = 0;
+    HashContext.STable = 0;
 
-  if (++BBI != FuncInfo.MF->end())
-    NextBlock = BBI;
+    if (PerfHash->gen_opt.tabb_len >= 1) {
+      // a ^ opt.p_tabb[b]
+      // a ^ opt.p_scramble[opt.p_tabb[b]]
+      // HashContext.BTable
+      int bits;  // optimize tabb element size
+      if (PerfHash->gen_opt.tabb_len < 16)  // parameter?
+        bits = 32;
+      else if (PerfHash->gen_opt.tabb_max <= 0xff)
+        bits = 8;
+      else if (PerfHash->gen_opt.tabb_max <= 0xffff)
+        bits = 16;
+      else
+        bits = 32;
+      HashContext.BBits = bits;
+      IntegerType *ValType = IntegerType::get(*Context, bits);
+      // DEBUG(dbgs() << "ValType=" << *ValType << "\n");
+      Constant **ValueBTable = new Constant *[PerfHash->gen_opt.tabb_len];
+      std::vector<MachineBasicBlock*> DestBBs(PerfHash->gen_opt.tabb_len);
+      {for (uint32_t i = 0; i < PerfHash->gen_opt.tabb_len; ++i) {
+        ValueBTable[i] =
+          ConstantInt::get(ValType, PerfHash->gen_opt.p_tabb[i], false);
+      }}
 
-  SDValue BrCond = DAG.getNode(ISD::BRCOND, getCurSDLoc(),
-                               MVT::Other, CopyTo, CMP,
-                               DAG.getBasicBlock(JT.Default));
+      // Create a ConstantArray of the constants.
+      HashContext.BTable = ConstantArray::get(
+        ArrayType::get(ValType, PerfHash->gen_opt.tabb_len),
+            ArrayRef<Constant *>(ValueBTable, PerfHash->gen_opt.tabb_len));
 
-  if (JT.MBB != NextBlock)
-    BrCond = DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other, BrCond,
-                         DAG.getBasicBlock(JT.MBB));
+      if (PerfHash->gen_opt.scramble_len != 0) {
+        // HashContext.STable
+        int bits;  // optimize scramble element size
+        if (PerfHash->gen_opt.scramble_max <= 0xff)
+          bits = 8;  // probably never happening
+        else if (PerfHash->gen_opt.scramble_max <= 0xffff)
+          bits = 16;
+        else
+          bits = 32;
+        HashContext.SBits = bits;
+        IntegerType *ValType = IntegerType::get(*Context, bits);
+        // DEBUG(dbgs() << "ValType=" << *ValType << "\n");
+        Constant **ValueSTable = new Constant *[PerfHash->gen_opt.scramble_len];
+        std::vector<MachineBasicBlock*> DestBBs(PerfHash->gen_opt.scramble_len);
+        {for (uint32_t i = 0; i < PerfHash->gen_opt.scramble_len; ++i) {
+          ValueSTable[i] =
+            ConstantInt::get(ValType, PerfHash->gen_opt.p_scramble[i], false);
+        }}
 
-  DAG.setRoot(BrCond);
+        // Create a ConstantArray of the constants.
+        HashContext.STable = ConstantArray::get(
+          ArrayType::get(ValType, PerfHash->gen_opt.scramble_len),
+              ArrayRef<Constant *>(ValueSTable, PerfHash->gen_opt.scramble_len));
+      }
+    }
+
+
+    HashContext.SDB = this;
+    HashContext.Source = &CastOp;
+    PerfHash->do_generate((t_pointer)(&HashContext));
+    SDValue HashIndex = HashContext.Target;
+
+    SDValue HashPtr = DAG.getZExtOrTrunc(HashIndex, getCurSDLoc(),
+                                         TLI->getPointerTy());
+
+    // VTab: Value table (array in pool)
+    // Val: Original value
+    // HashIndex: table index as i32
+    // HashPtr: table index as pointer
+
+    // const DataLayout &TD = *TLI->getDataLayout();
+
+    SDValue CMP;
+    if (PerfHash->gen_opt.check_range) {
+      // cmp HashIndex,const
+      // check_value can also be set; handled by jump table
+      // TODO: maybe handle by bit field
+      CMP = DAG.getSetCC(getCurSDLoc(),
+                         TLI->getSetCCResultType(*DAG.getContext(),
+                                                 HashIndex.getValueType()),
+                         HashIndex,
+                         DAG.getConstant(PerfHash->gen_opt.hash_used_max, VT),
+                         ISD::SETUGT);
+    } else if (PerfHash->gen_opt.check_value) {
+      // cmp Val,tab[HashIndex]
+      uint32_t HashRange;
+      if (PerfHash->gen_opt.check_range)
+        HashRange = PerfHash->gen_opt.hash_used_max+1;
+      else
+        HashRange = PerfHash->gen_opt.hash_gen_max+1;
+
+
+      IntegerType *ValType = IntegerType::get(*Context,PerfHash->gen_opt.bits);
+      // DEBUG(dbgs() << "ValType=" << *ValType << "\n");
+      Constant **ValueTable = new Constant *[HashRange];
+      std::vector<MachineBasicBlock*> DestBBs(HashRange);
+      {for (uint32_t I = 0; I < HashRange; ++I) {
+        t_32u Key;
+        t_pointer KeyStuff;  // dummy
+        if (PerfHash->inv_hash(I, Key, KeyStuff))  // determines key
+          ValueTable[I] = ConstantInt::get(ValType, Key, false);
+        else
+          ValueTable[I] = ConstantInt::get(ValType, PerfHash->gen_opt.low, false);
+      }}
+
+      // Create a ConstantArray of the constants.
+      Constant *ValTable = ConstantArray::get(
+        ArrayType::get(ValType, HashRange),
+            ArrayRef<Constant *>(ValueTable, HashRange));
+
+
+      /// cmp Val,tab[HashIndex]
+      SDValue VTab = DAG.getConstantPool(ValTable, TLI->getPointerTy(),
+        TD->getPrefTypeAlignment(ValType));
+      unsigned Alignment = cast<ConstantPoolSDNode>(VTab)->getAlignment();
+      unsigned EltSize = (unsigned)TD->getTypeAllocSize(ValType);
+      SDValue VIdx = DAG.getNode(ISD::MUL, getCurSDLoc(), TLI->getPointerTy(),
+                                 HashPtr,
+                                 DAG.getConstant(EltSize, TLI->getPointerTy()));
+      SDValue VOfs = DAG.getNode(ISD::ADD, getCurSDLoc(), TLI->getPointerTy(),
+                                 VIdx,
+                                 VTab);
+
+      SDValue CmpVal = DAG.getLoad(
+        MVT::getIntegerVT(PerfHash->gen_opt.bits), getCurSDLoc(), DAG.getEntryNode(),
+        VOfs,
+        MachinePointerInfo::getConstantPool(), false,
+        false, false, Alignment);
+
+      SDValue ConvVal;
+      if (PerfHash->gen_opt.use_signed)
+        ConvVal = DAG.getSExtOrTrunc(CmpVal, getCurSDLoc(), MVT::i32);
+      else
+        ConvVal = DAG.getZExtOrTrunc(CmpVal, getCurSDLoc(), MVT::i32);
+
+      // DEBUG(dbgs() << "converted to " << PerfHash->gen_opt.bits << (PerfHash->gen_opt.use_signed?"s":"u") << "\n");
+      CMP = DAG.getSetCC(getCurSDLoc(),
+                         TLI->getSetCCResultType(*DAG.getContext(),
+                                                 HashIndex.getValueType()),
+                         CastOp,  // Val,
+                         ConvVal,
+                         ISD::SETNE);
+    } else {
+      assert(false && "TCH: no checks");
+    }
+
+    // Set NextBlock to be the MBB immediately after the current one, if any.
+    // This is used to avoid emitting unnecessary branches to the next block.
+    MachineBasicBlock *NextBlock = 0;
+    MachineFunction::iterator BBI = SwitchBB;
+
+    if (++BBI != FuncInfo.MF->end())
+      NextBlock = BBI;
+
+    unsigned JumpTableReg = FuncInfo.CreateReg(TLI->getPointerTy());
+    SDValue CopyTo = DAG.getCopyToReg(getControlRoot(), getCurSDLoc(),
+                                      JumpTableReg, HashPtr);
+    JT.Reg = JumpTableReg;
+
+    SDValue BrCond = DAG.getNode(ISD::BRCOND, getCurSDLoc(),
+                                 MVT::Other, CopyTo, CMP,
+                                 DAG.getBasicBlock(JT.Default));
+
+    if (JT.MBB != NextBlock)
+      BrCond = DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other, BrCond,
+                           DAG.getBasicBlock(JT.MBB));
+
+    DAG.setRoot(BrCond);
+  } else {
+    // normal jump table
+    // Subtract the lowest switch case value from the value being switched on
+    // and conditional branch to default mbb if the result is greater than the
+    // difference between smallest and largest cases.
+    SDValue SwitchOp = getValue(JTH.SValue);
+    EVT VT = SwitchOp.getValueType();
+    SDValue Sub = DAG.getNode(ISD::SUB, getCurSDLoc(), VT, SwitchOp,
+                              DAG.getConstant(JTH.First, VT));
+
+    // The SDNode we just created, which holds the value being switched on minus
+    // the smallest case value, needs to be copied to a virtual register so it
+    // can be used as an index into the jump table in a subsequent basic block.
+    // This value may be smaller or larger than the target's pointer type, and
+    // therefore require extension or truncating.
+    const TargetLowering *TLI = TM.getTargetLowering();
+    SwitchOp = DAG.getZExtOrTrunc(Sub, getCurSDLoc(), TLI->getPointerTy());
+
+    unsigned JumpTableReg = FuncInfo.CreateReg(TLI->getPointerTy());
+    SDValue CopyTo = DAG.getCopyToReg(getControlRoot(), getCurSDLoc(),
+                                      JumpTableReg, SwitchOp);
+    JT.Reg = JumpTableReg;
+
+    // Emit the range check for the jump table, and branch to the default block
+    // for the switch statement if the value being switched on exceeds the
+    // largest case in the switch.
+    SDValue CMP = DAG.getSetCC(getCurSDLoc(),
+                               TLI->getSetCCResultType(*DAG.getContext(),
+                                                       Sub.getValueType()),
+                               Sub,
+                               DAG.getConstant(JTH.Last - JTH.First,VT),
+                               ISD::SETUGT);
+
+    // Set NextBlock to be the MBB immediately after the current one, if any.
+    // This is used to avoid emitting unnecessary branches to the next block.
+    MachineBasicBlock *NextBlock = 0;
+    MachineFunction::iterator BBI = SwitchBB;
+
+    if (++BBI != FuncInfo.MF->end())
+      NextBlock = BBI;
+
+    SDValue BrCond = DAG.getNode(ISD::BRCOND, getCurSDLoc(),
+                                 MVT::Other, CopyTo, CMP,
+                                 DAG.getBasicBlock(JT.Default));
+
+    if (JT.MBB != NextBlock)
+      BrCond = DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other, BrCond,
+                           DAG.getBasicBlock(JT.MBB));
+
+    DAG.setRoot(BrCond);
+  }
 }
 
 /// Codegen a new tail for a stack protector check ParentMBB which has had its
@@ -2357,6 +2646,8 @@
   return true;
 }
 
+#include "gen_hash.inc"
+
 /// handleBTSplitSwitchCase - emit comparison and split binary search tree into
 /// 2 subtrees.
 bool SelectionDAGBuilder::handleBTSplitSwitchCase(CaseRec& CR,
@@ -2405,11 +2696,14 @@
            "Invalid case distance");
     // Use volatile double here to avoid excess precision issues on some hosts,
     // e.g. that use 80-bit X87 registers.
+    // Correction for edge cases giving 0 instead of 1 [?Size=>(?Size-1)]
+    // Otherwise e.g. switch 100,200,200,... yields else-if chain, bug 18347
+    int ofs = SwitchSplitClassic?0:1;
     volatile double LDensity =
-       (double)LSize.roundToDouble() /
+       (double)(LSize-ofs).roundToDouble() /
                            (LEnd - First + 1ULL).roundToDouble();
     volatile double RDensity =
-      (double)RSize.roundToDouble() /
+      (double)(RSize-ofs).roundToDouble() /
                            (Last - RBegin + 1ULL).roundToDouble();
     volatile double Metric = Range.logBase2()*(LDensity+RDensity);
     // Should always split in some non-trivial place
@@ -2429,7 +2723,7 @@
   }
 
   const TargetLowering *TLI = TM.getTargetLowering();
-  if (areJTsAllowed(*TLI)) {
+  if (areJTsAllowed(*TLI) && !SplitMiddle && !SwitchSplitMiddle) {
     // If our case is dense we *really* should handle it earlier!
     assert((FMetric > 0) && "Should handle dense range earlier!");
   } else {
@@ -2726,6 +3020,38 @@
   // search tree.
   const Value *SV = SI.getCondition();
 
+  // MDNode *n = SI.getMetadata("dbg");
+  // DILocation loc(n);  
+  // dbgs() << "case  // " << getCurSDLoc().getDebugLoc().getLine() << "\n";
+  // dbgs() << "case  // " << loc.getFilename() << " " << getCurSDLoc().getDebugLoc().getLine() << "\n";
+
+  if (SwitchCaseLog != "") {
+    // dbgs() << "case\n";
+    // FIXME: Dump source file name and line number also
+    std::string ErrorInfo;
+    raw_fd_ostream f(SwitchCaseLog.c_str(), ErrorInfo, llvm::sys::fs::F_Append);
+    // f << "case  // " << getCurSDLoc().getDebugLoc().getScope(*Context) << "\n";
+    f << "case\n";
+    for (CaseItr I = Cases.begin(), E = Cases.end();
+         I!=E; ++I) {
+      const APInt &Low = cast<ConstantInt>(I->Low)->getValue();
+      const APInt &High = cast<ConstantInt>(I->High)->getValue();
+      if (Low .getSExtValue()==High.getSExtValue()) {
+        f << "when " << (Low .getSExtValue())
+          << "  // " << I->BB
+          << "\n";
+      } else {
+        f << "when " << (Low .getSExtValue())
+          << ".."    << (High.getSExtValue())
+          << "  // " << I->BB
+          << "\n";
+      }
+    }
+    f << "end\n";
+    f << "\n";
+    f.close();
+  }
+
   // Push the initial CaseRec onto the worklist
   CaseRecVector WorkList;
   WorkList.push_back(CaseRec(SwitchMBB,0,0,
@@ -2736,25 +3062,34 @@
     CaseRec CR = WorkList.back();
     WorkList.pop_back();
 
-    if (handleBitTestsSwitchCase(CR, WorkList, SV, Default, SwitchMBB))
+    if ((SwitchMethods & 0x00000001) &&
+        handleBitTestsSwitchCase(CR, WorkList, SV, Default, SwitchMBB))
       continue;
 
     // If the range has few cases (two or less) emit a series of specific
     // tests.
-    if (handleSmallSwitchRange(CR, WorkList, SV, Default, SwitchMBB))
+    if ((SwitchMethods & 0x00000002) &&
+        handleSmallSwitchRange(CR, WorkList, SV, Default, SwitchMBB))
       continue;
 
     // If the switch has more than N blocks, and is at least 40% dense, and the
     // target supports indirect branches, then emit a jump table rather than
     // lowering the switch to a binary tree of conditional branches.
     // N defaults to 4 and is controlled via TLS.getMinimumJumpTableEntries().
-    if (handleJTSwitchCase(CR, WorkList, SV, Default, SwitchMBB))
+    if ((SwitchMethods & 0x00000004) &&
+        handleJTSwitchCase(CR, WorkList, SV, Default, SwitchMBB))
       continue;
 
+    if ((SwitchMethods & 0xfffffff0) &&
+        handleHashSwitchCase(CR, WorkList, SV, Default, SwitchMBB))
+      continue;
+
     // Emit binary tree. We need to pick a pivot, and push left and right ranges
     // onto the worklist. Leafs are handled via handleSmallSwitchRange() call.
     handleBTSplitSwitchCase(CR, WorkList, SV, Default, SwitchMBB);
   }
+  SplitMiddle = false;  // switch back to default behavior
+  BigRanges.clear();
 }
 
 void SelectionDAGBuilder::visitIndirectBr(const IndirectBrInst &I) {
-------------- next part --------------
A non-text attachment was scrubbed...
Name: switchgen.cpp
Type: text/x-c++src
Size: 2575 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140201/818fb507/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hashtest.cpp
Type: text/x-c++src
Size: 9169 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140201/818fb507/attachment-0001.cpp>
-------------- next part --------------
Preliminary: Special identifiers

n    number of given switch labels
N    size of the (intermediate) hash range, usually n rounded up to the next power of 2
x    the value to be switched upon
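
As a small illustration of the relation between n and N, here is a sketch in C++. The function name hashRangeSize is only chosen for this example and does not appear in the patch; it assumes n is at most 2^31.

```cpp
#include <cstdint>

// Round the number of labels n up to the next power of 2 to get N.
// Assumption for this sketch: n <= 2^31, so the shift cannot overflow.
uint32_t hashRangeSize(uint32_t n) {
  uint32_t N = 1;
  while (N < n)
    N <<= 1;
  return N;
}
```

For example, 5 labels give N = 8, and 8 labels give N = 8 as well.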


Why (perfect) hashing?

Hashing enables a jump table like approach even for sparse label sets.
Let's concentrate on perfect hashing in order to keep the code smaller and faster.
Hashing is faster than a decision tree provided there are enough labels, i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8 labels assuming random values of x.
The memory consumption of the hashing tables is comparable to the memory footprint of the code of a decision tree.
A decision tree consumes a lot of branch prediction resources.
For minimal perfect hashing the jump table does not need to contain dummy labels and therefore has n instead of (almost) N elements.
"Switch-to-lookup tables": If all jump targets are sufficiently similar, hashing opens the possibility of using tables which contain the differences. The indirect jump is thereby eliminated and we get a huge performance gain.


Why not hashing?

A decision tree or even an else-if chain is faster than hashing for very few labels.
The branch prediction logic usually does not work as satisfactorily as for decision trees if fed with cyclical access patterns; i.e. the break-even point in this scenario is higher.
Hashing cannot deal with large label ranges since every single label is treated. A possible compromise/solution might be to first catch all single labels and sufficiently short ranges by hashing and then treat the remaining ranges in a decision tree.


Why reversible hashing?

Reversible hashing does not need a value table and a simple range check of the calculated hash value is sufficient.
Reversible hashing is very simple to implement.
The usual jump table approach is a special case of reversible hashing.
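
To make the idea concrete, here is a minimal sketch of a reversible hash for the labels 16, 32, 48 and 64. It is not taken from the patch; the function names and the label set are made up for illustration. Because subtracting a constant and rotating are both bijective on 32-bit values, no two inputs share a hash value, so a range check on h replaces the value table.

```cpp
#include <cstdint>

// Rotate a 32-bit value right by k bits (0 < k < 32).
static inline uint32_t rotr32(uint32_t v, unsigned k) {
  return (v >> k) | (v << (32 - k));
}

// Hypothetical dispatcher for the labels 16, 32, 48, 64.
// Returns the jump table index 0..3, or -1 for the default case.
int reversibleDispatch(uint32_t x) {
  // x - 16 and the rotation are both reversible, so only the four
  // labels can land in 0..3; every other x yields h >= 4.
  uint32_t h = rotr32(x - 16u, 4);
  if (h >= 4)
    return -1;
  return (int)h;
}
```

With a rotation of 0 bits this degenerates into the usual jump table scheme h = x - min.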


Why simple hashing?

A comparison against a value table is sufficient.
Simple hashing does not need an extra table.
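
A minimal sketch of simple hashing with the "and" method, for the made-up label set 3, 17, 40, 1002. The names simpleDispatch and SimpleValTable are hypothetical and not part of the patch. The hash x & 3 happens to be collision-free for these labels, so one comparison against the value table decides between a label and the default case.

```cpp
#include <cstdint>

// Value table indexed by the hash h = x & 3:
// 40 & 3 == 0, 17 & 3 == 1, 1002 & 3 == 2, 3 & 3 == 3.
static const uint32_t SimpleValTable[4] = {40, 17, 1002, 3};

// Returns the jump table index 0..3, or -1 for the default case.
int simpleDispatch(uint32_t x) {
  uint32_t h = x & 3u;           // the hash function itself
  if (SimpleValTable[h] != x)    // comparison against the value table
    return -1;
  return (int)h;
}
```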


Why not simple hashing?

Simple hashing needs a lot of hash function candidates to be tested until a suitable one is found; this costs time in the compiler.
For large label sets the chance to find such a function is very small; the applicability is therefore reduced to quite small sets.
Minimal perfect hashing is usually not achievable.
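
The candidate search can be sketched as follows for the "shr" family h = (x >> s) & mask. This is only an illustration of the trial-and-error nature of the search, not the code in hashlib; the function name findShrHash is an assumption, and the sketch requires C++17 for std::optional.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Try all shift amounts s and return the first one for which
// h = (x >> s) & mask is collision-free on the given labels.
// Assumption for this sketch: 1 <= bits <= 31.
std::optional<unsigned> findShrHash(const std::vector<uint32_t> &labels,
                                    unsigned bits) {
  const uint32_t mask = (1u << bits) - 1;
  for (unsigned s = 0; s < 32; ++s) {
    std::vector<bool> used(mask + 1u, false);
    bool perfect = true;
    for (uint32_t x : labels) {
      uint32_t h = (x >> s) & mask;
      if (used[h]) {             // collision: this candidate fails
        perfect = false;
        break;
      }
      used[h] = true;
    }
    if (perfect)
      return s;
  }
  return std::nullopt;           // no candidate of this family works
}
```

For the labels 0x10, 0x20, 0x30, 0x40 and bits = 2 the shifts 0 to 3 all collide and s = 4 is the first hit. Five labels can never fit into four slots, so in that case the search fails and another method has to be tried.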


Why Jenkins' hashing?

Jenkins' hashing produces very fast and compact code and needs one extra table (BTable) with usually fewer elements than n; the elements themselves are unsigned integers which can become as high as N-1.
The code first generates two hash values h1=f1(x) and h2=f2(x) where f1 and f2 limit their range by shifting or masking, then the final hash as h = h1 ^ BTable[h2].
Minimal perfect hashing is achievable, albeit often at the cost of larger tables.
For sufficiently high N the table memory can optionally be reduced by introducing an extra scrambling table (STable); BTable then consists of bytes whereas STable contains 256 words or DWords. Obviously this reduction of memory costs another indirection in the resulting code.
As far as I tested Jenkins' hashing can be used for label sets of several thousand elements; often even one million labels can be treated.
The application in a compiler is to lower switch statements. Switch statements with much more than 1000 labels are almost never present in practice.
Jenkins' hashing usually is sufficiently fast.
By the way: You will find the following remark in lib/Target/README.txt:
==>
Investigate lowering of sparse switch statements into perfect hash tables:
http://burtleburtle.net/bob/hash/perfect.html
<==
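
Here is a hand-made sketch of what the generated lookup could look like, using the form from Bob Jenkins' perfect hash generator, h = h1 ^ BTable[h2]. The label set 16, 24, 101, 1005, the functions f1/f2 and all names are made up for this example; the tables were worked out by hand rather than produced by hashlib. Here n = N = 4, so the hash is minimal, and BTable has only 2 elements.

```cpp
#include <cstdint>

// f1 masks, f2 shifts and masks (both hypothetical for this example).
static inline uint32_t jenkinsF1(uint32_t x) { return x & 3u; }
static inline uint32_t jenkinsF2(uint32_t x) { return (x >> 3) & 1u; }

static const uint8_t  JenkinsBTable[2]   = {0, 2};
static const uint32_t JenkinsValTable[4] = {16, 101, 24, 1005};

// Returns the jump table index 0..3, or -1 for the default case.
int jenkinsDispatch(uint32_t x) {
  uint32_t h1 = jenkinsF1(x);
  uint32_t h2 = jenkinsF2(x);
  uint32_t h = h1 ^ JenkinsBTable[h2];  // always within 0..3 here
  if (JenkinsValTable[h] != x)          // perfect hashing still needs
    return -1;                          // the value comparison
  return (int)h;
}
```

The labels 16 and 24 collide under f1 alone (both give 0), and so do 101 and 1005 (both give 1); the BTable entry selected by f2 separates them.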


Why not Jenkins' hashing?

The search time of Jenkins' hashing might become unacceptably long for extraordinarily large label sets.
In practice this should never happen; however, in this case the label set will be split into two parts and the process of lowering switches will be repeated.
An alternative (for very large label sets) could be a different perfect hash algorithm such as CHM.


Why CHM?

CHM is extraordinarily fast at generating the extra table with a runtime of probabilistic O(n).
CHM can be used to hash almost arbitrarily many labels.
CHM produces an ordered minimal perfect hashing function.


Why not CHM?

CHM needs extra table space of 2.09 * n elements; this is larger compared to Jenkins' method.
CHM (in the implementation of cmph-2.0) is much more complicated than the code of Jenkins' hashing:
h1 = f1(x) % N;
h2 = f2(x) % N;
if (h1 == h2 && ++h2 >= N)  h2 = 0;
h = (Table[h1] + Table[h2]) % n;
By enlarging N to a power of 2 and other tweaking we might replace the 3 modulo operations by shifting or masking at the cost of more table space. We might also eliminate the conditional statement if in this case we choose different hash functions and retry the calculation.
There is at least one more table access compared to Jenkins' method.
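
For comparison, the CHM lookup above can be turned into a small runnable sketch. Everything here is a toy assumption and not taken from cmph: the labels 10, 25, 100 (n = 3), N = 7, the hash functions f1(x) = x and f2(x) = x / 7, and a table that was filled in by hand so that the labels map to 0, 1, 2 in their original order.

```cpp
#include <cstdint>

static const uint32_t ChmSmallN = 3;   // n: number of labels
static const uint32_t ChmBigN   = 7;   // N: size of the CHM table
static const uint8_t  ChmTable[7]    = {0, 0, 2, 0, 1, 0, 0};
static const uint32_t ChmValTable[3] = {10, 25, 100};

static inline uint32_t chmF1(uint32_t x) { return x; }      // toy function
static inline uint32_t chmF2(uint32_t x) { return x / 7; }  // toy function

// Returns the jump table index 0..2, or -1 for the default case.
int chmDispatch(uint32_t x) {
  uint32_t h1 = chmF1(x) % ChmBigN;
  uint32_t h2 = chmF2(x) % ChmBigN;
  if (h1 == h2 && ++h2 >= ChmBigN)
    h2 = 0;
  uint32_t h = (ChmTable[h1] + ChmTable[h2]) % ChmSmallN;
  return ChmValTable[h] == x ? (int)h : -1;
}
```

The two table reads and the three modulo operations are visible directly; Jenkins' method needs a single table read and one xor for the same step.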
-------------- next part --------------
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.

2014-02-01

Comments on the current implementation of switch via hashing in LLVM 

The patch modifies lib/CodeGen/SelectionDAG/SelectionDAGBuilder.*.
The patch for http://llvm.org/bugs/show_bug.cgi?id=18347 is already embedded but can be controlled by a switch.
Place gen_hash.inc and hashlib.* in lib/CodeGen/SelectionDAG/.
gen_hash.inc currently is a separate file to ease editing for me but shall be integrated into SelectionDAGBuilder.cpp later.
hashlib.* contains the code where all magic things concerning the generation of a hash function happen; motivation, parts of the implementation and some speed tests are described in http://programming.sirrida.de/hashsuper.pdf. In contrast to that paper I dropped imperfect hashing but added perfect hashing as implemented by Bob Jenkins (see hashlib.*) instead.
hashlib.* should be moved to a different location in order to make it a proper library; also, hashlib.cpp should not be included directly.

I have provided some switches at the top of SelectionDAGBuilder.cpp to fine-tune the behavior, such as the selection of hash methods. These can be set with clang -mllvm -X=Y.

The LLVM code first tests the applicability of hashing such as the availability of the needed operations or jump tables, whether the switch value is at most 32 bit, whether the label set is big enough, whether large ranges kill the approach, and whether hashing succeeds. The limits and limitations can be discussed.
Some tables are set up as needed: tabb/BTable (for perfect hashing a la Jenkins), scramble/STable (for large tabb), ValTable for value comparison, DTable for double dispatching (to keep the jump table small; this still produces an error and is therefore disabled), and the jump table. All integer value tables use the smallest usable element size, i.e. 8, 16, or 32 bits, and the values can be signed or unsigned.
A jump table with only one element is replaced by a simple jump. This can happen if minimal perfect hashing is achieved and all (non-default) labels address the same jump target.
The code generator callback functions are plain functions but might as well become static member functions of SelectionDAGBuilder; then the transfer structure type HashContextType could become a local type.
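
The choice of the smallest usable table element size could look roughly like the following sketch. It is not the code from gen_hash.inc; the names TableElemType and smallestElemType are hypothetical, and the sketch assumes that all table values fit into 32 bits.

```cpp
#include <cstdint>

// Element width in bits and signedness of a value table.
struct TableElemType {
  unsigned bits;
  bool isSigned;
};

// Pick the narrowest of 8/16/32 bits that can hold every value in
// [lo, hi]; unsigned is used whenever no value is negative.
TableElemType smallestElemType(int64_t lo, int64_t hi) {
  if (lo >= 0) {
    if (hi <= 0xFF)   return {8, false};
    if (hi <= 0xFFFF) return {16, false};
    return {32, false};
  }
  if (lo >= -128 && hi <= 127)     return {8, true};
  if (lo >= -32768 && hi <= 32767) return {16, true};
  return {32, true};
}
```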

Here are the masks for the available switch dispatch methods; use with e.g. -mllvm -switch-methods=0x00004003:
0x00000001  LLVM standard: bit tests; handleBitTestsSwitchCase
0x00000002  LLVM standard: small ranges; handleSmallSwitchRange
0x00000004  LLVM standard: jump table; handleJTSwitchCase
0x00000010  Reversible hashing / replacement of jump table
0x00000100  Simple hashing: and
0x00000200  Simple hashing: shr
0x00000400  Simple hashing: rol
0x00000800  Simple hashing: rol xor
0x00001000  Simple hashing: rol add
0x00002000  Simple hashing: rol sub
0x00004000  Simple hashing: mul
0x00010000  Jenkins hashing: ab 1
0x00020000  Jenkins hashing: ab 2
0x00040000  Jenkins hashing: ab 3
0x00100000  Jenkins hashing: ab x1 (mul)
0x00400000  Jenkins hashing: ab x3 (big mul)
Not all combinations work since handleBTSplitSwitchCase complains if small ranges remain; therefore handleSmallSwitchRange (2) should always be activated.
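
How such a mask is interpreted can be sketched as below. The mask values are the ones listed above and the 0xfffffff0 test mirrors the check in front of handleHashSwitchCase in the patch; the constant names and the function describeMethods are made up for this example.

```cpp
#include <cstdint>
#include <string>

// Mask values as listed above; the names are hypothetical.
static const uint32_t MethodBitTests   = 0x00000001;
static const uint32_t MethodSmallRange = 0x00000002;
static const uint32_t MethodJumpTable  = 0x00000004;
static const uint32_t MethodAnyHash    = 0xfffffff0; // any hashing method

// List the method groups enabled by a -switch-methods value.
std::string describeMethods(uint32_t m) {
  std::string s;
  if (m & MethodBitTests)   s += "bittests ";
  if (m & MethodSmallRange) s += "smallrange ";
  if (m & MethodJumpTable)  s += "jumptable ";
  if (m & MethodAnyHash)    s += "hash ";
  return s;
}
```

With the example value 0x00004003 this yields bit tests, small ranges and hashing (here only "Simple hashing: mul"), while the classical jump table stays off.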

I have not yet done comprehensive tests on the generated code in all combinations or on platforms other than x86 on Ubuntu, but at least I have tested the hash engine quite well. Also, some existing tests fail because different code is generated; these will have to be changed and new tests will have to be created.
The code should be tidied up.
All lines beginning with /*@/// or /*@\\\ are artifacts from my folding editor.

For the final version double dispatching should work and large ranges should be treated in a decision tree after the hashing; also, http://llvm.org/bugs/show_bug.cgi?id=1255 should have been fixed by then in order to allow for arbitrarily large ranges.

It might be that the place where I lower switch statements is too low level; a better place might be just before it is tested whether a switch statement can be lowered to a table lookup; if the hashing approach succeeds a table lookup might follow naturally replacing the jump table which is currently generated.
Another approach could be to also implement hashing in SimplifySwitch() of lib/Transforms/Utils/SimplifyCFG.cpp.

Future plans include preparing a patch for GCC too using the same hashlib. For Free Pascal the Pascal version of hashlib could be used (I have a working version for Pascal/Delphi). Also string hashing should be considered.

The coding style of hashlib is quite different from the usual LLVM style. However since it should be treated as a black box library this should not matter much. Also, it is much more C than C++ for speed and portability reasons.

Please tell me whatever quirks you find and whatever you want to be changed.
For testing don't be shy about using very large sparse label sets; in the code up to 700000 labels are accepted but this is a more or less arbitrary value.

The TODO and FIXME comments hint at what is still an open question for me. Please help me on this!

As for documentation and testing: I am an LLVM newbie and not at all proficient at reading or writing LL code, which seems to be needed for the test programs.

Have fun
Jasper

(c) 2013..2014 by Jasper L. Neumann
www.sirrida.de / programming.sirrida.de
E-Mail: info at sirrida.de

===

History

2014-01-16: First public patch

2014-02-01: Update released
Changes:
Fixed: endless adding of keys, causing a heap overflow, if a range contained -1 and 0.
Fixed: division by zero if a switch contained -1 and 0 or 0x7fffffff and 0x80000000.
Added the gen_ab_x* methods for Jenkins' hashing using a multiplication.
Sources somewhat tidied up.
Allowed reversible hashing for 4 cases if the factor is a power of 2.
I renamed some of the compiler switches introduced with my first patch.