[llvm-dev] (RFC) JumpMaps: switch statement optimization
Witold Waligora via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 14 05:17:09 PST 2017
Eli,
> 2. How many versions of __jumpmap_find do you need?
In our out-of-tree implementation we currently encode case values as-is,
thus we need 4 versions of __jumpmap_find function (i8,i16,i32,i64).
We have an analysis pass that helps us figure out which kinds of maps
are actually profitable.
For example, if we discover that we only have one i8 jumpmap but many
i16 jumpmaps, then we would zero-extend the i8 map to i16 so that we
don't use __jumpmap_find_i8 at all and it can be removed.
> If you're going for
> size, you probably want to put 1 or 2 byte relative jump offsets into
> the table, if possible. You'd also want to shrink the keys if possible:
> for a small range, we could use 2-byte, or even 1-byte keys. You could
> also encode deltas into the table, rather than absolute values; the
> deltas are likely to be smaller than the keys, and you're iterating over
> the table anyway.
Yes, many options are possible. In general, as long as jumpmap_find()
function can undo what the compiler did to the map you'd be fine. We
plan to keep this opt flexible when upstreaming.
> 3. Is the linear search a problem? I mean, the linear nature of it
> doesn't matter for small tables, but eventually you get to a point where
> it just doesn't make sense. Linearly iterating over 1000 entries in a
> table is clearly going to be slower than a binary search. Or is the
> idea that you'd do a binary search down to say, 20 entries, then call
> __jumpmap_find?
Our __jumpmap_find does binary-search internally and it turned out to be
both faster and smaller than what compiler would generate otherwise
(bittest clustering).
There is some concern about how well jumpmaps could integrate with
existing switch clustering code for a pure speed Target.
> 4. How often does this pattern come up, in your experience?
This is highly dependent on how large switches will be profitable to
fold as a jumpmap on a particular target.
For us the break-even point is at 5 cases which is very small and fires
quite often.
I can't say for X86/speed, but as code size opt, this fires on any
switch statement >= 5 that fails to fold as a JumpTable.
Witold
More information about the llvm-dev
mailing list