[LLVMdev] Idea for optimization (test for remainder)
jn at sirrida.de
Sat Mar 22 01:44:25 PDT 2014
Hello Benjamin, hello folks!
>>>> Consider the expression (x % d) == c where d and c are constants.
>>>> For simplicity let us assume that x is unsigned and 0 <= c < d.
>>>> Let us further assume that d = a * (1 << b) and a is odd.
>>>> Then our expression can be transformed to
>>>> rotate_right(x-c, b) * inverse_mul(a) <= (high_value(x) - c) / d .
>>>> Example [(x % 250) == 3]:
>>>> sub eax,3
>>>> ror eax,1
>>>> imul eax,eax,0x26e978d5 // multiplicative inverse of 125
>>>> cmp eax,17179869 // (0xffffffff-3) / 250
>>>> jbe OK
>>> Yep, this is a long-standing issue in the peephole optimizer.
>>> It's not easily fixed because
>>> 1. We don't want to do it early (i.e. before codegen) because
>>> the resulting expression is harder to analyze wrt. value range.
>>> 2. We can't do it late (in DAGCombiner) because it works top-down
>>> and has already expanded the operation into the code you posted
>>> above by the time it sees the compare.
>> Well, I tried a solution using the instruction combiner,
>> and it turned out well.
>> I attach a working patch for unsigned values.
>> The signed version will come later if this patch is accepted.
The signed case is much more complicated but nevertheless generates the
> The problem I see with this approach is that InstCombine is
> primarily a canonicalization pass.
My approach is very similar to the already existing function
FoldICmpDivCst which converts x/d==c or similar to a range check, so it
ought to be the right place.
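
To illustrate the kind of rewrite meant here, the fold of x/d == c into a
range check can be sketched in C. This is only an illustration of the idea
for unsigned values, not the InstCombine code itself; the function names
are hypothetical, and 64-bit arithmetic is used so that the upper bound
cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain form: unsigned division followed by a compare (d must be non-zero). */
static bool div_eq_plain(uint32_t x, uint32_t d, uint32_t c) {
    return x / d == c;
}

/* Range-check form: x / d == c  <=>  c*d <= x < c*d + d.
   The bounds are computed in 64 bits so neither product nor sum wraps. */
static bool div_eq_range(uint32_t x, uint32_t d, uint32_t c) {
    uint64_t lo = (uint64_t)c * d;   /* inclusive lower bound */
    uint64_t hi = lo + d;            /* exclusive upper bound */
    return lo <= x && x < hi;
}
```

With constant d and c both bounds are compile-time constants, so the
division disappears entirely.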
> Your change lets it generate a ton of code,
> making life harder for our analysis passes
> (think of what happens when your transformation changes
> a loop condition).
Do you really consider 4 operations on one expression, using 4 constants
in the worst case, a "ton of code"?
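
For reference, the four operations and their constants can be written out
in C and compared against the plain remainder. The following is only a
sketch of the (x % 250) == 3 example quoted at the top; the helper names
(rotr32, rem_is_3_plain, rem_is_3_fast, count_mismatches) are made up for
illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value right by r bits; valid for 0 < r < 32. */
static uint32_t rotr32(uint32_t v, unsigned r) {
    return (v >> r) | (v << (32u - r));
}

/* Plain form, as written in the source program. */
static bool rem_is_3_plain(uint32_t x) {
    return x % 250u == 3u;
}

/* Transformed form with d = 250 = 125 * (1 << 1) and c = 3. */
static bool rem_is_3_fast(uint32_t x) {
    uint32_t t = x - 3u;               /* sub  eax,3 */
    t = rotr32(t, 1);                  /* ror  eax,1 */
    t *= UINT32_C(0x26e978d5);         /* imul: inverse of 125 modulo 2^32 */
    return t <= UINT32_C(17179869);    /* cmp/jbe: (0xffffffff - 3) / 250 */
}

/* Count disagreements between both forms on a strided sample of all
   32-bit inputs (hypothetical test helper, stride chosen arbitrarily). */
static unsigned count_mismatches(void) {
    /* The multiplier really is the inverse of 125 modulo 2^32. */
    assert((uint32_t)(UINT32_C(125) * UINT32_C(0x26e978d5)) == 1u);

    unsigned bad = 0;
    for (uint64_t x = 0; x <= UINT32_MAX; x += 9973u) {
        uint32_t v = (uint32_t)x;
        bad += (rem_is_3_plain(v) != rem_is_3_fast(v));
    }
    return bad;
}
```

The test is exact because rotation and multiplication by an odd constant
are both bijections on 32-bit values: the multiples of 250 up to
0xffffffff - 3 map to 0..17179869, so nothing else can land in that range,
including the wrapped values produced when x < 3.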
> We should do this kind of thing later.
I have seen that the compiler does other optimizations on the generated
code such as exchanging multiplication and addition or re-using
> We can't do it in the DAG combining stage where we
> expand other divides because it works top-down.
> Maybe we can put this in CodeGenPrepare, which works on
> IR and runs just before DAG lowering.
> It also has the ability to insert branching in case you
> need it for range checks.
As far as I can see, this optimization needs no range checks;
I never need more than the code shown above.
> The pass is a bit of a hack though,
> maybe someone else has a better idea.
I hope that others join the discussion.
To make things easier to track, I have created a bug entry:
>> How could I detect and include an additional range check
>> which is possible with the same amount of generated code?
>> By the way: Is there something like a floored
>> or Euclidean remainder/modulo operation
>> (see http://en.wikipedia.org/wiki/Modulo_operation)?
>> How is it realized?
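
For readers unfamiliar with the distinction, the three remainder
conventions can be sketched in C as adjustments of the truncated
remainder that C's % operator computes. This is only an illustration of
the definitions, not a statement about how LLVM would implement them; the
function names are hypothetical. All three assume d != 0 and exclude the
case x == INT32_MIN with d == -1, which is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Truncated remainder: the result takes the sign of the dividend. */
static int32_t rem_trunc(int32_t x, int32_t d) {
    return x % d;
}

/* Floored remainder: the result takes the sign of the divisor. */
static int32_t rem_floor(int32_t x, int32_t d) {
    int32_t r = x % d;
    if (r != 0 && ((r < 0) != (d < 0)))
        r += d;                      /* move r to the divisor's side of zero */
    return r;
}

/* Euclidean remainder: the result always satisfies 0 <= r < |d|. */
static int32_t rem_euclid(int32_t x, int32_t d) {
    int32_t r = x % d;
    if (r < 0)
        r = (d < 0) ? r - d : r + d; /* add |d| without negating d */
    return r;
}
```

For x = -7 and d = 3 the truncated remainder is -1, while the floored and
Euclidean remainders are both 2; the latter two differ only when d is
negative.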