[LLVMdev] Idea for optimization (test for remainder)

Sat Mar 22 01:44:25 PDT 2014

Hello Benjamin, hello folks!

 >>>> Consider the expression (x % d) == c where d and c are constants.
 >>>> For simplicity let us assume that x is unsigned and 0 <= c < d.
 >>>> Let us further assume that d = a * (1 << b) and a is odd.
 >>>> Then our expression can be transformed to
 >>>> rotate_right(x-c, b) * inverse_mul(a) <= (high_value(x) - c) / d .
 >>>> Example [(x % 250) == 3]:
 >>>>    sub eax,3
 >>>>    ror eax,1
 >>>>    imul eax,eax,0x26e978d5  // multiplicative inverse of 125
 >>>>    cmp eax,17179869  // (0xffffffff-3) / 250
 >>>>    jbe OK
 >>>> [...]
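
As a sanity check on the transformation quoted above, here is a minimal Python sketch that models 32-bit unsigned arithmetic by masking. The helper names (inverse_mul, rotate_right, rem_eq_fast) are only illustrative and are not taken from the attached patch; the sketch assumes d != 0 and 0 <= c < d.

```python
MASK = 0xFFFFFFFF  # model 32-bit unsigned wrap-around by masking


def rotate_right(v, b):
    """Rotate the 32-bit value v right by b bits (0 <= b < 32)."""
    return ((v >> b) | (v << (32 - b))) & MASK


def inverse_mul(a):
    """Multiplicative inverse of odd a modulo 2**32 (Newton iteration)."""
    inv = a  # a*a == 1 (mod 8), so this is already correct to 3 bits
    for _ in range(5):  # each step doubles the number of correct bits
        inv = (inv * (2 - a * inv)) & MASK
    return inv


def rem_eq_fast(x, d, c):
    """Evaluate (x % d) == c without a division on x."""
    a, b = d, 0
    while a % 2 == 0:  # split d into a * 2**b with a odd
        a //= 2
        b += 1
    limit = (MASK - c) // d  # compile-time constant in a real compiler
    t = (x - c) & MASK  # sub
    t = rotate_right(t, b)  # ror
    t = (t * inverse_mul(a)) & MASK  # imul
    return t <= limit  # cmp / jbe


if __name__ == "__main__":
    print(hex(inverse_mul(125)), (MASK - 3) // 250)
    print(rem_eq_fast(253, 250, 3), rem_eq_fast(2, 250, 3))
```

The sketch works because subtracting c, rotating, and multiplying by an odd constant are all bijections on 32-bit values, and the multiples k*d with k <= limit are exactly the inputs that land on 0..limit.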

 >>> Yep, this is a long-standing issue in the peephole optimizer.
 >>> It's not easily fixed because
 >>> 1. We don't want to do it early (i.e. before codegen) because
 >>>     the resulting expression is harder to analyze wrt. value range.
 >>> 2. We can't do it late (in DAGCombiner) because it works top-down
 >>>     and has already expanded the operation into the code you posted
 >>>     above by the time it sees the compare.

 >> Well, I tried a solution using the instruction combiner,
 >> and it turned out well.
 >> I attach a working patch for unsigned values.
 >> The signed version will come later if this patch is accepted.

The signed case is much more complicated but nevertheless generates the 
same code.

 > The problem I see with this approach is that InstCombine is
 > primarily a canonicalization pass.

My approach is very similar to the existing function FoldICmpDivCst,
which converts x/d==c and similar comparisons to a range check, so
InstCombine ought to be the right place.
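
For comparison, the kind of range check that x/d==c reduces to can be sketched as follows. This is only a Python model of the idea for 32-bit unsigned values, not the actual FoldICmpDivCst code; div_eq_as_range is a hypothetical name, and d > 0 and c >= 0 are assumed.

```python
MASK = 0xFFFFFFFF  # model 32-bit unsigned wrap-around by masking


def div_eq_as_range(x, d, c):
    """Evaluate (x // d) == c as a range check lo <= x <= hi."""
    lo = c * d
    if lo > MASK:  # no 32-bit x can reach this quotient
        return False
    hi = min(lo + d - 1, MASK)  # clamp so the upper bound cannot overflow
    # One subtraction plus one unsigned compare covers both bounds.
    return ((x - lo) & MASK) <= hi - lo
```

Like the remainder test, this replaces the division with a subtraction and a single unsigned comparison against constants.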

 > Your change lets it generate a ton of code,
 > making life harder for our analysis passes
 > (think of what happens when your transformation changes
 > a loop condition).

Do you really consider four operations on one expression, using four
constants in the worst case, a "ton of code"?

 > We should do this kind of thing later.

Why?
I have seen that the compiler does other optimizations on the generated 
code such as exchanging multiplication and addition or re-using 
subexpressions.

 > We can't do it in the DAG combining stage where we
 > expand other divides because it works top-down.
 > Maybe we can put this in CodeGenPrepare, which works on
 > IR and runs just before DAG lowering.
 > It also has the ability to insert branching in case you
 > need it for range checks.

As far as I can see, this optimization needs no range checks; the code
shown above is all it ever requires.

 > The pass is a bit of hack though,
 > maybe someone else has a better idea.

I hope that others join the discussion.

To make things easier to track, I have created a bug entry:
http://llvm.org/bugs/show_bug.cgi?id=19206

 >> How could I detect and include an additional range check
 >> which is possible with the same amount of generated code?

Ping.

 >> By the way: Is there something like a floored
 >> or Euclidean remainder/modulo operation
 >> (see http://en.wikipedia.org/wiki/Modulo_operation)?
 >> How is it realized?

Ping.
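
To make the question concrete, the three conventions differ only in a sign fix-up applied to the truncated remainder. The Python sketch below assumes the starting point is a truncated remainder whose sign follows the dividend (as C's % and LLVM's srem behave); the function names are illustrative, and it says nothing about how LLVM does or should lower such operations.

```python
def trunc_rem(x, d):
    """Truncated remainder: the result takes the sign of the dividend x."""
    r = abs(x) % abs(d)
    return -r if x < 0 else r


def floored_rem(x, d):
    """Floored remainder: the result takes the sign of the divisor d."""
    r = trunc_rem(x, d)
    if r != 0 and (r < 0) != (d < 0):
        r += d
    return r


def euclid_rem(x, d):
    """Euclidean remainder: the result is always in 0 <= r < |d|."""
    r = trunc_rem(x, d)
    if r < 0:
        r += abs(d)
    return r
```

For x = -7 and d = 3 the truncated remainder is -1, while the floored and Euclidean remainders are both 2.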

Best regards
Jasper