[LLVMdev] Idea for optimization (test for remainder)

Thu Mar 20 04:31:16 PDT 2014

On 08.03.2014, at 20:51, Jasper Neumann <jn at sirrida.de> wrote:

> Hello Benjamin, hello folks!
> 
> [from the LLVM developer forum, comment corrected]
> >> Consider the expression (x % d) == c where d and c are constants.
> >> For simplicity let us assume that x is unsigned and 0 <= c < d.
> >> Let us further assume that d = a * (1 << b) and a is odd.
> >> Then our expression can be transformed to
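> >> rotate_right(x-c, b) * inverse_mul(a) <= (high_value(x) - c) / d,
> >> where high_value(x) is the largest value of x's type and inverse_mul(a)
> >> is the multiplicative inverse of a modulo 2^n for an n-bit x.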
> >> Example [(x % 250) == 3]:
> >>   sub eax,3
> >>   ror eax,1
> >>   imul eax,eax,0x26e978d5  // multiplicative inverse of 125
> >>   cmp eax,17179869  // (0xffffffff-3) / 250
> >>   jbe OK
> >> [...]
> 
> > Yep, this is a long-standing issue in the peephole optimizer.
> > It's not easily fixed because
> 
> > 1. We don't want to do it early (i.e. before codegen) because
> >    the resulting expression is harder to analyze wrt. value range.
> > 2. We can't do it late (in DAGCombiner) because it works top-down
> >    and has already expanded the operation into the code you posted
> >    above by the time it sees the compare.
> 
> Well, I tried a solution using the instruction combiner, and it turned out well. I attach a working patch for unsigned values. The signed version will come later if this patch is accepted.

The problem I see with this approach is that InstCombine is primarily a canonicalization pass. Your change lets it generate a ton of code, making life harder for our analysis passes (think of what happens when your transformation changes a loop condition). We should do this kind of thing later. We can't do it in the DAG combining stage where we expand other divides because it works top-down.

Maybe we can put this in CodeGenPrepare, which works on IR and runs just before DAG lowering. It also has the ability to insert branching in case you need it for range checks. The pass is a bit of a hack, though; maybe someone else has a better idea.

- Ben

> How could I detect and include an additional range check which is possible with the same amount of generated code?
> 
> By the way: Is there something like a floored or Euclidean remainder/modulo operation (see http://en.wikipedia.org/wiki/Modulo_operation)? How is it realized?
> 
> Best regards
> Jasper
> <patch1.txt>
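
For readers who want to try the transformation quoted at the top of this message, here is a minimal C sketch for 32-bit unsigned x. It is not taken from the attached patch; the helper names rotr32, inverse_mul32 and rem_equals are made up for illustration, and a compiler would fold the constants at compile time instead of computing them at run time as this sketch does.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by r bits (0 <= r < 32). */
static uint32_t rotr32(uint32_t v, unsigned r) {
    return (v >> r) | (v << ((32 - r) & 31));
}

/* Multiplicative inverse of an odd a modulo 2^32 (hypothetical helper).
 * Newton iteration: each step doubles the number of correct low bits. */
static uint32_t inverse_mul32(uint32_t a) {
    uint32_t inv = a;                 /* a*a == 1 (mod 8), so 3 bits are correct */
    for (int i = 0; i < 4; i++)       /* 3 -> 6 -> 12 -> 24 -> 48 bits */
        inv *= 2u - a * inv;
    return inv;
}

/* Test (x % d) == c without a division on x.
 * Assumes d != 0 and 0 <= c < d; d is split as a * (1 << b) with a odd. */
static int rem_equals(uint32_t x, uint32_t d, uint32_t c) {
    unsigned b = 0;
    uint32_t a = d;
    while ((a & 1u) == 0) {           /* strip the power-of-two factor */
        a >>= 1;
        b++;
    }
    uint32_t limit = (UINT32_MAX - c) / d;   /* (high_value(x) - c) / d */
    return rotr32(x - c, b) * inverse_mul32(a) <= limit;
}

/* Compare against the plain % operator on a strided sample of inputs. */
static void self_check(void) {
    assert(inverse_mul32(125u) == 0x26e978d5u);   /* constant from the example */
    for (uint64_t x = 0; x <= UINT32_MAX; x += 9973)
        assert(rem_equals((uint32_t)x, 250u, 3u) == ((uint32_t)x % 250u == 3u));
}
```

Calling self_check() from a test program exercises the [(x % 250) == 3] example. The rotate does double duty: any x - c with a nonzero low bit ends up with its top bit set, and no such value can map to a product at or below the limit, so the single unsigned compare also rejects the case x < c.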