[PATCH] D79369: [InstCombine] "X - (X / C) * C == 0" to "X & C-1 == 0"

Mon May 4 23:07:57 PDT 2020

lebedev.ri added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:1362-1363
+/// Fold decomposed version of "X % C == 0" to "X & C-1 == 0".
+/// "X % C" can also be represented as "X - (X / C) * C" which is optimized
+/// into "((X / -C1) >> C2)) + X" so the latter can be folded to
+/// "X & C-1" for icmp eq/ne 0
----------------
lebedev.ri wrote:
> I think we should first instead fold `X - (X / C) * C`/`((X / -C1) >> C2)) + X` into `X % C`.
> We have integer division/remainder either way, but with remainder it's less instructions.
> 
> But that won't help here, because your target pattern `@is_rem32_pos_decomposed_i8`
> is currently folded into 
> ```
> define i1 @is_rem32_pos_decomposed_i8(i8 %x) {
>   %d.neg = sdiv i8 %x, -32
>   %m.neg = shl i8 %d.neg, 5
>   %s = sub i8 0, %x
>   %r = icmp eq i8 %m.neg, %s
>   ret i1 %r
> }
> ```
> instead of
> ```
> define i1 @tgt(i8 %x) {
>   %d.neg = sdiv i8 %x, -32
>   %m.neg = shl i8 %d.neg, 5
>   %m.neg.add = add i8 %m.neg, %x
>   %r = icmp eq i8 %m.neg.add, 0
>   ret i1 %r
> }
> ```
> For that, i'd think we should extend `foldIRemByPowerOfTwoToBitTest()`.
Edit: err, actually, disregard `foldIRemByPowerOfTwoToBitTest()` comment.
We should also fold
```
define i1 @is_rem32_pos_decomposed_i8(i8 %x) {
  %d.neg = sdiv i8 %x, -32
  %m.neg = shl i8 %d.neg, 5
  %s = sub i8 0, %x
  %r = icmp eq i8 %m.neg, %s
  ret i1 %r
}
```
into
```
define i1 @is_rem32_pos_decomposed_i8(i8 %x) {
  %s = srem i8 %x, 32
  %r = icmp eq i8 %s, 0
  ret i1 %r
}
```
and then `foldIRemByPowerOfTwoToBitTest()` will just work.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79369/new/

https://reviews.llvm.org/D79369