[PATCH] D124183: [InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1)

Thu Apr 21 11:55:46 PDT 2022

spatel added a comment.

In D124183#3465290 <https://reviews.llvm.org/D124183#3465290>, @bcl5980 wrote:

> One other reason I insist one-use is we can give programmer choice. If we remove one-use, there is no choice for programmer.
> https://godbolt.org/z/Wrzh5aczf
> https://godbolt.org/z/1nGf88E87

Ideally, the compiler would not give the programmer a "choice" of multiply asm output on these examples. We want to be able to reduce logically equivalent code to the same IR (canonicalization) and then optimize it to the ideal target assembly.

Leaving holes in the canonicalization makes it harder to optimize maximally and consistently. That can also be frustrating for users when they think they have written the same program, but it compiles to different output. 
...but we will never be able to canonicalize all code patterns - there are just too many potential variations in source and IR.

As noted, the general rule for instcombine is "don't increase the instruction count". In addition, if we can reduce the number of uses of variables or reduce the dependency chain, then that is usually good. That is the reason we would leave out the one-use check in the test example for this patch - "C" can be calculated directly from "A" rather than using the intermediate "B" value.

The rules can be confusing because we have several exceptions:

1. As noted here, some operations may be more expensive (or not legal) for some targets, and it can be difficult to invert/decompose the transform in the backend, so we avoid some transforms.
2. Sometimes by removing an intermediate use it becomes harder for the backend to match a longer sequence as a target-specific operation, so we avoid a transform. There are recent examples of `icmp` one-use transforms where this happens.
3. There are operations like `div` that we know are difficult to analyze in IR and create expensive codegen, so we convert those to multiple simpler instructions in IR.
4. We may choose a form like "icmp ne (and X, 1), 0" rather than "trunc X to i1" because analysis was historically better for icmp (and now it might just be too much work to change).

I don't think we need to revert D123453 <https://reviews.llvm.org/D123453>. As discussed there, we could have made that patch consistent with this transform (no one-use check). I assumed from the codegen examples that there could be a performance regression without one-use, but there could also be improvements. We can remove the one-use as a follow-up to that patch and see if anyone complains.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124183/new/

https://reviews.llvm.org/D124183