[PATCH] D128123: [SDAG] try to replace subtract-from-constant with xor

Mon Jul 11 13:35:47 PDT 2022

spatel added a comment.

In D128123#3643353 <https://reviews.llvm.org/D128123#3643353>, @bjope wrote:

> Hi @spatel
>
> I've seen some regressions with this (or maybe with the similar update in instcombine that replaces sub with xor). Those patterns typically involve a sub that is used as an index in a GEP. I think that both SCEV for regular IR and our downstream machine IR scalar evolution have a hard time understanding that the xor is a subtract in disguise. So instead of ending up with a tight loop with a negative stride for the memory accesses, we now end up with xor operations inside the loop. I'm not quite sure how to deal with the regressions. Maybe the scalar evolution implementations can be improved here, or maybe our downstream ISel needs to select sub instead of xor (if the reverse transform is easy).
>
> My target has lots of different instructions that involve add/sub. We can fold them into "multiply and add/subtract", we can fold them into addressing modes as an offset or by using a post-update on the pointer, and we can do add/subtract for both "general purpose" registers and "pointer" registers. However, logical operations are more limited (especially when it comes to pointer arithmetic, since we also end up moving values between the GPRs that can do the logical operations and the pointer registers that can be used for addressing).
>
> Since this replaces one instruction with another single instruction (we do not reduce the number of instructions in these combines), I'm interested to understand what makes an xor better than a sub. Are logical operations considered "better" than arithmetic operations in general, or what is the rule?

Thanks for letting me know. The main codegen motivation for this transform is that it can allow using an immediate-form xor instruction rather than a separate load of the immediate when the constant is Op0 of a subtract. That kind of improvement is seen in the RISCV and AArch64 diffs. Our folds for bitwise logic also tend to be better than those for sub, and known-bits / demanded-bits analysis is easier with xor. The bit-tracking improvement is why I figured we should also do this in instcombine.
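
For readers following along, here is a small standalone sketch of why a subtract-from-constant can be rewritten as an xor at all. It is not the code from the patch and uses no LLVM API; `subIsXor` and `foldHoldsForAllInputs` are hypothetical names chosen for illustration. It assumes 8-bit values and a conservative "maybe ones" mask for X (the bits that are not known to be zero). If subtracting that mask from C gives the same result as xor-ing it, no borrow reaches a neighboring bit, so `C - X` equals `C ^ X` for every X whose set bits lie inside the mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true when subtracting any
// value whose set bits lie within `maybeOnes` from the constant `c` behaves
// like xor. The test compares sub and xor on the mask itself; they agree
// exactly when no borrow propagates into an adjacent bit.
bool subIsXor(uint8_t c, uint8_t maybeOnes) {
  return static_cast<uint8_t>(c - maybeOnes) ==
         static_cast<uint8_t>(c ^ maybeOnes);
}

// Exhaustively checks the claim for 8-bit values: whenever subIsXor(c, m)
// holds, c - x == c ^ x for every x that only uses bits from m.
bool foldHoldsForAllInputs() {
  for (unsigned c = 0; c < 256; ++c) {
    for (unsigned m = 0; m < 256; ++m) {
      if (!subIsXor(static_cast<uint8_t>(c), static_cast<uint8_t>(m)))
        continue;
      for (unsigned x = 0; x < 256; ++x) {
        if ((x & ~m) != 0)
          continue; // x sets a bit outside the "maybe ones" mask
        if (static_cast<uint8_t>(c - x) != static_cast<uint8_t>(c ^ x))
          return false;
      }
    }
  }
  return true;
}
```

A typical case is a reversed index such as `7 - i` with `i` known to be in the range 0..7: the mask is 7, no borrow is possible, and the expression equals `7 ^ i`. That is also the shape that shows up as a GEP index in the loops described in the quoted comment.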

But it was already a borderline codegen transform because of the AMDGPU diffs, so we do very likely need to restrict this with a TLI hook. If you have examples of the regressions you're seeing that can be translated to an in-tree target, that would be great.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128123/new/

https://reviews.llvm.org/D128123