[PATCH] D107692: [DAGCombine] Prevent the transform of combine for multi-use operand

Mon Aug 9 06:50:00 PDT 2021

spatel added reviewers: dmgreen, lebedev.ri, craig.topper, spatel, RKSimon.
spatel added a comment.

The transform that we want to make can be shown with this example (and so this test should be added somewhere as a preliminary commit):

  define i32 @src(i32 %x, i32 %y) {
    %add = add i32 %x, 65535 ; --> turn this into -1 because that can be encoded as an immediate operand
    %srl = lshr i32 %y, 16
    %r = and i32 %srl, %add
    ret i32 %r
  }

https://alive2.llvm.org/ce/z/Ehadpq

For AArch and PowerPC and possibly all targets (x86 would benefit too), we want this to use a -1 (or `sub 1`), not an `add 65535`:

  sub w8, w0, #1
  and w0, w8, w1, lsr #16

vs.

  mov w8, #65535
  add w8, w0, w8
  and w0, w8, w1, lsr #16

But if you try this test with main today, it doesn't work:
https://godbolt.org/z/fe7h1zYv6

...because this transform has another bug: it doesn't match the commuted variant.

Over in PR51321, I mentioned that I could delete the whole transform and not see any test changes. That's because SimplifyDemandedBits does the same transform. But it has a similar problem - it only catches the case where it processes the shift before it sees the add. So it misses the same commuted pattern. I'm not sure how we'd fix that in DemandedBits; it seems like a generic ordering problem for commutative ops.

So this transform has at least 3 problems (assuming we want to keep it instead of enhancing DemandedBits):

1. It miscompiles the multi-use case because it is coded weirdly and missed a constraint.
2. It misses the commuted case.
3. It misses what is likely profitable for all targets by overspecifying the legality constraints.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107692/new/

https://reviews.llvm.org/D107692