[PATCH] D62100: [DAGCombine][X86][AMDGPU][AArch64] (srl (shl x, c1), c2) with c1 != c2 handling

Sat May 18 15:18:17 PDT 2019

lebedev.ri added subscribers: arsenm, compnerd.
lebedev.ri marked 2 inline comments as done.
lebedev.ri added a comment.

Looked at changes:

- I'll leave the x86 vector stuff for later, since I actually wanted to look into the reverse transform and only looked into this one for consistency.
- I don't know what to do with the AArch64 regression. I can hide it with `shouldFoldConstantShiftPairToMask()`, but the underlying problem is there regardless (tests added). Thoughts?
- That leaves AMDGPU?

================
Comment at: test/CodeGen/AArch64/arm64-bitfield-extract.ll:396
 ; LLC-NEXT:    ldr w8, [x0]
+; LLC-NEXT:    and w8, w8, #0x3ffffff8
 ; LLC-NEXT:    bfxil w8, w1, #16, #3
----------------
After actually looking at the `-debug` output, this regression is caused by `SimplifyDemandedBits()`,
which ignores the `AArch64TargetLowering::isDesirableToCommuteWithShift()` override.
So by the time we get to `AArch64TargetLowering::isBitfieldExtractOp()`, we have
```
t18: i32 = srl t15, Constant:i64<2>
  t15: i32 = or t26, t24
    t26: i32 = and t7, Constant:i32<1073741816>
      t7: i32,ch = load<(load 4 from %ir.y, align 8)> t0, t2, undef:i64
        t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t1: i64 = Register %0
        t6: i64 = undef
      t25: i32 = Constant<1073741816>
    t24: i32 = and t12, Constant:i32<4>
      t12: i32 = srl t4, Constant:i64<16>
        t4: i32,ch = CopyFromReg t0, Register:i32 %1
          t3: i32 = Register %1
        t11: i64 = Constant<16>
      t23: i32 = Constant<4>
  t17: i64 = Constant<2>
```
I'm not sure how to turn this pattern on its head to produce `ubfx` again.
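
As a sanity check on the dump above, here is a rough Python sketch that models the DAG on 32-bit values and compares it against a shift pair. Note the assumptions: the "source" form (`and` with -8, a 3-bit field from `x >> 16`, then `shl 2` / `lshr 4`) is my reconstruction and is not shown in this comment, and both function names are hypothetical.
```python
MASK32 = 0xFFFFFFFF

def dag_as_dumped(y, x):
    """Model of the dump: t18 = srl (or (and y, C1), (and (srl x, 16), 4)), 2."""
    t26 = y & 1073741816              # and t7, Constant<1073741816> (0x3ffffff8)
    t24 = ((x & MASK32) >> 16) & 4    # and (srl t4, 16), Constant<4>
    return ((t26 | t24) & MASK32) >> 2  # srl t15, Constant<2>

def assumed_source(y, x):
    """ASSUMPTION: the pre-combine form; reconstructed, not taken from the dump."""
    merged = (y & 0xFFFFFFF8) | (((x & MASK32) >> 16) & 7)
    return ((merged << 2) & MASK32) >> 4  # shl by 2, then lshr by 4 (c1 != c2)
```
If that reconstruction is right, both forms yield bits 2..29 of the merged value, i.e. the field a `ubfx` with lsb 2 and width 28 would extract; the narrowed `and` masks only drop bits that the final shift discards anyway.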

================
Comment at: test/CodeGen/AMDGPU/llvm.amdgcn.ubfe.ll:686-687
 ; SI-NEXT:    s_waitcnt vmcnt(0)
-; SI-NEXT:    v_lshlrev_b32_e32 v0, 31, v0
-; SI-NEXT:    v_lshrrev_b32_e32 v0, 1, v0
+; SI-NEXT:    v_lshlrev_b32_e32 v0, 30, v0
+; SI-NEXT:    v_and_b32_e32 v0, 2.0, v0
 ; SI-NEXT:    buffer_store_dword v0, off, s[4:7], 0
----------------
@arsenm would AMDGPU prefer two shifts or shift+mask here?
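
For reference, a small Python sketch checking that the two SI sequences in the hunk above compute the same 32-bit value. It assumes both shifts wrap at 32 bits and that the `2.0` operand is the inline constant whose bit pattern is that of single-precision 2.0; the helper names are hypothetical.
```python
import struct

MASK32 = 0xFFFFFFFF

# Bit pattern of IEEE-754 single-precision 2.0, read as an unsigned 32-bit int.
FLOAT_2_BITS = struct.unpack("<I", struct.pack("<f", 2.0))[0]

def two_shifts(x):
    """Old sequence: v_lshlrev_b32 by 31, then v_lshrrev_b32 by 1."""
    return ((x << 31) & MASK32) >> 1

def shift_and_mask(x):
    """New sequence: v_lshlrev_b32 by 30, then v_and_b32 with the bits of 2.0."""
    return ((x << 30) & MASK32) & FLOAT_2_BITS
```
Both forms leave bit 0 of the input in bit 30 and clear everything else, so the choice is purely about which instruction pair is cheaper on the target.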

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62100/new/

https://reviews.llvm.org/D62100