[PATCH] D159533: [DAG] getNode() - fold (zext (trunc x)) -> x iff the upper bits are known zero - add SRL support

Wed Sep 20 17:36:40 PDT 2023

jrbyrnes added a comment.

Can you please bring in tests from https://github.com/llvm/llvm-project/pull/66965

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10775
+    auto ShiftOp = dyn_cast<ConstantSDNode>(Op->getOperand(2));
+    if (!ShiftOp)
+      return std::nullopt;
----------------
Can we also fail on vector types

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10791
+                                       Depth + 1, StartingIndex)
+               : calculateByteProvider(Op.getOperand(0),
+                                       Index - (BytesProvided - ByteShift),
----------------
This seems to be broken if Index = 1; BytesProvided; = 2 and ByteShift = 3 -- we will attempt to get the 2nd byte of a 16 bit operand which is out of range (only 0,1 are valid bytes).

Perhaps it was meant to be: `Index + (BytesProvided - ByteShift)` ? But this still doesnt account for the case where `ByteShift > n * BytesProvided`. 

How do you feel about

```
  case ISD::FSHR: {
    auto ShiftOp = dyn_cast<ConstantSDNode>(Op->getOperand(2));
    if (!ShiftOp || Op.getValueType().isVector())
      return std::nullopt;

    uint64_t BitsProvided = Op.getValueSizeInBits();
    if (BitsProvided % 8 != 0)
      return std::nullopt;

    uint64_t BitShift = ShiftOp->getAPIntValue().urem(BitsProvided);
    if (BitShift % 8)
      return std::nullopt;

    uint64_t ConcatSizeInBytes = BitsProvided / 4;
    uint64_t ByteShift = BitShift / 8;

    auto NewIndex = (Index + ByteShift) % ConcatSizeInBytes;
    auto BytesProvided = BitsProvided / 8;
    auto NextOp = Op.getOperand(NewIndex >= BytesProvided ? 0 : 1);
    NewIndex %= BytesProvided;
    return calculateSrcByte(NextOp, StartingIndex, NewIndex);
  }
```

================
Comment at: llvm/test/CodeGen/AMDGPU/permute.ll:127
 ; GCN-NEXT:    s_waitcnt vmcnt(0)
-; GCN-NEXT:    v_alignbit_b32 v2, v2, s0, 24
+; GCN-NEXT:    v_perm_b32 v2, s0, v2, v3
 ; GCN-NEXT:    flat_store_dword v[0:1], v2
----------------
I can look into this after this patch lands.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159533/new/

https://reviews.llvm.org/D159533