[llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for bitfield extract (PR #132381)

Nicolai Hähnle via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 24 13:17:52 PDT 2025


================
@@ -225,6 +228,103 @@ void RegBankLegalizeHelper::lower(MachineInstr &MI,
     MI.eraseFromParent();
     break;
   }
+  case Div_BFE: {
+    Register Dst = MI.getOperand(0).getReg();
+    assert(MRI.getType(Dst) == LLT::scalar(64));
+    bool Signed = isa<GIntrinsic>(MI) ? MI.getOperand(1).getIntrinsicID() ==
+                                            Intrinsic::amdgcn_sbfe
+                                      : MI.getOpcode() == AMDGPU::G_SBFX;
+    unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
+    // Extract bitfield from Src, LSBit is the least-significant bit for the
+    // extraction (field offset) and Width is size of bitfield.
+    Register Src = MI.getOperand(FirstOpnd).getReg();
+    Register LSBit = MI.getOperand(FirstOpnd + 1).getReg();
+    Register Width = MI.getOperand(FirstOpnd + 2).getReg();
+    // Comments are for signed bitfield extract, similar for unsigned. x is sign
+    // bit. s is sign, l is LSB and y are remaining bits of bitfield to extract.
+
+    // Src >> LSBit Hi|Lo: x?????syyyyyyl??? -> xxxx?????syyyyyyl
+    unsigned SHROpc = Signed ? AMDGPU::G_ASHR : AMDGPU::G_LSHR;
+    auto SHRSrc = B.buildInstr(SHROpc, {{VgprRB, S64}}, {Src, LSBit});
+
+    auto ConstWidth = getIConstantVRegValWithLookThrough(Width, MRI);
+
+    // Expand to Src >> LSBit << (64 - Width) >> (64 - Width)
+    // << (64 - Width): Hi|Lo: xxxx?????syyyyyyl -> syyyyyyl000000000
+    // >> (64 - Width): Hi|Lo: syyyyyyl000000000 -> ssssssssssyyyyyyl
+    if (!ConstWidth) {
+      auto Amt = B.buildSub(VgprRB_S32, B.buildConstant(SgprRB_S32, 64), Width);
+      auto SignBit = B.buildShl({VgprRB, S64}, SHRSrc, Amt);
+      B.buildInstr(SHROpc, {Dst}, {SignBit, Amt});
+      MI.eraseFromParent();
+      return;
+    }
----------------
nhaehnle wrote:

Why do you do shr -> shl -> shr? It's cheaper to just do shl -> shr with appropriate widths.

Another issue here: What if only the data source operand is divergent, and the other inputs are uniform? I feel like we should handle that appropriately.
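
To illustrate the first point, here is a minimal standalone sketch on plain 64-bit integers, not on MIR. It assumes "appropriate widths" means shifting left by `64 - LSBit - Width` and then right by `64 - Width`; that reading is an assumption, as are the helper names `shiftRight`, `bfeThreeShifts` and `bfeTwoShifts`, none of which exist in LLVM. Width 0 and `LSBit + Width > 64` are excluded, since the sketch does not model what the hardware does in those cases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: arithmetic shift right when isSigned, logical otherwise.
// Right-shifting a negative int64_t is arithmetic on mainstream compilers
// (and guaranteed since C++20).
inline uint64_t shiftRight(uint64_t v, unsigned amt, bool isSigned) {
  return isSigned ? static_cast<uint64_t>(static_cast<int64_t>(v) >> amt)
                  : v >> amt;
}

// Three-shift form, mirroring the expansion in the patch:
//   Src >> LSBit << (64 - Width) >> (64 - Width)
inline uint64_t bfeThreeShifts(uint64_t src, unsigned lsb, unsigned width,
                               bool isSigned) {
  assert(width >= 1 && lsb + width <= 64);
  unsigned amt = 64 - width;
  return shiftRight(shiftRight(src, lsb, isSigned) << amt, amt, isSigned);
}

// Two-shift form (assumed reading of the suggestion):
//   Src << (64 - LSBit - Width) >> (64 - Width)
// The left shift puts the field's top bit at bit 63; the right shift then
// drops everything below the field and sign- or zero-extends.
inline uint64_t bfeTwoShifts(uint64_t src, unsigned lsb, unsigned width,
                             bool isSigned) {
  assert(width >= 1 && lsb + width <= 64);
  return shiftRight(src << (64 - lsb - width), 64 - width, isSigned);
}
```

Both forms agree whenever the preconditions hold: the bits below the field that survive the single left shift are discarded by the final right shift anyway. For example, extracting the 4-bit field at bit 4 of `0xF0` gives 15 unsigned and all-ones (that is, -1) signed with either helper.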

https://github.com/llvm/llvm-project/pull/132381

