[llvm-branch-commits] [llvm] [AMDGPU] Support Wave Reduction for true-16 types - 1 (PR #194809)

Thu Apr 30 06:06:51 PDT 2026

================
@@ -6154,9 +6165,18 @@ static MachineBasicBlock *lowerWaveReduce(MachineInstr &MI,
       BuildMI(*ComputeLoop, I, DL, TII->get(SFFOpc), FF1Reg)
           .addReg(ActiveBitsReg);
       if (is32BitOpc || is16BitOpc) {
+        Register ReadLaneSrc = SrcReg;
+        if (useRealTrue16) {
+          // Copy the 16-bit src to a 32-bit vgpr for the v_readlane
+          Register SrcReg32 =
+              MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
----------------
arsenm wrote:

This can't be a straight copy; you need to widen to the 32-bit register with REG_SEQUENCE + IMPLICIT_DEF, or with INSERT_SUBREG.
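
For illustration, the REG_SEQUENCE + IMPLICIT_DEF form might look roughly like the fragment below. This is a sketch rather than the patch's actual code, and it is not standalone: it is meant to sit inside lowerWaveReduce and reuses the names visible in the hunk above (ComputeLoop, I, DL, TII, MRI, SrcReg, useRealTrue16). It assumes SrcReg is a 16-bit VGPR virtual register and that the same insertion point is appropriate. UndefHi is a hypothetical name introduced here.

```cpp
// Sketch only: assumes the surrounding lowerWaveReduce context from the hunk
// above and that SrcReg is a virtual register in VGPR_16.
Register ReadLaneSrc = SrcReg;
if (useRealTrue16) {
  // The high 16 bits are never read, so leave them explicitly undefined.
  // UndefHi is a hypothetical name, not taken from the patch.
  Register UndefHi = MRI.createVirtualRegister(&AMDGPU::VGPR_16RegClass);
  BuildMI(*ComputeLoop, I, DL, TII->get(TargetOpcode::IMPLICIT_DEF), UndefHi);

  // Widen to a 32-bit VGPR: the source goes in lo16, the undef value in hi16.
  Register SrcReg32 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
  BuildMI(*ComputeLoop, I, DL, TII->get(TargetOpcode::REG_SEQUENCE), SrcReg32)
      .addReg(SrcReg)
      .addImm(AMDGPU::lo16)
      .addReg(UndefHi)
      .addImm(AMDGPU::hi16);
  ReadLaneSrc = SrcReg32;
}
```

Unlike a plain COPY between register classes of different sizes, this states explicitly which half of the 32-bit register holds the value and that the other half is undefined.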

https://github.com/llvm/llvm-project/pull/194809