[llvm-branch-commits] [llvm] [AMDGPU] Support one immediate folding for global load (PR #178608)

Fri Jan 30 05:01:19 PST 2026

================
@@ -2037,13 +2037,36 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, SDValue Addr,
     LHS = Addr.getOperand(0);
 
     if (!LHS->isDivergent()) {
-      // add (i64 sgpr), (*_extend (i32 vgpr))
       RHS = Addr.getOperand(1);
-      ScaleOffset = SelectScaleOffset(N, RHS, Subtarget->hasSignedGVSOffset());
+
       if (SDValue ExtRHS = matchExtFromI32orI32(
               RHS, Subtarget->hasSignedGVSOffset(), CurDAG)) {
+        // add (i64 sgpr), (*_extend (scale (i32 vgpr)))
         SAddr = LHS;
         VOffset = ExtRHS;
+        if (NeedIOffset && !ImmOffset &&
+            CurDAG->isBaseWithConstantOffset(ExtRHS)) {
+          // add (i64 sgpr), (*_extend (add (scale (i32 vgpr)), (i32 imm)))
----------------
arsenm wrote:

alive2 will be correct, assuming the IR you wrote actually matches the hardware addressing mode. The tricky part is being sure that it matches what the hardware actually does. Your proof matches my understanding of what the hardware does. So yes, the 32-bit overflow is a problem, and this addressing calculation should be reassociated earlier.
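
To make the overflow concern concrete, here is a minimal sketch in plain C++ (not LLVM code). It assumes the hardware computes the address as `saddr + zext(voffset) + imm` in 64 bits, and it models only the zero-extended case with a non-negative immediate; the signed-offset variant guarded by `hasSignedGVSOffset()` is not modeled. The function names `addrBeforeFold`, `addrAfterFold`, and `foldIsSafe` are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Address as the unfolded pattern computes it: the immediate is added to the
// 32-bit offset first, so the sum wraps modulo 2^32 before zero-extension.
uint64_t addrBeforeFold(uint64_t SBase, uint32_t VOff, uint32_t Imm) {
  uint32_t Sum32 = VOff + Imm; // 32-bit add, may wrap
  return SBase + static_cast<uint64_t>(Sum32);
}

// Address under the assumed addressing mode after folding: the offset is
// zero-extended first and the immediate is then added in 64 bits.
uint64_t addrAfterFold(uint64_t SBase, uint32_t VOff, uint32_t Imm) {
  return SBase + static_cast<uint64_t>(VOff) + static_cast<uint64_t>(Imm);
}

// The two forms agree exactly when the 32-bit add does not wrap, which is
// the guarantee a no-unsigned-wrap add would give.
bool foldIsSafe(uint32_t VOff, uint32_t Imm) {
  return static_cast<uint64_t>(VOff) + Imm <= UINT32_MAX;
}
```

With `VOff = 0xFFFFFFFC` and `Imm = 8`, the 32-bit sum wraps to 4, so the unfolded form yields `SBase + 4` while the folded form yields `SBase + 0x100000004`. Under these assumptions, the fold is only valid when the inner add is known not to wrap, or when the reassociation is done earlier on the 64-bit form.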

https://github.com/llvm/llvm-project/pull/178608