[llvm] ccfabfb - Fix subrange liveness checking at rematerialization

Tue Aug 16 10:50:20 PDT 2022

Author: Nicolas Miller
Date: 2022-08-16T10:50:09-07:00
New Revision: ccfabfbb1f91ffb936f7d0ce877a24c1c1ebaea8

URL: https://github.com/llvm/llvm-project/commit/ccfabfbb1f91ffb936f7d0ce877a24c1c1ebaea8
DIFF: https://github.com/llvm/llvm-project/commit/ccfabfbb1f91ffb936f7d0ce877a24c1c1ebaea8.diff

LOG: Fix subrange liveness checking at rematerialization

This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead.

The code to check whether an instruction can be rematerialized at a given point or not was already checking for subranges to ensure that subregisters are live, but only when the instruction being moved was using a subregister, this patch changes that so the subranges are checked even when the moved instruction uses the full register.

This patch also adds a case to the original test for the subrange checking that trigger the issue described above.

The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278

And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209

Essentially the greedy register allocator attempts to move the following instruction:

```
%3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec
```

>From `@3440` into the body of a loop `@16312`, but `%3078` has the following live ranges:

```
%3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0 at 2224r 1 at 2240r  L0000000000000003 [2224r,3440r:0) 0 at 2224r  L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0 at 2240r
```

So `@16312e` `%3078.sub1` is alive but `%3078.sub0` is dead, so this instruction being moved there leads to invalid memory accesses as `3078.sub0` ends up being trashed and the result of this instruction is used as part of an address calculation for a load.

On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908, this turned out to be because on gfx908 instead of moving the shift instruction into the loop, its value is spilled into an ACC register, gfx906 doesn't have ACC registers and for gfx90a ACC registers are used like regular vector registers and so aren't used for spilling.

With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of moving the instruction in the loop.

Original Author: npmiller

Reviewed by: rampitec

Submitted by: rampitec

Differential Revision: https://reviews.llvm.org/D131884

Added: 
    

Modified: 
    llvm/lib/CodeGen/LiveRangeEdit.cpp
    llvm/test/CodeGen/AMDGPU/remat-dead-subreg.mir

Removed: 
    


################################################################################
diff  --git a/llvm/lib/CodeGen/LiveRangeEdit.cpp b/llvm/lib/CodeGen/LiveRangeEdit.cpp
index abf36b3f4c674..0d599fc09ffa0 100644

--- a/llvm/lib/CodeGen/LiveRangeEdit.cpp
+++ b/llvm/lib/CodeGen/LiveRangeEdit.cpp
@@ -134,9 +134,11 @@ bool LiveRangeEdit::allUsesAvailableAt(const MachineInstr *OrigMI,
       return false;
 
     // Check that subrange is live at UseIdx.
-    if (MO.getSubReg()) {
+    if (li.hasSubRanges()) {
       const TargetRegisterInfo *TRI = MRI.getTargetRegisterInfo();
-      LaneBitmask LM = TRI->getSubRegIndexLaneMask(MO.getSubReg());
+      unsigned SubReg = MO.getSubReg();
+      LaneBitmask LM = SubReg ? TRI->getSubRegIndexLaneMask(SubReg)
+                              : MRI.getMaxLaneMaskForVReg(MO.getReg());
       for (LiveInterval::SubRange &SR : li.subranges()) {
         if ((SR.LaneMask & LM).none())
           continue;

diff  --git a/llvm/test/CodeGen/AMDGPU/remat-dead-subreg.mir b/llvm/test/CodeGen/AMDGPU/remat-dead-subreg.mir
index a306d8eb12464..a9e49456a7d90 100644
--- a/llvm/test/CodeGen/AMDGPU/remat-dead-subreg.mir
+++ b/llvm/test/CodeGen/AMDGPU/remat-dead-subreg.mir
@@ -79,3 +79,28 @@ body:             |
     %6:vreg_64 = V_MOV_B64_PSEUDO %2, implicit $exec
     S_NOP 0, implicit %1.sub0, implicit %1.sub3
 ...
+---
+name:            dead_subreg_whole_reg
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    ; GCN-LABEL: name: dead_subreg_whole_reg
+    ; GCN: $m0 = IMPLICIT_DEF
+    ; GCN-NEXT: renamable $sgpr0_sgpr1 = S_MOV_B64 1, implicit $m0
+    ; GCN-NEXT: renamable $sgpr2_sgpr3 = S_MOV_B64 renamable $sgpr0_sgpr1
+    ; GCN-NEXT: SI_SPILL_S64_SAVE killed renamable $sgpr2_sgpr3, %stack.0, implicit $exec, implicit $sp_reg :: (store (s64) into %stack.0, align 4, addrspace 5)
+    ; GCN-NEXT: renamable $sgpr4_sgpr5 = S_MOV_B64 2, implicit $m0
+    ; GCN-NEXT: renamable $sgpr2_sgpr3 = S_MOV_B64 3, implicit $m0
+    ; GCN-NEXT: dead %4:vgpr_32 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit killed $sgpr4_sgpr5, implicit killed $sgpr2_sgpr3
+    ; GCN-NEXT: renamable $sgpr2_sgpr3 = SI_SPILL_S64_RESTORE %stack.0, implicit $exec, implicit $sp_reg :: (load (s64) from %stack.0, align 4, addrspace 5)
+    ; GCN-NEXT: dead %5:vreg_64 = V_MOV_B64_PSEUDO killed $sgpr2_sgpr3, implicit $exec
+    ; GCN-NEXT: S_NOP 0, implicit killed renamable $sgpr0
+    $m0 = IMPLICIT_DEF
+    %0:sreg_64_xexec = S_MOV_B64 1, implicit $m0
+    %1:sreg_64 = S_MOV_B64 %0:sreg_64_xexec
+    %2:sreg_64 = S_MOV_B64 2, implicit $m0
+    %3:sreg_64 = S_MOV_B64 3, implicit $m0
+    %4:vgpr_32 = V_MOV_B32_e32 %0.sub0:sreg_64_xexec, implicit $exec, implicit %2, implicit %3
+    %5:vreg_64 = V_MOV_B64_PSEUDO %1, implicit $exec
+    S_NOP 0, implicit %0.sub0
+...