[llvm] f76ffc1 - [MCP] Invalidate copy for super register in copy source

Jeffrey Byrnes via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 11 09:03:39 PDT 2023


Author: Jeffrey Byrnes
Date: 2023-08-11T09:01:18-07:00
New Revision: f76ffc1f406ecdb5a8329e8a10a56f0ce2f6220c

URL: https://github.com/llvm/llvm-project/commit/f76ffc1f406ecdb5a8329e8a10a56f0ce2f6220c
DIFF: https://github.com/llvm/llvm-project/commit/f76ffc1f406ecdb5a8329e8a10a56f0ce2f6220c.diff

LOG: [MCP] Invalidate copy for super register in copy source

We must also track the super sources of a copy, otherwise we introduce a sort of subtle bug.

Consider:

1.  DEF r0:r1
2.  USE r1
3.  r6:r9 = COPY r10:r13
4.  r14:15 = COPY r0:r1
5.  USE r6
6.. r1:4 = COPY r6:9

BackwardCopyPropagateBlock processes the instructions from bottom up. After processing 6., we will have propagatable copy for r1-r4 and r6-r9. After 5., we invalidate and erase the propagatble copy for r1-r4 and r6 but not for r7-r9.

The issue is that when processing 3., data structures still say we have valid copies for dest regs r7-r9 (from 6.). The corresponding defs for these registers in 6. are r1:r4, which we mark as registers to invalidate. When invalidating, we find the copy that corresponds to r1 is 4. (this was added when processing 4.), and we say that r1 now maps to unpropagatable copies. Thus, when we process 2., we do not have a valid copy, but when we process 1. we do -- because the mapped copy for subregister r0 was never invalidated.

The net result is to propagate the copy from 4. to 1., and replace DEF r0:r1 with DEF r14:r15. Then, we have a use before def in 2.

The main issue is that we have an inconsitent state between which def regs and which src regs are valid. When processing 5., we mark all the defs in 6. as invalid, but only the subreg use as invalid. Either we must only invalidate the individual subreg for both uses and defs, or the super register for both.

Differential Revision: https://reviews.llvm.org//D157564

Change-Id: I99d5e0b1a0d735e8ea3bd7d137b6464690aa9486

Added: 
    llvm/test/CodeGen/AMDGPU/mcp-use-before-def.mir

Modified: 
    llvm/lib/CodeGen/MachineCopyPropagation.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 3453e6c0b8be79..488ef31e1dd1e3 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -134,28 +134,31 @@ class CopyTracker {
                           const TargetInstrInfo &TII, bool UseCopyInstr) {
     // Since Reg might be a subreg of some registers, only invalidate Reg is not
     // enough. We have to find the COPY defines Reg or registers defined by Reg
-    // and invalidate all of them.
-    SmallSet<MCRegister, 8> RegsToInvalidate;
-    RegsToInvalidate.insert(Reg);
+    // and invalidate all of them. Similarly, we must invalidate all of the
+    // the subregisters used in the source of the COPY.
+    SmallSet<MCRegUnit, 8> RegUnitsToInvalidate;
+    auto InvalidateCopy = [&](MachineInstr *MI) {
+      std::optional<DestSourcePair> CopyOperands =
+          isCopyInstr(*MI, TII, UseCopyInstr);
+      assert(CopyOperands && "Expect copy");
+
+      auto Dest = TRI.regunits(CopyOperands->Destination->getReg().asMCReg());
+      auto Src = TRI.regunits(CopyOperands->Source->getReg().asMCReg());
+      RegUnitsToInvalidate.insert(Dest.begin(), Dest.end());
+      RegUnitsToInvalidate.insert(Src.begin(), Src.end());
+    };
+
     for (MCRegUnit Unit : TRI.regunits(Reg)) {
       auto I = Copies.find(Unit);
       if (I != Copies.end()) {
-        if (MachineInstr *MI = I->second.MI) {
-          std::optional<DestSourcePair> CopyOperands =
-              isCopyInstr(*MI, TII, UseCopyInstr);
-          assert(CopyOperands && "Expect copy");
-
-          RegsToInvalidate.insert(
-              CopyOperands->Destination->getReg().asMCReg());
-          RegsToInvalidate.insert(CopyOperands->Source->getReg().asMCReg());
-        }
-        RegsToInvalidate.insert(I->second.DefRegs.begin(),
-                                I->second.DefRegs.end());
+        if (MachineInstr *MI = I->second.MI)
+          InvalidateCopy(MI);
+        if (MachineInstr *MI = I->second.LastSeenUseInCopy)
+          InvalidateCopy(MI);
       }
     }
-    for (MCRegister InvalidReg : RegsToInvalidate)
-      for (MCRegUnit Unit : TRI.regunits(InvalidReg))
-        Copies.erase(Unit);
+    for (MCRegUnit Unit : RegUnitsToInvalidate)
+      Copies.erase(Unit);
   }
 
   /// Clobber a single register, removing it from the tracker's copy maps.

diff  --git a/llvm/test/CodeGen/AMDGPU/mcp-use-before-def.mir b/llvm/test/CodeGen/AMDGPU/mcp-use-before-def.mir
new file mode 100644
index 00000000000000..8ca35d1dd53a35
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/mcp-use-before-def.mir
@@ -0,0 +1,26 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs  -run-pass=machine-cp -o - %s | FileCheck %s
+
+# machine copy prop should not introduce use before def
+---
+name: back_copy_block
+body:           |
+  bb.0:
+    ; CHECK-LABEL: name: back_copy_block
+    ; CHECK: $vgpr20_vgpr21_vgpr22_vgpr23 = IMPLICIT_DEF
+    ; CHECK-NEXT: $vgpr10_vgpr11_vgpr12_vgpr13 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr0_vgpr1 = V_MOV_B64_e64 killed $vgpr20_vgpr21, implicit $exec
+    ; CHECK-NEXT: renamable $vgpr20 = V_MOV_B32_e32 killed $vgpr1, implicit $exec
+    ; CHECK-NEXT: renamable $vgpr6_vgpr7_vgpr8_vgpr9 = COPY renamable $vgpr10_vgpr11_vgpr12_vgpr13
+    ; CHECK-NEXT: renamable $vgpr20 = V_MOV_B32_e32 killed $vgpr6, implicit $exec
+    ; CHECK-NEXT: S_ENDPGM 0, amdgpu_allvgprs
+    $vgpr20_vgpr21_vgpr22_vgpr23 = IMPLICIT_DEF
+    $vgpr10_vgpr11_vgpr12_vgpr13 = IMPLICIT_DEF
+    renamable $vgpr0_vgpr1 = V_MOV_B64_e64 killed renamable $vgpr20_vgpr21, implicit $exec
+    renamable $vgpr20 = V_MOV_B32_e32 killed renamable $vgpr1, implicit $exec
+    renamable $vgpr6_vgpr7_vgpr8_vgpr9 = COPY killed renamable $vgpr10_vgpr11_vgpr12_vgpr13
+    renamable $vgpr14_vgpr15 = COPY killed renamable $vgpr0_vgpr1
+    renamable $vgpr20 = V_MOV_B32_e32 killed renamable $vgpr6, implicit $exec
+    renamable $vgpr1_vgpr2_vgpr3_vgpr4 = COPY killed renamable $vgpr6_vgpr7_vgpr8_vgpr9
+    S_ENDPGM 0, amdgpu_allvgprs
+...


        


More information about the llvm-commits mailing list