[PATCH] D130622: [AMDGPU][SIFoldOperands] Clear kills when folding subreg COPY

Carl Ritson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 27 04:14:24 PDT 2022


critson created this revision.
critson added reviewers: foad, rampitec, cdevadas.
Herald added subscribers: kosarev, jsilvanus, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, jvesely, kzhuravl, arsenm.
Herald added a project: All.
critson requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

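Clear all kill flags on the super-register when folding a COPY of
a subregister.  This is necessary because the fold can add a use of
the super-register after a use that is already marked killed, so
the kill flags may now be out of order with the uses.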
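
As a rough illustration of the ordering problem described above, here
is a toy C++ model. It is not LLVM code: ToyUse, killsAreConsistent,
clearAllKillFlags and usesAfterFold are hypothetical names invented
for this sketch, and the use list only loosely mirrors the shape of
the fold_subreg_kill test (a killed subregister read followed by a
folded COPY that now reads the same super-register).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a register use operand; not an LLVM type.
struct ToyUse {
  unsigned Reg; // virtual register number
  bool IsKill;  // true if this use is flagged as the last use
};

// Returns true if no use of Reg follows a use of Reg that is flagged kill.
// Uses are listed in program order within one basic block.
bool killsAreConsistent(const std::vector<ToyUse> &Uses, unsigned Reg) {
  bool Killed = false;
  for (const ToyUse &U : Uses) {
    if (U.Reg != Reg)
      continue;
    if (Killed)
      return false; // a use after the kill: flags are out of order
    Killed = U.IsKill;
  }
  return true;
}

// Conservatively drop every kill flag on Reg.
void clearAllKillFlags(std::vector<ToyUse> &Uses, unsigned Reg) {
  for (ToyUse &U : Uses)
    if (U.Reg == Reg)
      U.IsKill = false;
}

// Use list for register 1 after a later COPY has been rewritten to read it.
std::vector<ToyUse> usesAfterFold() {
  return {
      {1, false}, // COPY %1.sub0_sub1
      {1, true},  // COPY killed %1.sub2_sub3
      {1, false}, // folded COPY, now reading %1.sub0_sub1
  };
}
```

In this model the list returned by usesAfterFold() fails the
consistency check until clearAllKillFlags() is applied to register 1.
Clearing every flag is conservative: it loses liveness precision but
cannot leave a use after a kill.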


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D130622

Files:
  llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
  llvm/test/CodeGen/AMDGPU/si-fold-copy-sub-reg.mir


Index: llvm/test/CodeGen/AMDGPU/si-fold-copy-sub-reg.mir
===================================================================
--- /dev/null
+++ llvm/test/CodeGen/AMDGPU/si-fold-copy-sub-reg.mir
@@ -0,0 +1,45 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+---
+name:            fold_subreg_kill
+tracksRegLiveness: true
+body:             |
+  ; GCN-LABEL: name: fold_subreg_kill
+  ; GCN: bb.0:
+  ; GCN-NEXT:   successors: %bb.1(0x80000000)
+  ; GCN-NEXT:   liveins: $sgpr0_sgpr1
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr0_sgpr1
+  ; GCN-NEXT:   [[S_LOAD_DWORDX4_IMM:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[COPY]](p4), 9, 0 :: (load (s128), align 4, addrspace 4)
+  ; GCN-NEXT:   [[COPY1:%[0-9]+]]:sreg_64_xexec = COPY [[S_LOAD_DWORDX4_IMM]].sub2_sub3
+  ; GCN-NEXT:   [[COPY2:%[0-9]+]]:sreg_64 = COPY [[S_LOAD_DWORDX4_IMM]].sub0_sub1
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.1:
+  ; GCN-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY [[COPY2]].sub1
+  ; GCN-NEXT:   [[COPY4:%[0-9]+]]:sreg_32 = COPY [[COPY2]].sub0
+  ; GCN-NEXT:   [[COPY5:%[0-9]+]]:sreg_32 = COPY [[COPY1]].sub1
+  ; GCN-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 -1
+  ; GCN-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE killed [[COPY3]], %subreg.sub0, killed [[COPY4]], %subreg.sub1, killed [[COPY5]], %subreg.sub2, killed [[S_MOV_B32_]], %subreg.sub3
+  ; GCN-NEXT:   [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+  ; GCN-NEXT:   BUFFER_STORE_DWORD_OFFSET [[DEF]], killed [[REG_SEQUENCE]], 0, 0, 0, 0, 0, implicit $exec :: (store (s32), addrspace 1)
+  bb.0:
+    liveins: $sgpr0_sgpr1
+
+    %0:sgpr_64(p4) = COPY $sgpr0_sgpr1
+    %1:sgpr_128 = S_LOAD_DWORDX4_IMM %0:sgpr_64(p4), 9, 0 :: (load (s128), align 4, addrspace 4)
+    %2:sreg_64_xexec = COPY %1.sub0_sub1:sgpr_128
+    %3:sreg_64_xexec = COPY killed %1.sub2_sub3:sgpr_128
+    %4:sreg_64 = COPY %2:sreg_64_xexec
+    %5:sreg_32 = COPY %3.sub1:sreg_64_xexec
+
+  bb.1:
+    %6:sreg_32 = COPY %4.sub1:sreg_64
+    %7:sreg_32 = COPY %4.sub0:sreg_64
+    %8:sreg_32 = COPY %5:sreg_32
+    %9:sreg_32 = S_MOV_B32 -1
+    %10:sgpr_128 = REG_SEQUENCE killed %6:sreg_32, %subreg.sub0, killed %7:sreg_32, %subreg.sub1, killed %8:sreg_32, %subreg.sub2, killed %9:sreg_32, %subreg.sub3
+    %11:vgpr_32 = IMPLICIT_DEF
+    BUFFER_STORE_DWORD_OFFSET %11:vgpr_32, killed %10:sgpr_128, 0, 0, 0, 0, 0, implicit $exec :: (store (s32), addrspace 1)
+...
+
Index: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -737,6 +737,15 @@
       CopiesToReplace.push_back(UseMI);
       OpToFold.setIsKill(false);
 
+      // If this is a COPY of a sub-register, all other kill flags on the
+      // super register must be cleared, because the kills may now be out of
+      // order with the uses.
+      if (OpToFold.getSubReg()) {
+        for (auto &Use : MRI->use_operands(OpToFold.getReg())) {
+          Use.setIsKill(false);
+        }
+      }
+
       // That is very tricky to store a value into an AGPR. v_accvgpr_write_b32
       // can only accept VGPR or inline immediate. Recreate a reg_sequence with
       // its initializers right here, so we will rematerialize immediates and



