[PATCH] D57703: [AMDGPU] Consider XOR in waterfall loop as a terminator

Scott Linder via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 4 12:47:20 PST 2019


scott.linder updated this revision to Diff 185132.
scott.linder added a comment.

I agree the test does not test much; it was the minimum I could think of, given that none of our existing tests notice the change. I'm not sure which pass you mean by 'isel', but 'stop-after=amdgpu-isel' is too early to see the SI_INDIRECT_SRC_* pseudo expanded.

You suggested trying at -O0, and on current trunk I do see spill code being inserted between the xor and the branch. I've updated the test to go all the way to ISA and confirm that there is no intervening instruction between the xor and the branch, but I don't know whether this is what you had in mind.
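
As a rough illustration of the property the test's GCN / GCN-NEXT pair checks, here is a minimal standalone sketch. It is not how FileCheck is implemented, and the helper name followedOnNextLine is hypothetical. It only reports whether the first line containing one pattern is immediately followed by a line containing another, which is the "no intervening instruction" condition described above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of LLVM or FileCheck.
// Returns true if the first line containing `first` is immediately followed
// by a line containing `next`, loosely mirroring a CHECK / CHECK-NEXT pair.
// Returns false if `first` never appears or if it is on the last line.
bool followedOnNextLine(const std::string &text, const std::string &first,
                        const std::string &next) {
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.find(first) == std::string::npos)
      continue;
    // Found the first pattern; the very next line must contain the second.
    return std::getline(in, line) && line.find(next) != std::string::npos;
  }
  return false;
}
```

Called with "s_xor_b64 exec, exec," and "s_cbranch_execnz" on the generated ISA, this would reject output in which spill code lands between the two instructions. Unlike FileCheck, it does plain substring matching and ignores labels.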


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57703/new/

https://reviews.llvm.org/D57703

Files:
  lib/Target/AMDGPU/SIISelLowering.cpp
  test/CodeGen/AMDGPU/indirect-addressing-term.ll


Index: test/CodeGen/AMDGPU/indirect-addressing-term.ll
===================================================================
--- /dev/null
+++ test/CodeGen/AMDGPU/indirect-addressing-term.ll
@@ -0,0 +1,20 @@
+; RUN: llc -O0 -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s
+; RUN: llc -O0 -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s
+
+; Verify that we consider the xor at the end of the waterfall loop emitted for
+; divergent indirect addressing as a terminator.
+
+declare i32 @llvm.amdgcn.workitem.id.x() #1
+
+; There should be no spill code inserted between the xor and the real terminator
+; GCN-LABEL: extract_w_offset_vgpr:
+; GCN: s_xor_b64 exec, exec,
+; GCN-NEXT: s_cbranch_execnz
+define amdgpu_kernel void @extract_w_offset_vgpr(i32 addrspace(1)* %out) {
+entry:
+  %id = call i32 @llvm.amdgcn.workitem.id.x() #1
+  %index = add i32 %id, 1
+  %value = extractelement <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>, i32 %index
+  store i32 %value, i32 addrspace(1)* %out
+  ret void
+}
Index: lib/Target/AMDGPU/SIISelLowering.cpp
===================================================================
--- lib/Target/AMDGPU/SIISelLowering.cpp
+++ lib/Target/AMDGPU/SIISelLowering.cpp
@@ -2931,7 +2931,7 @@
 
   // Update EXEC, switch all done bits to 0 and all todo bits to 1.
   MachineInstr *InsertPt =
-    BuildMI(LoopBB, I, DL, TII->get(AMDGPU::S_XOR_B64), AMDGPU::EXEC)
+    BuildMI(LoopBB, I, DL, TII->get(AMDGPU::S_XOR_B64_term), AMDGPU::EXEC)
     .addReg(AMDGPU::EXEC)
     .addReg(NewExec);
 



