[PATCH] D18988: AMDGPU/SI: Insert wait states required after v_readfirstlane on SI
Tom Stellard via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 11 12:49:08 PDT 2016
tstellarAMD created this revision.
tstellarAMD added reviewers: arsenm, nhaehnle.
tstellarAMD added a subscriber: llvm-commits.
Herald added a subscriber: arsenm.
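On SI, an SMRD instruction that reads an SGPR written by a VALU instruction
needs wait states between the two. This patch conservatively inserts 4 wait
states after every v_readfirstlane_b32, whether or not an SMRD follows.
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation fixes a hang with the piglit
test: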
spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra
http://reviews.llvm.org/D18988
Files:
lib/Target/AMDGPU/SIInsertWaits.cpp
test/CodeGen/AMDGPU/missing-store.ll
test/CodeGen/AMDGPU/salu-to-valu.ll
Index: test/CodeGen/AMDGPU/salu-to-valu.ll
===================================================================
--- test/CodeGen/AMDGPU/salu-to-valu.ll
+++ test/CodeGen/AMDGPU/salu-to-valu.ll
@@ -56,6 +56,7 @@
 ; SI: s_movk_i32 [[OFFSET:s[0-9]+]], 0x2ee0
 ; GCN: v_readfirstlane_b32 s[[PTR_LO:[0-9]+]], v{{[0-9]+}}
 ; GCN: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}
+; SI-NEXT: s_nop
 ; SI: s_load_dword [[OUT:s[0-9]+]], s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, [[OFFSET]]
 ; CI: s_load_dword [[OUT:s[0-9]+]], s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0xbb8
 ; GCN: v_mov_b32_e32 [[V_OUT:v[0-9]+]], [[OUT]]
Index: test/CodeGen/AMDGPU/missing-store.ll
===================================================================
--- test/CodeGen/AMDGPU/missing-store.ll
+++ test/CodeGen/AMDGPU/missing-store.ll
@@ -10,6 +10,7 @@
 ; SI: buffer_store_dword
 ; SI: v_readfirstlane_b32 s[[PTR_LO:[0-9]+]], v{{[0-9]+}}
 ; SI: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}
+; SI-NEXT: s_nop
 ; SI: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}
 ; SI: buffer_store_dword
 ; SI: s_endpgm
Index: lib/Target/AMDGPU/SIInsertWaits.cpp
===================================================================
--- lib/Target/AMDGPU/SIInsertWaits.cpp
+++ lib/Target/AMDGPU/SIInsertWaits.cpp
@@ -601,6 +601,12 @@
         insertDPPWaitStates(I);
       }
 
+      // Insert required wait states for SMRD reading an SGPR written by a VALU
+      // instruction.
+      if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS &&
+          I->getOpcode() == AMDGPU::V_READFIRSTLANE_B32)
+        TII->insertWaitStates(MBB, std::next(I), 4);
+
       // Wait for everything before a barrier.
       if (I->getOpcode() == AMDGPU::S_BARRIER)
         Changes |= insertWait(MBB, I, LastIssued);
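
For readers following the SIInsertWaits.cpp hunk above, the conservative rule
it adds can be sketched as a standalone toy pass. This is only an illustration
under stated assumptions, not the LLVM implementation: Inst, Generation and
padReadFirstLane are hypothetical names that do not exist in LLVM,
instructions are modelled as plain strings, and the sketch assumes that one
s_nop with operand N covers N + 1 wait states and that the requested count is
between 1 and 8.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the subtarget generation and a machine
// instruction. Ordered so that "<= SouthernIslands" mirrors the patch's check.
enum class Generation { SouthernIslands, SeaIslands, VolcanicIslands };

struct Inst {
  std::string opcode; // mnemonic, e.g. "v_readfirstlane_b32"
  int imm;            // immediate operand; only meaningful for "s_nop" here
};

// Conservative padding: on SI-generation targets, follow every
// v_readfirstlane_b32 with an s_nop, without checking whether a later SMRD
// actually reads the SGPR that was written.
// Assumption: s_nop N covers N + 1 wait states, and 1 <= waitStates <= 8.
std::vector<Inst> padReadFirstLane(const std::vector<Inst> &in, Generation gen,
                                   int waitStates = 4) {
  std::vector<Inst> out;
  for (const Inst &inst : in) {
    out.push_back(inst);
    if (gen <= Generation::SouthernIslands &&
        inst.opcode == "v_readfirstlane_b32" && waitStates > 0)
      out.push_back(Inst{"s_nop", waitStates - 1});
  }
  return out;
}
```

Because the sketch pads unconditionally, a sequence with no SMRD after the
v_readfirstlane_b32 still pays for the s_nop. That cost is what makes the
approach conservative.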