[llvm-branch-commits] [llvm-branch] r271731 - Merging r266105:

Tom Stellard via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Fri Jun 3 13:43:03 PDT 2016


Author: tstellar
Date: Fri Jun  3 15:43:03 2016
New Revision: 271731

URL: http://llvm.org/viewvc/llvm-project?rev=271731&view=rev
Log:
Merging r266105:

------------------------------------------------------------------------
r266105 | thomas.stellard | 2016-04-12 11:40:43 -0700 (Tue, 12 Apr 2016) | 15 lines

AMDGPU/SI: Insert wait states required after v_readfirstlane on SI

Summary:
We will be able to handle this case much better once the hazard
recognizer is finished, but this conservative implementation fixes a
hang with the piglit test:

spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

------------------------------------------------------------------------

Modified:
    llvm/branches/release_38/lib/Target/AMDGPU/SIInsertWaits.cpp
    llvm/branches/release_38/test/CodeGen/AMDGPU/missing-store.ll
    llvm/branches/release_38/test/CodeGen/AMDGPU/salu-to-valu.ll

Modified: llvm/branches/release_38/lib/Target/AMDGPU/SIInsertWaits.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/branches/release_38/lib/Target/AMDGPU/SIInsertWaits.cpp?rev=271731&r1=271730&r2=271731&view=diff
==============================================================================
--- llvm/branches/release_38/lib/Target/AMDGPU/SIInsertWaits.cpp (original)
+++ llvm/branches/release_38/lib/Target/AMDGPU/SIInsertWaits.cpp Fri Jun  3 15:43:03 2016
@@ -474,7 +474,7 @@ bool SIInsertWaits::runOnMachineFunction
   TII = static_cast<const SIInstrInfo *>(MF.getSubtarget().getInstrInfo());
   TRI =
       static_cast<const SIRegisterInfo *>(MF.getSubtarget().getRegisterInfo());
-
+  const AMDGPUSubtarget &ST = MF.getSubtarget<AMDGPUSubtarget>();
   MRI = &MF.getRegInfo();
 
   WaitedOn = ZeroCounts;
@@ -493,6 +493,12 @@ bool SIInsertWaits::runOnMachineFunction
     for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
          I != E; ++I) {
 
+      // Insert required wait states for SMRD reading an SGPR written by a VALU
+      // instruction.
+      if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS &&
+          I->getOpcode() == AMDGPU::V_READFIRSTLANE_B32)
+        TII->insertWaitStates(std::next(I), 4);
+
       // Wait for everything before a barrier.
       if (I->getOpcode() == AMDGPU::S_BARRIER)
         Changes |= insertWait(MBB, I, LastIssued);

Modified: llvm/branches/release_38/test/CodeGen/AMDGPU/missing-store.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/branches/release_38/test/CodeGen/AMDGPU/missing-store.ll?rev=271731&r1=271730&r2=271731&view=diff
==============================================================================
--- llvm/branches/release_38/test/CodeGen/AMDGPU/missing-store.ll (original)
+++ llvm/branches/release_38/test/CodeGen/AMDGPU/missing-store.ll Fri Jun  3 15:43:03 2016
@@ -10,6 +10,7 @@
 ; SI: buffer_store_dword
 ; SI: v_readfirstlane_b32 s[[PTR_LO:[0-9]+]], v{{[0-9]+}}
 ; SI: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}
+; SI-NEXT: s_nop
 ; SI: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}
 ; SI: buffer_store_dword
 ; SI: s_endpgm

Modified: llvm/branches/release_38/test/CodeGen/AMDGPU/salu-to-valu.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/branches/release_38/test/CodeGen/AMDGPU/salu-to-valu.ll?rev=271731&r1=271730&r2=271731&view=diff
==============================================================================
--- llvm/branches/release_38/test/CodeGen/AMDGPU/salu-to-valu.ll (original)
+++ llvm/branches/release_38/test/CodeGen/AMDGPU/salu-to-valu.ll Fri Jun  3 15:43:03 2016
@@ -56,6 +56,7 @@ done:
 ; SI-DAG: s_movk_i32 [[OFFSET:s[0-9]+]], 0x2ee0
 ; GCN-DAG: v_readfirstlane_b32 s[[PTR_LO:[0-9]+]], v{{[0-9]+}}
 ; GCN: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}
+; SI-NEXT: s_nop
 ; SI: s_load_dword [[OUT:s[0-9]+]], s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, [[OFFSET]]
 ; CI: s_load_dword [[OUT:s[0-9]+]], s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0xbb8
 ; GCN: v_mov_b32_e32 [[V_OUT:v[0-9]+]], [[OUT]]
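
For readers without the LLVM tree at hand, the effect of the SIInsertWaits.cpp
hunk above can be sketched as a standalone toy model. This is only an
illustration under stated assumptions, not the pass itself: instructions are
plain strings, and Generation and insertReadfirstlaneWaits are hypothetical
names that do not exist in LLVM. It also assumes that "s_nop N" stalls for
N + 1 wait states, so the 4 wait states requested in the patch appear as a
single "s_nop 3".

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for AMDGPUSubtarget's generation enum; only the
// relative ordering of the two values matters for this sketch.
enum class Generation { SouthernIslands, SeaIslands };

// Toy model of the patched loop: copy the instruction stream and, on SI or
// older, follow every v_readfirstlane_b32 with a nop covering `waitStates`
// wait states. Assumption: "s_nop N" provides N + 1 wait states.
std::vector<std::string>
insertReadfirstlaneWaits(const std::vector<std::string> &insts,
                         Generation gen, int waitStates = 4) {
  std::vector<std::string> out;
  for (const std::string &inst : insts) {
    out.push_back(inst);
    if (gen <= Generation::SouthernIslands && inst == "v_readfirstlane_b32")
      out.push_back("s_nop " + std::to_string(waitStates - 1));
  }
  return out;
}
```

Like the patch, the sketch is conservative: it pads after every
v_readfirstlane_b32 without checking whether a following SMRD instruction
actually reads the SGPR that was written.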



