[PATCH] D31161: [AMDGPU] New Waitcnt Insertion Pass

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 23 11:27:55 PDT 2017


rampitec added a comment.

In https://reviews.llvm.org/D31161#709081, @kzhuravl wrote:

> In https://reviews.llvm.org/D31161#709066, @rampitec wrote:
>
> > Please add test with s_barrier.
>
>
> https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll


How about waitcnt operands in that test?



================
Comment at: lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1017
+  if (MI.getOpcode() == AMDGPU::S_BARRIER && ST->needWaitcntBeforeBarrier()) {
+    ForceZero = true;
+    EmitSwaitcnt = true;
----------------
kzhuravl wrote:
> rampitec wrote:
> > kzhuravl wrote:
> > > @rampitec wrote:
> > > For a barrier it will always insert strongest:
> > > 
> > > s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
> > > 
> > > regardless of what was an argument of the barrier.
> > > 
> > > This also seems to completely ignore atomic fences inserted around the barrier from the library, which shall be a real source of wait argument. Note, that semantics of needWaitcntBeforeBarrier() is not that we always need to insert wait with barrier, but that we may need to insert it.
> > > 
> > > Also note that existing pass does not seem to do it for a barrier.
> > For a barrier it will always insert strongest:
> > 
> > s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
> > 
> > regardless of what was an argument of the barrier.
> > 
> > This also seems to completely ignore atomic fences inserted around the barrier from the library, which shall be a real source of wait argument. Note, that semantics of needWaitcntBeforeBarrier() is not that we always need to insert wait with barrier, but that we may need to insert it.
> > 
> > Also note that existing pass does not seem to do it for a barrier.
> Atomics are currently handled in a separate pass, which we determined to be conservatively correct. We plan to integrate atomics into waitcnt insertion but later and in a separate patch.
> 
> The *needWaitcntBeforeBarrier* tells you whether you need a waitcnt before the barrier or not. For >=GFX9, waitcnt is automatically inserted before the barrier, so we do not need to generate it, and *needWaitcntBeforeBarrier* returns false for >=GFX9.
> 
> > Also note that existing pass does not seem to do it for a barrier.
> https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AMDGPU/SIInsertWaits.cpp#L633
Barrier itself does not need fence. OpenCL barrier needs fence, but these are generated in the library.


https://reviews.llvm.org/D31161





More information about the llvm-commits mailing list