[PATCH] D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations

Tony Tye via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 20 14:28:17 PDT 2020


t-tye marked 2 inline comments as done.
t-tye added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt.mir:103
+    $vgpr3 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.flat4)
+    $vgpr4 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.global4)
+    $vgpr0 = V_MOV_B32_e32 $vgpr3, implicit $exec
----------------
rampitec wrote:
> Can you keep just load from flat here? The other load obscures the result.
Add the extra BB3 you suggested.

The waitcnts being generated seem correct from my inspection.


================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt.mir:66
 # CHECK: FLAT_LOAD_DWORD
-# GFX89: S_WAITCNT 112
+# GFX89: S_WAITCNT 3952
 # CHECK: FLAT_LOAD_DWORDX4
----------------
rampitec wrote:
> rampitec wrote:
> > t-tye wrote:
> > > rampitec wrote:
> > > > t-tye wrote:
> > > > > rampitec wrote:
> > > > > > That one was not supposed to change? The pointer is flat here.
> > > > > Yes. Previously it was "s_waitcnt vmcnt(0) lgkmcnt(0)". Now it is "s_waitcnt vmcnt(0)" as the address space of global16 is 1 which is GLOBAL. Therefore there is no need to wait on LGKM.
> > > > It is not global, it is flat:
> > > > 
> > > > 
> > > > ```
> > > > <4 x i32>* %flat16
> > > > ...
> > > >     $vgpr3_vgpr4_vgpr5_vgpr6 = FLAT_LOAD_DWORDX4 $vgpr7_vgpr8, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16 from %ir.flat16)
> > > > ```
> > > But isn't this test checking:
> > > 
> > >     $vgpr0 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.global4)
> > >     $vgpr3_vgpr4_vgpr5_vgpr6 = FLAT_LOAD_DWORDX4 $vgpr7_vgpr8, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16 from %ir.global16)
> > > 
> > > These are referencing global4 and global16 which are:
> > > 
> > >   i32 addrspace(1)* %global4,
> > >   <4 x i32> addrspace(1)* %global16
> > > 
> > > Which are both marked as the global (1) not flat (0) address space.
> > > 
> > > Am I missing something?
> > No, it is not. Note it first checks label bb.2. And after it:
> > 
> >     $vgpr3_vgpr4_vgpr5_vgpr6 = FLAT_LOAD_DWORDX4 $vgpr7_vgpr8, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16 from %ir.flat16)
> > 
> > It is flat pointer. Not global.
> > 
> > Think about the testcase itself: it is a standalone function (not kernel) taking a generic pointer. You are checking for the question: "is is this DEFINITELY an LDS pointer?" The answer is no, so you say: "this is DEFINITELY NOT an LDS pointer".
> Or VMEM for that matter.
I believe the waitcnts are correct, and added the extra test you recommended.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89618/new/

https://reviews.llvm.org/D89618



More information about the llvm-commits mailing list