[PATCH] D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations
Tony Tye via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 20 00:56:42 PDT 2020
t-tye added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt.mir:66
# CHECK: FLAT_LOAD_DWORD
-# GFX89: S_WAITCNT 112
+# GFX89: S_WAITCNT 3952
# CHECK: FLAT_LOAD_DWORDX4
----------------
rampitec wrote:
> t-tye wrote:
> > rampitec wrote:
> > > That one was not supposed to change? The pointer is flat here.
> > Yes. Previously it was "s_waitcnt vmcnt(0) lgkmcnt(0)". Now it is "s_waitcnt vmcnt(0)" as the address space of global16 is 1 which is GLOBAL. Therefore there is no need to wait on LGKM.
> It is not global, it is flat:
>
>
> ```
> <4 x i32>* %flat16
> ...
> $vgpr3_vgpr4_vgpr5_vgpr6 = FLAT_LOAD_DWORDX4 $vgpr7_vgpr8, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16 from %ir.flat16)
> ```
But isn't this test checking these two instructions?
$vgpr0 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.global4)
$vgpr3_vgpr4_vgpr5_vgpr6 = FLAT_LOAD_DWORDX4 $vgpr7_vgpr8, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16 from %ir.global16)
These reference %ir.global4 and %ir.global16, which are declared as:
i32 addrspace(1)* %global4,
<4 x i32> addrspace(1)* %global16
Both are in the global (1) address space, not the flat (0) address space.
Am I missing something?
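
For reference, the two immediates in the diff context can be decoded by hand to confirm that 112 means "s_waitcnt vmcnt(0) lgkmcnt(0)" and 3952 means "s_waitcnt vmcnt(0)". The snippet below is only a sketch: `decode_waitcnt` and `pretty` are hypothetical helpers, not LLVM functions, and the field layout (vmcnt in bits 3:0, expcnt in bits 6:4, lgkmcnt in bits 11:8) is the assumed GFX8 encoding. GFX9 additionally keeps two high vmcnt bits in 15:14, which are zero in both values, so the result should be the same there.

```python
# Hypothetical helpers (not part of LLVM) for decoding an S_WAITCNT immediate.
# Assumed GFX8 field layout: vmcnt[3:0], expcnt[6:4], lgkmcnt[11:8].

FIELDS = (
    # (name, shift, mask)
    ("vmcnt", 0, 0xF),
    ("expcnt", 4, 0x7),
    ("lgkmcnt", 8, 0xF),
)


def decode_waitcnt(imm):
    """Split the immediate into its three counter fields."""
    return {name: (imm >> shift) & mask for name, shift, mask in FIELDS}


def pretty(imm):
    """Print only counters that actually wait, i.e. are below the field maximum."""
    counters = decode_waitcnt(imm)
    parts = [
        f"{name}({counters[name]})"
        for name, _shift, mask in FIELDS
        if counters[name] != mask  # a field at its maximum means "do not wait"
    ]
    return "s_waitcnt " + " ".join(parts)


print(pretty(112))   # old GFX89 check value
print(pretty(3952))  # new GFX89 check value
```

Under these assumptions the only difference between the two values is the lgkmcnt field, which goes from 0 to its maximum of 15, meaning the wait on LGKM is dropped.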
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D89618/new/
https://reviews.llvm.org/D89618