[PATCH] D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 20 01:16:48 PDT 2020
rampitec added a comment.
In D89618#2341074 <https://reviews.llvm.org/D89618#2341074>, @t-tye wrote:
> In D89618#2340966 <https://reviews.llvm.org/D89618#2340966>, @rampitec wrote:
>
>> JFYI how much it will help actual programs after it is fixed is unclear. It will likely change a lot of lit tests, but actual effect on real programs would depend on FE and language rules. And inlining of course, as usual.
>
> It did change 46 lit tests. I agree it is unclear how much it will help. But the GLOBAL and SCRATCH flat operations seem like they may avoid the pessimistic waitcnt 0.
Right. Out of these 46 lit tests I was looking for a very specific one, and wanted to ask you to write it if it did not exist. It does exist, and it is failing.
================
Comment at: llvm/test/CodeGen/AMDGPU/waitcnt.mir:66
# CHECK: FLAT_LOAD_DWORD
-# GFX89: S_WAITCNT 112
+# GFX89: S_WAITCNT 3952
# CHECK: FLAT_LOAD_DWORDX4
----------------
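The two immediates in this hunk can be decoded by hand. The sketch below assumes the GFX8/GFX9 S_WAITCNT immediate layout: vmcnt in bits 3:0 (plus vmcnt high bits 15:14 on GFX9 only), expcnt in bits 6:4, lgkmcnt in bits 11:8, with a counter at its maximum value meaning "do not wait". Under that layout 112 (0x70) is vmcnt(0) lgkmcnt(0) and 3952 (0xF70) is vmcnt(0) alone, which matches t-tye's description in the reply below. Bits 15:14 are zero in both values, so GFX8 reads them the same way. The struct and function names are hypothetical and are not LLVM's own AMDGPU waitcnt helpers.

```cpp
#include <cstdint>

// Hypothetical decoder for the GFX8/GFX9 S_WAITCNT immediate (not an LLVM API).
// Assumed layout: vmcnt[3:0] (+ vmcnt[5:4] in bits 15:14 on GFX9),
// expcnt[6:4], lgkmcnt[11:8].
struct WaitcntFields {
  unsigned vmcnt;
  unsigned expcnt;
  unsigned lgkmcnt;
};

constexpr WaitcntFields decodeWaitcntGfx9(std::uint16_t imm) {
  return WaitcntFields{
      (imm & 0xFu) | (((imm >> 14) & 0x3u) << 4), // vmcnt (split field)
      (imm >> 4) & 0x7u,                          // expcnt
      (imm >> 8) & 0xFu};                         // lgkmcnt
}

// lgkmcnt is a 4-bit field on GFX8/GFX9, so 15 means "no LGKM wait".
constexpr bool waitsOnLgkm(WaitcntFields f) { return f.lgkmcnt < 0xFu; }

// Old CHECK value: vmcnt(0) lgkmcnt(0); expcnt is 7, i.e. no wait.
static_assert(decodeWaitcntGfx9(112).lgkmcnt == 0, "112 waits on LGKM");
// New CHECK value: vmcnt(0) only; lgkmcnt is left at its maximum.
static_assert(!waitsOnLgkm(decodeWaitcntGfx9(3952)), "3952 skips LGKM");
```

Making the decoder `constexpr` lets the two CHECK values be verified at compile time, with no runtime harness needed.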
t-tye wrote:
> rampitec wrote:
> > t-tye wrote:
> > > rampitec wrote:
> > > > That one was not supposed to change? The pointer is flat here.
> > > Yes. Previously it was "s_waitcnt vmcnt(0) lgkmcnt(0)". Now it is "s_waitcnt vmcnt(0)" as the address space of global16 is 1 which is GLOBAL. Therefore there is no need to wait on LGKM.
> > It is not global, it is flat:
> >
> >
> > ```
> > <4 x i32>* %flat16
> > ...
> > $vgpr3_vgpr4_vgpr5_vgpr6 = FLAT_LOAD_DWORDX4 $vgpr7_vgpr8, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16 from %ir.flat16)
> > ```
> But isn't this test checking:
>
> $vgpr0 = FLAT_LOAD_DWORD $vgpr1_vgpr2, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %ir.global4)
> $vgpr3_vgpr4_vgpr5_vgpr6 = FLAT_LOAD_DWORDX4 $vgpr7_vgpr8, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16 from %ir.global16)
>
> These are referencing global4 and global16 which are:
>
> i32 addrspace(1)* %global4,
> <4 x i32> addrspace(1)* %global16
>
> Which are both marked as the global (1) not flat (0) address space.
>
> Am I missing something?
No, it is not checking that. Note that it first checks for the label bb.2, and after that comes:
$vgpr3_vgpr4_vgpr5_vgpr6 = FLAT_LOAD_DWORDX4 $vgpr7_vgpr8, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16 from %ir.flat16)
It is a flat pointer, not a global one.
Think about the testcase itself: it is a standalone function (not a kernel) taking a generic pointer. You are asking the question "is this DEFINITELY an LDS pointer?" The answer is no, so you conclude "this is DEFINITELY NOT an LDS pointer".
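The distinction can be restated as a small sketch. It assumes the AMDGPU address space numbering (0 flat, 1 global, 2 region, 3 local/LDS, 4 constant, 5 private) and treats the address spaces of an instruction's memory operands as the only information available. A FLAT instruction should be asked "may this touch LDS?", answered yes unless every memory operand is proven to be in a non-flat, non-LDS space. Both function names are hypothetical; this is not the code in D89618 or in LLVM's SIInsertWaitcnts.

```cpp
#include <initializer_list>

// AMDGPU address space numbers (assumed from the AMDGPU backend docs).
enum class AddrSpace : unsigned {
  Flat = 0, Global = 1, Region = 2, Local = 3, Constant = 4, Private = 5,
};

// Hypothetical: the inverted question. It only says "yes" when every
// memory operand is explicitly LDS, so a generic (flat) pointer gets "no".
constexpr bool isDefinitelyLDS(std::initializer_list<AddrSpace> memOpSpaces) {
  if (memOpSpaces.size() == 0) return false;
  for (AddrSpace as : memOpSpaces)
    if (as != AddrSpace::Local) return false;
  return true;
}

// Hypothetical: the conservative question. A flat or LDS operand, or a
// missing memory operand, means the access may hit LDS, so the LGKM wait stays.
constexpr bool flatMayAccessLDS(std::initializer_list<AddrSpace> memOpSpaces) {
  if (memOpSpaces.size() == 0) return true; // no info: assume the worst
  for (AddrSpace as : memOpSpaces)
    if (as == AddrSpace::Flat || as == AddrSpace::Local) return true;
  return false;
}
```

With the conservative check, a load through `%ir.flat16` (address space 0) is still treated as possibly accessing LDS, while loads through `%ir.global4` and `%ir.global16` (address space 1) are not. Using `!isDefinitelyLDS(...)` as "not LDS" gets the flat case wrong. Multi-statement `constexpr` functions require C++14 or later.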
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D89618