[PATCH] D83862: [AMDGPU] Add missing test prefixes

Fri Jul 17 09:17:46 PDT 2020

rampitec added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/perfhint.ll:33
 ; GCN-LABEL: {{^}}test_large_stride:
-; MemoryBound: 0
-; WaveLimiterHint : 1
+; GCN: MemoryBound: 0
+; GCN: WaveLimiterHint : 1
----------------
foad wrote:
> rampitec wrote:
> > foad wrote:
> > > This check fails.
> > This one is memory bound; there are practically only memory operations here. I think it needs some ALU instructions in between so that it catches only the large stride, as intended.
> OK, fixed in f05bce86af32d7b5cf1ab28b3abf6ee473bf3ef1.
Thank you!

================
Comment at: llvm/test/CodeGen/AMDGPU/perfhint.ll:87
+; GCN: MemoryBound: 0
+; GCN: WaveLimiterHint : 0
 define amdgpu_kernel void @test_indirect_through_phi(float addrspace(1)* %arg) {
----------------
foad wrote:
> rampitec wrote:
> > foad wrote:
> > > This check fails. Perhaps D47740 never worked?
> > Looks like it did not :(
> > 
> > Anyway, this case is not memory bound even though it is indirect. This is because we have a single load followed by multiple stores; that was the point of the check.
> The problem is that after AMDGPULowerKernelArguments, the load from %arg looks like this:
> ```
>   %arg.load = load float addrspace(1)*, float addrspace(1)* addrspace(4)* %arg.kernarg.offset.cast, align 4, !invariant.load !0
>   %load = load float, float addrspace(1)* %arg.load, align 8
> ```
> which is indirect. Any ideas?
A-ha! The representation changed, but we did not catch it because of the broken test.

The first load is from the constant address space, so it is uniform. I suppose we can ignore the constant address space for this purpose, since it creates much less memory traffic.
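
To make the suggestion concrete, here is a rough, self-contained sketch of the idea. It is not the actual AMDGPU perf-hint code: the `Load` struct and `isIndirectAccess` are hypothetical stand-ins for illustration, and the model only looks at a direct producer load rather than walking through GEPs or phis. The address space numbers follow the IR quoted above (`addrspace(1)` global, `addrspace(4)` constant).

```cpp
// Hypothetical, simplified model of a load instruction.
// These are NOT LLVM's real classes, just an illustration.
struct Load {
  unsigned AddrSpace;        // address space of the pointer being loaded from
  const Load *AddrProducer;  // load that produced the address, or nullptr
};

// AMDGPU address space numbers as they appear in the quoted IR.
constexpr unsigned GlobalAS = 1;
constexpr unsigned ConstantAS = 4;

// Assumed rule: a load counts as indirect when its address was itself loaded
// from memory, except when that producing load reads the constant address
// space (for example, a kernel argument), which is uniform and cheap.
bool isIndirectAccess(const Load &L) {
  const Load *Producer = L.AddrProducer;
  if (!Producer)
    return false;  // address does not come from another load
  if (Producer->AddrSpace == ConstantAS)
    return false;  // ignore pointers fetched from constant memory
  return true;
}
```

Under this rule, `%load` in the snippet above would not count as indirect, because `%arg.load` reads from `addrspace(4)`. A load whose address came from a global load would still count.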

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83862