[PATCH] D89618: [AMDGPU] Optimize waitcnt insertion for flat memory operations

Tue Oct 20 01:29:36 PDT 2020

rampitec added a comment.

In D89618#2341115 <https://reviews.llvm.org/D89618#2341115>, @t-tye wrote:

> In D89618#2341101 <https://reviews.llvm.org/D89618#2341101>, @rampitec wrote:
>
>> In D89618#2341074 <https://reviews.llvm.org/D89618#2341074>, @t-tye wrote:
>>
>>> In D89618#2340966 <https://reviews.llvm.org/D89618#2340966>, @rampitec wrote:
>>>
>>>> JFYI, it is unclear how much this will help actual programs once it is fixed. It will likely change a lot of lit tests, but the actual effect on real programs would depend on the FE and language rules. And on inlining, of course, as usual.
>>>
>>> It did change 46 lit tests. I agree it is unclear how much it will help. But the GLOBAL and SCRATCH flat operations seem like they may avoid the pessimistic waitcnt 0.
>>
>> Right. Out of these 46 lit tests I was looking for a very specific one, and intended to ask for one to be written if it did not exist. It does exist, and it is failing.
>
> Which test is failing? All the lit tests are passing on my machine. Or are you questioning the way the CHECK tests have been updated? The original test is marking the FLAT pointer as referencing the GLOBAL address space. I assume this is what the frontend did to match the CUDA language semantics that say kernel arguments can only reference global memory. So I believe the generated code is correct unless I am missing something.

The test is waitcnt.mir. It was updated so that it passes, and that update basically defeats the purpose of the test. This has nothing to do with CUDA; the language is irrelevant here, even more so when we are talking about functions.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89618/new/

https://reviews.llvm.org/D89618