[PATCH] D48537: AMDGPU: Add pass to lower kernel arguments to loads

Mon Jun 25 12:32:26 PDT 2018

arsenm added a comment.

In https://reviews.llvm.org/D48537#1142536, @rampitec wrote:

> In general I believe having distinct loads for kernel arguments is a right thing to do. Of course that would be preferable to keep only one mechanism and have no dual support like for noalis and SI LDS.
>  Couple notes though:
>
> 1. I think we need to have a distinct addresspace for kernarg, not just constant.
> 2. This would be handy to set names on the argument loads derived from either original argument names if present, or from an argument number. This patch introduces geps with immediate offsets which would render IR unreadable.

1. I think there's no need for a distinct address space, and would just add unnecessary complexity for all the places that would now need to consider yet another address space. Ideally I would like to eventually eliminate even constant address space. We were already using just constant for the intrinsic
2. I'm not sure what you mean. For debugging purposes the replaced value does use a derived name

Most of the benefit of noalias is probably attainable with the existing metadata, except for the vectorization problem (in which case something is probably needed to annotate inttoptr). The range problem also requires some kind of new metadata mechanism for marking the pointer ranges that may or may not be worth the effort.

Also, if this were to split argument loads into 4-byte pieces it could workaround the limitation of the vectorizer where it can't merge different sized types

================
Comment at: test/CodeGen/AMDGPU/extract_vector_elt-i16.ll:117
 ; GCN: {{buffer|global}}_store_short
-define amdgpu_kernel void @dynamic_extract_vector_elt_v3i16(i16 addrspace(1)* %out, <3 x i16> %foo, i32 %idx) #0 {
+define amdgpu_kernel void @dynamic_extract_vector_elt_v3i16(i16 addrspace(1)* %out, [8 x i32], <3 x i16> %foo, i32 %idx) #0 {
   %p0 = extractelement <3 x i16> %foo, i32 %idx
----------------
rampitec wrote:
> Why do you need all of that explicit padding in many tests?
It's enough that they can't be merged with unused gaps if the vectorizer supported that. Most tests are looking for loads of specific values, and it's harder / impossible to easily check the correct value with the merged load.

https://reviews.llvm.org/D48537