[PATCH] D20297: AMDGPU/SI: Make kernarg.segment.ptr point to implicit arguments for non HSA

Sat May 21 14:06:36 PDT 2016

jvesely added a comment.

In http://reviews.llvm.org/D20297#436251, @tstellarAMD wrote:

> > > > I think clover should move towards matching the HSA ABI closer. Most of the implicit arguments would then be user SGPR inputs like HSA uses, and the number of implicit args would be reduced.
>
> > This is a bit confusing. If the implicit args should be in SGPRs, wouldn't we need one intrinsic per implicit arg? and how would the number of implicit args be reduced? the only redundant one is global size (libclc computes it as num_groups * local_size).
>
> > > I agree.  I actually started working on this last week.  I think all implicit args that aren't passed in SGPRs should be at the end of the kernarg segment.
>
> > My idea was to switch work_dim and newly implemented global_offset, as those are already appended (the rest can be switched by patches to libclc and clover). However, doesn't this contradict Matt's suggestion to pass implicit arguments in SGPRs?
>
> SGPR space is limited, so we won't be able to pass all implicit arguments this way, so some will need to be added to the kernarg buffer.  The types of values that should be passed in SGPRs are things that tend to be common across all runtimes, like work-group/work-item size, scratch buffer pointers, etc.

OK, to be specific about the information that is currently passed:
workdim, wg_size, and num_group should eventually be passed in SGPRs and should therefore each have their own intrinsic, correct?
Should those values also be duplicated in the kernarg segment?

The rest (global_size, global_offset) would be loaded via the kernarg segment pointer.

Do you still want two pointers (beginning of kernarg and beginning of implicit args) for mesa, or is it OK to have one if all the information is present at the appended location?
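
To make the single-pointer variant concrete, here is a rough sketch of how the appended values could be found from the kernarg base alone. The struct name, the field order and widths, and the 4-byte alignment are assumptions for illustration only; they are not what clover or the backend actually emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block of implicit arguments appended after the user
 * arguments. Field order and widths are assumptions. */
struct implicit_args {
    uint32_t global_size[3];
    uint32_t global_offset[3];
};

/* Round offset up to the next multiple of align (align is a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Byte offset of the appended block, given the size of the user arguments.
 * The 4-byte alignment is an assumption. */
static size_t implicit_args_offset(size_t user_args_size)
{
    return align_up(user_args_size, sizeof(uint32_t));
}

/* Read one global_offset component using only the kernarg base pointer. */
static uint32_t load_global_offset(const uint8_t *kernarg_base,
                                   size_t user_args_size, unsigned dim)
{
    struct implicit_args ia;

    assert(dim < 3);
    memcpy(&ia, kernarg_base + implicit_args_offset(user_args_size), sizeof ia);
    return ia.global_offset[dim];
}
```

With one pointer the consumer has to know the user argument size to locate the block; a second pointer to the start of the implicit args would remove that dependency.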

In http://reviews.llvm.org/D20297#436254, @arsenm wrote:

> We could do what we are planning for OpenCL and add a struct pointer for future expansion at the end of some decided N bytes to reserve.
>
> If 256 bytes were reserved at the beginning, that would be more than enough for any implicit needs and would keep the user arg base alignment the same (not sure that really matters).

Reserving 256 bytes at the beginning would waste half a KC block on r600 (and might run into limitations on other hw), so clover is probably best off appending the information to minimize space (this should work until there are kernels with variable parameters). The radeonsi driver is free to change this for GCN hw.
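
A back-of-the-envelope sketch of the space argument, in C. The 512-byte block size is only inferred from the "256 bytes is half a KC block" remark above and is an assumption, as are the helper names and the example sizes.

```c
#include <stddef.h>

/* Assumption: block size inferred from "256 bytes is half a KC block". */
#define KC_BLOCK_BYTES   512u
/* The reservation size proposed in the quoted comment. */
#define RESERVED_PREFIX  256u

/* Number of KC blocks needed to cover a buffer of the given size. */
static size_t kc_blocks_needed(size_t bytes)
{
    return (bytes + KC_BLOCK_BYTES - 1) / KC_BLOCK_BYTES;
}

/* Buffer size when a fixed prefix is reserved for the implicit args. */
static size_t size_with_reserved_prefix(size_t user_args_size)
{
    return RESERVED_PREFIX + user_args_size;
}

/* Buffer size when only the implicit args actually used are appended. */
static size_t size_with_appended(size_t user_args_size, size_t implicit_size)
{
    return user_args_size + implicit_size;
}
```

For example, with 300 bytes of user arguments and 24 bytes of implicit arguments, the reserved-prefix layout needs 556 bytes (two blocks under the assumption above), while the appended layout needs 324 bytes (one block).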

Repository:
  rL LLVM

http://reviews.llvm.org/D20297