[PATCH] D38978: [OpenMP] Enable the lowering of implicitly shared variables in OpenMP GPU-offloaded target regions to the GPU shared memory

Fri Jan 5 02:57:50 PST 2018

gtbercea added a comment.

In https://reviews.llvm.org/D38978#967485, @tra wrote:

> Dotting the 'i's on the questions that were not replied to directly.
>
> In https://reviews.llvm.org/D38978#899205, @tra wrote:
>
> > Considering that device-side code tends to be heavily inlined, it may be prudent to add an option to control the total size of shared memory we allow to be used for this purpose.
>
>
> I'm still curious to hear what do you plan to do when your depot use grows beyond certain limit. At the very least there's the physical limit on shared memory size. Shared memory use also affects how many threads can be launched which has large impact on performance. IMO having some sort of user-controllable threshold would be very desirable.

When shared memory isn't enough to hold the shared depot, global memory will be used instead. That is a scheme which will be covered by a future patch.

> 
> 
>> In case your passes are not executed (or didn't move anything to shared memory), is there any impact on the generated PTX. I.e. can ptxas successfully optimize unused shared memory away?
> 
> This may have been addressed by the no-shared-depot.ll test. It would be nice to add few comments in the tests explaining what they do.

Done.

> 
> 
>> If the code intentionally wants to allocate something in local memory, would the allocation ever be moved to shared memory by your pass? If so, how would I prevent that?
> 
> AFAICT this functionality only applies to functions with `has-nvptx-shared-depot` attribute. Works for me.

That's right.

================
Comment at: test/CodeGen/NVPTX/insert-shared-depot.ll:29
+define void @kernel() #0 {
+; LABEL: @linsert_shared_depot
+  %A = alloca i32, align 4
----------------
tra wrote:
> 'LABEL' is not a check-prefix and `@linsert_shared_depot` is not this function's name, so I'm puzzled what this line is supposed to do.  Did you intend `<prefix>-LABEL: @kernel` ?
> 
> This appears in all the test cases in the patch.
This is modeled after the lower-alloca.ll test which has a similar label. The label is always equal to the name of the test file. In this particular case there is a typo, it should be "insert_shared_depot" not "linsert_shared_depot"

https://reviews.llvm.org/D38978