[PATCH] D38978: [OpenMP] Enable the lowering of implicitly shared variables in OpenMP GPU-offloaded target regions to the GPU shared memory

Thu Jan 4 10:30:50 PST 2018

tra added a comment.

Dotting the 'i's on the questions that were not replied to directly.

In https://reviews.llvm.org/D38978#899205, @tra wrote:

> Considering that device-side code tends to be heavily inlined, it may be prudent to add an option to control the total size of shared memory we allow to be used for this purpose.

I'm still curious to hear what do you plan to do when your depot use grows beyond certain limit. At the very least there's the physical limit on shared memory size. Shared memory use also affects how many threads can be launched which has large impact on performance. IMO having some sort of user-controllable threshold would be very desirable.

> In case your passes are not executed (or didn't move anything to shared memory), is there any impact on the generated PTX. I.e. can ptxas successfully optimize unused shared memory away?

This may have been addressed by the no-shared-depot.ll test. It would be nice to add few comments in the tests explaining what they do.

> If the code intentionally wants to allocate something in local memory, would the allocation ever be moved to shared memory by your pass? If so, how would I prevent that?

AFAICT this functionality only applies to functions with `has-nvptx-shared-depot` attribute. Works for me.

================
Comment at: lib/Target/NVPTX/NVPTXFunctionDataSharing.cpp:98
+  if (!F.hasFnAttribute("has-nvptx-shared-depot"))
+    return Modified;
+
----------------
Nit: `return false` would match the intent better.

================
Comment at: lib/Target/NVPTX/NVPTXRegisterInfo.td:75
 // Read NVPTXRegisterInfo.cpp to see how VRFrame and VRDepot are used.
-def SpecialRegs : NVPTXRegClass<[i32], 32, (add VRFrame, VRFrameLocal, VRDepot,
+def SpecialRegs : NVPTXRegClass<[i32], 32, (add VRFrame, VRFrameLocal, VRDepot, VRShared, VRFrameShared, VRSharedDepot,
                                             (sequence "ENVREG%u", 0, 31))>;
----------------
Line too long.

================
Comment at: test/CodeGen/NVPTX/insert-shared-depot.ll:4-5
+
+; PTX32: {{.*}}kernel()
+; PTX64: {{.*}}kernel()
+
----------------
You could put common checks under the same label (e.g. `CHECK`) and run tests with `-check-prefixes=PTX32,CHECK`

================
Comment at: test/CodeGen/NVPTX/insert-shared-depot.ll:29
+define void @kernel() #0 {
+; LABEL: @linsert_shared_depot
+  %A = alloca i32, align 4
----------------
'LABEL' is not a check-prefix and `@linsert_shared_depot` is not this function's name, so I'm puzzled what this line is supposed to do.  Did you intend `<prefix>-LABEL: @kernel` ?

This appears in all the test cases in the patch.

https://reviews.llvm.org/D38978