[Openmp-commits] [PATCH] D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.

Wed Nov 29 03:23:37 PST 2017

sfantao added inline comments.

================
Comment at: libomptarget/deviceRTLs/nvptx/src/data_sharing.cu:40
+  unsigned long long Mask = __BALLOT_SYNC(0xFFFFFFFF, true);
+  unsigned long long ShNum = 32 - (getThreadId() & DS_Max_Worker_Warp_Size_Log2_Mask);
+  unsigned long long Sh = Mask << ShNum;
----------------
Hahnfeld wrote:
> `DS_Max_Worker_Warp_Size` instead of its hard-coded value? Or is 32 the length of an integer? (Another reason to avoid magic numbers!)
Here 32 is both the GPU warp size and the size of the integer __popc() takes, I agree we should get an enumerator to indicate that. What is tricky here, is that the function assumes that the warp size is the same as the 32-bit integer. This assumption is made in CUDA so, I think is fine we also do it here. 

Actually, here I think the name of the function `getWarpMasterActiveThreadId ` is misleading. This function returns the number of active threads in the current warp whose ID in the warp is lower than the current one. If there are no active threads with lower ID,  then the current thread is the warp master. In other words, this function returns zero, if the current thread is the warp master.

This function seems to only be used in `IsWarpMasterActiveThread`. So I suggest remove the current implementation of `IsWarpMasterActiveThread` and rename this function to `IsWarpMasterActiveThread` and have it return:
```
return __popc(Sh) == 0u;
```

================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:74
+  DS_Max_Worker_Warp_Size_Log2 = 5,
+  DS_Max_Worker_Warp_Size_Log2_Mask = (~0u >> (32-DS_Max_Worker_Warp_Size_Log2)),
+  // The slot size that should be reserved for a working warp.
----------------
grokos wrote:
> Hahnfeld wrote:
> > First, this value needs a comment what it is used for and maybe we can drop the `Log2` is its a mast related to the value, not the logarithm.
> > 
> > From what I understand, this value has has the 5 LSBs set to 1 which should be equivalent to `DS_Max_Worker_Warp_Size - 1`? (this also avoid relying on 32 bit integers)
> I've changed its definition to the proposed `DS_Max_Worker_Warp_Size - 1` which is much clearer. This mask is used to get (threadID mod Warp_Size). I've added a comment.
Using `DS_Max_Worker_Warp_Size - 1` its fine here. This is mostly used to calculate the ID of a thread in its warp, i.e. `threadID() % warpSize` which is equivalent to ` threadID() & DS_Max_Worker_Warp_Size`. The compiler should be smart enough to convert one to the other if `warpSize` is a constant. 

I'd rather prefer do make little optimisations explicit in the code, but if one thinks this extra macro is too much, one could revisit all uses and see if `warpSize` is constant and use `threadID() % warpSize` throughout the code. I don't see any reason for making `warpSize` non-constant anyway. 

Repository:
  rL LLVM

https://reviews.llvm.org/D14254