[Openmp-commits] [PATCH] D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.

Wed Nov 29 06:52:18 PST 2017

Hahnfeld added inline comments.

================
Comment at: libomptarget/deviceRTLs/nvptx/src/data_sharing.cu:40
+  unsigned long long Mask = __BALLOT_SYNC(0xFFFFFFFF, true);
+  unsigned long long ShNum = 32 - (getThreadId() & DS_Max_Worker_Warp_Size_Log2_Mask);
+  unsigned long long Sh = Mask << ShNum;
----------------
sfantao wrote:
> Hahnfeld wrote:
> > `DS_Max_Worker_Warp_Size` instead of its hard-coded value? Or is 32 the length of an integer? (Another reason to avoid magic numbers!)
> Here 32 is both the GPU warp size and the size of the integer __popc() takes, I agree we should get an enumerator to indicate that. What is tricky here, is that the function assumes that the warp size is the same as the 32-bit integer. This assumption is made in CUDA so, I think is fine we also do it here. 
> 
> Actually, here I think the name of the function `getWarpMasterActiveThreadId ` is misleading. This function returns the number of active threads in the current warp whose ID in the warp is lower than the current one. If there are no active threads with lower ID,  then the current thread is the warp master. In other words, this function returns zero, if the current thread is the warp master.
> 
> This function seems to only be used in `IsWarpMasterActiveThread`. So I suggest remove the current implementation of `IsWarpMasterActiveThread` and rename this function to `IsWarpMasterActiveThread` and have it return:
> ```
> return __popc(Sh) == 0u;
> ```
I'm not sure I get your arguments: Let's start with the function `__popc()` which returns the number of bits set to 1 in the argument. This just happens to be a 32bit integer which has nothing to do with the warp size as far as I can see.

If this function really does what you described it deserves documentation or the code needs to be rewritten so that others can understand. This might be related to what `__BALLOT_SYNC(0xFFFFFFFF, true)` as I've already criticized below.

At the end, I don't understand what your proposal is: The function name you mention is still the same and the return statement basically inverts the value returned of `__popc()`

================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:74
+  DS_Max_Worker_Warp_Size_Log2 = 5,
+  DS_Max_Worker_Warp_Size_Log2_Mask = (~0u >> (32-DS_Max_Worker_Warp_Size_Log2)),
+  // The slot size that should be reserved for a working warp.
----------------
sfantao wrote:
> grokos wrote:
> > Hahnfeld wrote:
> > > First, this value needs a comment what it is used for and maybe we can drop the `Log2` is its a mast related to the value, not the logarithm.
> > > 
> > > From what I understand, this value has has the 5 LSBs set to 1 which should be equivalent to `DS_Max_Worker_Warp_Size - 1`? (this also avoid relying on 32 bit integers)
> > I've changed its definition to the proposed `DS_Max_Worker_Warp_Size - 1` which is much clearer. This mask is used to get (threadID mod Warp_Size). I've added a comment.
> Using `DS_Max_Worker_Warp_Size - 1` its fine here. This is mostly used to calculate the ID of a thread in its warp, i.e. `threadID() % warpSize` which is equivalent to ` threadID() & DS_Max_Worker_Warp_Size`. The compiler should be smart enough to convert one to the other if `warpSize` is a constant. 
> 
> I'd rather prefer do make little optimisations explicit in the code, but if one thinks this extra macro is too much, one could revisit all uses and see if `warpSize` is constant and use `threadID() % warpSize` throughout the code. I don't see any reason for making `warpSize` non-constant anyway. 
Do you mean `threadID() % warpSize` is equivalent to `threadID() & (DS_Max_Worker_Warp_Size - 1)`? Because otherwise I disagree.

I agree that the compiler will convert the usage. IMO `threadID() % warpSize` is //a lot more// obvious than the bit operations the compiler might use.

Repository:
  rL LLVM

https://reviews.llvm.org/D14254