[Openmp-commits] [PATCH] D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.

Tue Jan 23 11:29:02 PST 2018

grokos added inline comments.

================
Comment at: libomptarget/deviceRTLs/nvptx/src/interface.h:360-361
+// Support for reducing conditional lastprivate variables
+EXTERN void __kmpc_reduce_conditional_lastprivate(kmp_Indent *loc,
+  int32_t global_tid, int32_t varNum, void *array);
+
----------------
Hahnfeld wrote:
> Will the host runtime use the same entry point? Because it's not yet present in `libomp`
Yes, I will add the same function to the host runtime. It's not there yet, but it's on my list.

================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:74
+  DS_Max_Worker_Warp_Size_Log2 = 5,
+  DS_Max_Worker_Warp_Size_Log2_Mask = (~0u >> (32-DS_Max_Worker_Warp_Size_Log2)),
+  // The slot size that should be reserved for a working warp.
----------------
Hahnfeld wrote:
> grokos wrote:
> > sfantao wrote:
> > > Hahnfeld wrote:
> > > > sfantao wrote:
> > > > > Hahnfeld wrote:
> > > > > > sfantao wrote:
> > > > > > > grokos wrote:
> > > > > > > > Hahnfeld wrote:
> > > > > > > > > First, this value needs a comment what it is used for and maybe we can drop the `Log2` is its a mast related to the value, not the logarithm.
> > > > > > > > > 
> > > > > > > > > From what I understand, this value has has the 5 LSBs set to 1 which should be equivalent to `DS_Max_Worker_Warp_Size - 1`? (this also avoid relying on 32 bit integers)
> > > > > > > > I've changed its definition to the proposed `DS_Max_Worker_Warp_Size - 1` which is much clearer. This mask is used to get (threadID mod Warp_Size). I've added a comment.
> > > > > > > Using `DS_Max_Worker_Warp_Size - 1` its fine here. This is mostly used to calculate the ID of a thread in its warp, i.e. `threadID() % warpSize` which is equivalent to ` threadID() & DS_Max_Worker_Warp_Size`. The compiler should be smart enough to convert one to the other if `warpSize` is a constant. 
> > > > > > > 
> > > > > > > I'd rather prefer do make little optimisations explicit in the code, but if one thinks this extra macro is too much, one could revisit all uses and see if `warpSize` is constant and use `threadID() % warpSize` throughout the code. I don't see any reason for making `warpSize` non-constant anyway. 
> > > > > > Do you mean `threadID() % warpSize` is equivalent to `threadID() & (DS_Max_Worker_Warp_Size - 1)`? Because otherwise I disagree.
> > > > > > 
> > > > > > I agree that the compiler will convert the usage. IMO `threadID() % warpSize` is //a lot more// obvious than the bit operations the compiler might use.
> > > > > > Do you mean threadID() % warpSize is equivalent to threadID() & (DS_Max_Worker_Warp_Size - 1) ?
> > > > > 
> > > > > Yes, that is what I mean. Of course, this assumes warp size is a power of 2.
> > > > > 
> > > > > > I agree that the compiler will convert the usage. IMO threadID() % warpSize is a lot more obvious than the bit operations the compiler might use.
> > > > > 
> > > > > I don't have a strong opinion here. I agree that `threadID() % warpSize` is easier to read and I think its fine we go with that. It's just that if we intend the implementation to be `threadID() & (DS_Max_Worker_Warp_Size - 1) ` I don't see why we shouldn't write that in the code.  
> > > > 1. IMO source code is meant to be understood by humans who are normally much more familiar with mathematical operations (modulo) than bit operations - even programmers.
> > > > 2. Because it assumes a power of 2. If the compiler can prove that - great, do the optimization.
> > > > (If we happen to find performance implications, we can still change it and refactor it to a function and add the documentation why this is necessary, what it assumes and how it works)
> > > Fine by me.
> > So we'll go with the modulo solution.
> I don't see this implemented?
I removed `DS_Max_Worker_Warp_Size_Bit_Mask` completely as it is not needed anymore (we now use `x % WARPSIZE`).

================
Comment at: libomptarget/deviceRTLs/nvptx/src/option.h:20-43
+////////////////////////////////////////////////////////////////////////////////
+// following two defs must match absolute limit hardwired in the host RTL
+#define TEAMS_ABSOLUTE_LIMIT                                                   \
+  512 /* omptx limit (must match teamsAbsoluteLimit) */
+#define THREAD_ABSOLUTE_LIMIT                                                  \
+  1024 /* omptx limit (must match threadAbsoluteLimit) */
+
----------------
Hahnfeld wrote:
> Only `MAX_THREADS_PER_TEAM` and `WARPSIZE` are used, the rest can go away.
`THREAD_ABSOLUTE_LIMIT` is also used (in the definition of `MAX_THREADS_PER_TEAM`).

Repository:
  rOMP OpenMP

https://reviews.llvm.org/D14254