[PATCH] D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.

Fri Jan 26 09:54:33 PST 2018

grokos marked 9 inline comments as done.
grokos added a comment.

I created a child revision with the final changes.

================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu:45-50
+INLINE unsigned n_sm() {
+  unsigned n_sm;
+  asm("mov.u32 %0, %%nsmid;" : "=r"(n_sm));
+  return n_sm;
+}
+
----------------
Hahnfeld wrote:
> Not used?
Probably a leftover, removed.

================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:72
+  // warp.
+  DS_Max_Worker_Warp_Size_Bits = 5,
+  // The slot size that should be reserved for a working warp.
----------------
Hahnfeld wrote:
> I think all other usages of `>> DS_Max_Worker_Warp_Size_Bits` can be replaced with `% WARPSIZE` or am I missing something in here?
You mean `div WARPSIZE`, not `mod`... I assume you prefer the division over shifting... I'll change those instances in the new diff. `WARPSIZE` is a compile-time constant and a power of 2, so the compiler should be able optimize the division away.

================
Comment at: libomptarget/deviceRTLs/nvptx/src/reduction.cu:238
+}
+#endif // __CUDA_ARCH__ >= 700
+
----------------
Hahnfeld wrote:
> I think this `endif` needs to go before the last closing `}` to also finish the function in the `if` case.
Good catch, I forgot to move the `#endif` as well....

Repository:
  rOMP OpenMP

https://reviews.llvm.org/D14254