[Openmp-commits] [PATCH] D14254: [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.

Jonas Hahnfeld via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Nov 8 09:18:06 PST 2017


Hahnfeld added a comment.

Comments on the first set of files (I will continue to do this in chunks as I can't review code for hours en block without getting crazy)



================
Comment at: libomptarget/deviceRTLs/nvptx/src/counter_group.h:14-15
+
+#ifndef SRC_COUNTER_GROUP_H_
+#define SRC_COUNTER_GROUP_H_
+
----------------
Can we also have this prefixed with `_OMPTARGET_NVPTX` like the other header files?


================
Comment at: libomptarget/deviceRTLs/nvptx/src/counter_group.h:17-20
+#include <stdlib.h>
+#include <stdio.h>
+
+#include <cuda.h>
----------------
Needed?

This file should probably include `option.h` which defines `Counter`


================
Comment at: libomptarget/deviceRTLs/nvptx/src/critical.cu:14-15
+
+#include <stdio.h>
+#include <complex.h>
+
----------------
needed?


================
Comment at: libomptarget/deviceRTLs/nvptx/src/data_sharing.cu:28-31
+//// Team ID in the CUDA grid
+//__device__ static unsigned getTeamId() {
+//  return blockIdx.x;
+//}
----------------
Remove if not needed


================
Comment at: libomptarget/deviceRTLs/nvptx/src/data_sharing.cu:34-35
+__device__ static unsigned getMasterThreadId() {
+  unsigned Mask = DS_Max_Worker_Warp_Size - 1;
+  return (getNumThreads() - 1) & (~Mask);
+}
----------------
Suppose we have 1024 threads, why is `1023 & (~31) = 992` the master thread?


================
Comment at: libomptarget/deviceRTLs/nvptx/src/data_sharing.cu:40
+  unsigned long long Mask = __BALLOT_SYNC(0xFFFFFFFF, true);
+  unsigned long long ShNum = 32 - (getThreadId() & DS_Max_Worker_Warp_Size_Log2_Mask);
+  unsigned long long Sh = Mask << ShNum;
----------------
`DS_Max_Worker_Warp_Size` instead of its hard-coded value? Or is 32 the length of an integer? (Another reason to avoid magic numbers!)


================
Comment at: libomptarget/deviceRTLs/nvptx/src/data_sharing.cu:48-51
+// Return true if this is the first thread in the warp.
+//static bool IsWarpMasterThread() {
+//  return (getThreadId() & DS_Max_Worker_Warp_Size_Log2_Mask) == 0u;
+//}
----------------
Remove if not needed


================
Comment at: libomptarget/deviceRTLs/nvptx/src/data_sharing.cu:132
+  unsigned WID = getWarpId();
+  unsigned CurActiveThreads = __BALLOT_SYNC(0xFFFFFFFF, true);
+
----------------
Can we have this in a function with a corresponding name?


================
Comment at: libomptarget/deviceRTLs/nvptx/src/data_sharing.cu:315-318
+//EXTERN void __kmpc_samuel_print(int64_t Bla){
+//  DSPRINT(DSFLAG,"Sam print: %016llx\n",Bla);
+//
+//}
----------------
Delete


================
Comment at: libomptarget/deviceRTLs/nvptx/src/libcall.cu:17
+
+#define TICK ((double) 1.0 / 745000000.0)
+
----------------
grokos wrote:
> This is where the hard-coded definition of the GPU clock frequency has been moved. I'll try to find a viable solution to make the library find the clock frequency dynamically.
Yes, this doesn't sound like a good idea to have that hard-coded...


================
Comment at: libomptarget/deviceRTLs/nvptx/src/omp_data.cu:1
+//===------------- objects.cu - NVPTX OpenMP GPU objects --------- CUDA -*-===//
+//
----------------
omp_data.cu


================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:70
+  // The maximum number of threads in a worker warp.
+  DS_Max_Worker_Warp_Size = 32,
+  // The number of bits required to represent the maximum number of threads in a
----------------
`1 << DS_Max_Worker_Warp_Size_Log2`?


================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:73
+  // warp.
+  DS_Max_Worker_Warp_Size_Log2 = 5,
+  DS_Max_Worker_Warp_Size_Log2_Mask = (~0u >> (32-DS_Max_Worker_Warp_Size_Log2)),
----------------
`Bits` instead of `Log2`? I think that's far more common...


================
Comment at: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:74
+  DS_Max_Worker_Warp_Size_Log2 = 5,
+  DS_Max_Worker_Warp_Size_Log2_Mask = (~0u >> (32-DS_Max_Worker_Warp_Size_Log2)),
+  // The slot size that should be reserved for a working warp.
----------------
First, this value needs a comment what it is used for and maybe we can drop the `Log2` is its a mast related to the value, not the logarithm.

>From what I understand, this value has has the 5 LSBs set to 1 which should be equivalent to `DS_Max_Worker_Warp_Size - 1`? (this also avoid relying on 32 bit integers)


Repository:
  rL LLVM

https://reviews.llvm.org/D14254





More information about the Openmp-commits mailing list