[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation

Fri Jul 5 17:35:43 PDT 2019

ABataev added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/state-queue.h:1
-//===--------- statequeue.h - NVPTX OpenMP GPU State Queue ------- CUDA -*-===//
+//===--- state-queue.h --- OpenMP target state queue ------------*- C++ -*-===//
 //
----------------
jdoerfert wrote:
> ABataev wrote:
> > jdoerfert wrote:
> > > ABataev wrote:
> > > > jdoerfert wrote:
> > > > > ABataev wrote:
> > > > > > I think it is better to mark it as Cuda, not C/C++, since it contains some Cuda specific constructs.
> > > > > I though I removed all of them, if I haven't we should. The idea is that this "core logic" code should _not_ be CUDA code but something else, e.g., C++ seems natural.
> > > > Nope, `INLINE` macro is expanded into cuda specific attributes.
> > > It is a macro and macros are not specific to CUDA. To what it expands if this file is included from the CUDA code right now is not relevant. This is a first step and we will have to move these files into the common folder (see the RFC). The macro definition will be in the device specific parts and they can then use whatever device (language) specific extensions they want without making these part specific to that device (language) extension. That is, after all, the main motivation behind these patches.
> > But anyway, you need to compile it in cuda or hip mode, not pure c++. I think it is better for tbe developer to understand from the very beginning that this code requires cuda or hip compiler.
> Given that this is "a header file", there is really no telling if "it is" C/C++/Cuda/Hip/OpenCL/Sycl/... 
> 
> All it should require is Clang (potentially in some mode). Given the new OpenMP standard, the mode could reasonably be "-fpoenmp -fopenmp-targets=...". 
Not necessarily.currently, it can be compiled with nvcc too. It does not produce bc file, but can be linked and works. So, clang is not necessarily a requirement here.
What I'm saying that it is better to inform the developer that this code requires cuda or hip compiler, not pure c++ compiler.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/state-queue.h:29
-  ElementType elements[SIZE];
-  volatile ElementType *elementQueue[SIZE];
-  volatile uint32_t head;
----------------
I would like to keep volatile modifier at least for cuda 8. Without volatile it may produce incorrect code. I would keep it until we drop support for cuda 8.
Plus, I would suggest to test it very seriously for other versions of cuda. Does it really works correctly? Ptxas may use some incorrect optimizations without volatile. Though I like the idea of removing them.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64217/new/

https://reviews.llvm.org/D64217