[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation

Fri Jul 5 19:13:34 PDT 2019

jdoerfert marked 2 inline comments as done.
jdoerfert added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/state-queue.h:1
-//===--------- statequeue.h - NVPTX OpenMP GPU State Queue ------- CUDA -*-===//
+//===--- state-queue.h --- OpenMP target state queue ------------*- C++ -*-===//
 //
----------------
hfinkel wrote:
> ABataev wrote:
> > hfinkel wrote:
> > > jdoerfert wrote:
> > > > ABataev wrote:
> > > > > jdoerfert wrote:
> > > > > > ABataev wrote:
> > > > > > > jdoerfert wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > jdoerfert wrote:
> > > > > > > > > > ABataev wrote:
> > > > > > > > > > > I think it is better to mark it as Cuda, not C/C++, since it contains some Cuda specific constructs.
> > > > > > > > > > I though I removed all of them, if I haven't we should. The idea is that this "core logic" code should _not_ be CUDA code but something else, e.g., C++ seems natural.
> > > > > > > > > Nope, `INLINE` macro is expanded into cuda specific attributes.
> > > > > > > > It is a macro and macros are not specific to CUDA. To what it expands if this file is included from the CUDA code right now is not relevant. This is a first step and we will have to move these files into the common folder (see the RFC). The macro definition will be in the device specific parts and they can then use whatever device (language) specific extensions they want without making these part specific to that device (language) extension. That is, after all, the main motivation behind these patches.
> > > > > > > But anyway, you need to compile it in cuda or hip mode, not pure c++. I think it is better for tbe developer to understand from the very beginning that this code requires cuda or hip compiler.
> > > > > > Given that this is "a header file", there is really no telling if "it is" C/C++/Cuda/Hip/OpenCL/Sycl/... 
> > > > > > 
> > > > > > All it should require is Clang (potentially in some mode). Given the new OpenMP standard, the mode could reasonably be "-fpoenmp -fopenmp-targets=...". 
> > > > > Not necessarily.currently, it can be compiled with nvcc too. It does not produce bc file, but can be linked and works. So, clang is not necessarily a requirement here.
> > > > > What I'm saying that it is better to inform the developer that this code requires cuda or hip compiler, not pure c++ compiler.
> > > > It does not require cuda or hip.
> > > I think that it's useful to think about this in the following way: We often write code that requires, at least given some configuration of the preprocessor, various extensions to standard C++. CUDA and HIP are C++ extensions. Code using __builtin_frame_address or `__attribute__((always_inline))` or vector intrinsics is code using C++ extensions. If the code can only be compiled with CUDA extensions enabled, then by all means, using .cu is appropriate. If it can be compiled without use of C++ extensions, or with a wide variety of C++ extensions, given different configurations of the preprocessor, then just calling it a "C++" source file seems reasonable.
> > Hal, I agree, but this code cannot be compiled without cuda correctly.  You need to generate the device code for these functions and because of this they must be marked as `__device__` functions. This is cuda specific modifier that tells the compiler that we don't need the host version of this code, just the device part. But if you think that this is not important, ok, mark it as c++.
> I see "*if* compiling as CUDA, we need the functions marked with `__device__`" as very similar to saying "*if* we compile into a Windows DLL we need the functions marked with `__declspec(dllexport)`". It's fine to mark it as C++ because, aside from the ABI modifiers, it's essentially "just" C++ code.
I mentioned this already but I will do it again: There is also no other CUDA code in this file. Especially, there is *no* occurrence of `__device__` in this file. If put into a context, `INLINE` could resolve to something that includes `__device__` but it could also resolve to something without `__devicde__`.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/state-queue.h:29
-  ElementType elements[SIZE];
-  volatile ElementType *elementQueue[SIZE];
-  volatile uint32_t head;
----------------
ABataev wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > I would like to keep volatile modifier at least for cuda 8. Without volatile it may produce incorrect code. I would keep it until we drop support for cuda 8.
> > > Plus, I would suggest to test it very seriously for other versions of cuda. Does it really works correctly? Ptxas may use some incorrect optimizations without volatile. Though I like the idea of removing them.
> > These members are accessed only through *atomic* accesses. Why do we would require volatile in addition?
> Without `volatile` they were not initialized properly on cuda 8. This, again, seems to me like some kind of a bug in ptxas for cuda 8. Not sure about this problem in cuda 9, it requires some additional testing.
I actually did not notice that there is apparently no initialization of these values (or did I miss it?). If there is none, than that is the problem which needs fixing, e.g., `uint32_t Head = 0`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64217/new/

https://reviews.llvm.org/D64217