[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation

Tue Jul 9 14:02:50 PDT 2019

jdoerfert marked an inline comment as done.
jdoerfert added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/state-queue.h:29
-  ElementType elements[SIZE];
-  volatile ElementType *elementQueue[SIZE];
-  volatile uint32_t head;
----------------
ABataev wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > jdoerfert wrote:
> > > > ABataev wrote:
> > > > > jdoerfert wrote:
> > > > > > ABataev wrote:
> > > > > > > jdoerfert wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > I would like to keep volatile modifier at least for cuda 8. Without volatile it may produce incorrect code. I would keep it until we drop support for cuda 8.
> > > > > > > > > Plus, I would suggest to test it very seriously for other versions of cuda. Does it really works correctly? Ptxas may use some incorrect optimizations without volatile. Though I like the idea of removing them.
> > > > > > > > These members are accessed only through *atomic* accesses. Why do we would require volatile in addition?
> > > > > > > Without `volatile` they were not initialized properly on cuda 8. This, again, seems to me like some kind of a bug in ptxas for cuda 8. Not sure about this problem in cuda 9, it requires some additional testing.
> > > > > > I actually did not notice that there is apparently no initialization of these values (or did I miss it?). If there is none, than that is the problem which needs fixing, e.g., `uint32_t Head = 0`.
> > > > > They are initialized, but without volatile qualifier they are not initialized in the compiled code.
> > > > I checked and I could not find the initialization of these variables anywhere. We should make sure the problem is not the UB of not initializing them before we require `volatile` here.
> > > I don't think there is an UB, we checked it some time ago. The memory is zeroinitialized, but without volatile modifier it works incorrectly at least for cuda 8.
> > > I'm trying to reduce the use of this structure with my patches, because, most probably, it relies on some undocumented features.
> > > Plus, i don't think it is effective. It implements queue to manage the resources, but I think we don't need a queue here, just a hash table. 
> > > I don't think there is an UB, we checked it some time ago. The memory is zeroinitialized, but without volatile modifier it works incorrectly at least for cuda 8.
> > 
> > I'm confused. I was expecting CUDA to have C++ semantics which requires class members to be initialized explicitly, or am I missing something here?
> > 
> > 
> > > Plus, i don't think it is effective. It implements queue to manage the resources, but I think we don't need a queue here, just a hash table.
> > 
> > That is a question for a follow up patch. I did not try to change the semantics in these ones, just the coding style and organization of the code.
> > I'm confused. I was expecting CUDA to have C++ semantics which requires class members to be initialized explicitly, or am I missing something here?
> 
> Read this https://docs.nvidia.com/cuda/archive/10.1/cuda-c-programming-guide/index.html#device-memory-specifiers
I fail to find the part that says they are initialized to zero. Could you specify where I can find that?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64217/new/

https://reviews.llvm.org/D64217