[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation

Alexey Bataev via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Sat Jul 6 05:37:57 PDT 2019

ABataev added inline comments.

Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/state-queue.h:29
-  ElementType elements[SIZE];
-  volatile ElementType *elementQueue[SIZE];
-  volatile uint32_t head;
jdoerfert wrote:
> ABataev wrote:
> > jdoerfert wrote:
> > > ABataev wrote:
> > > > jdoerfert wrote:
> > > > > ABataev wrote:
> > > > > > I would like to keep volatile modifier at least for cuda 8. Without volatile it may produce incorrect code. I would keep it until we drop support for cuda 8.
> > > > > > Plus, I would suggest to test it very seriously for other versions of cuda. Does it really works correctly? Ptxas may use some incorrect optimizations without volatile. Though I like the idea of removing them.
> > > > > These members are accessed only through *atomic* accesses. Why do we would require volatile in addition?
> > > > Without `volatile` they were not initialized properly on cuda 8. This, again, seems to me like some kind of a bug in ptxas for cuda 8. Not sure about this problem in cuda 9, it requires some additional testing.
> > > I actually did not notice that there is apparently no initialization of these values (or did I miss it?). If there is none, than that is the problem which needs fixing, e.g., `uint32_t Head = 0`.
> > They are initialized, but without volatile qualifier they are not initialized in the compiled code.
> I checked and I could not find the initialization of these variables anywhere. We should make sure the problem is not the UB of not initializing them before we require `volatile` here.
I don't think there is an UB, we checked it some time ago. The memory is zeroinitialized, but without volatile modifier it works incorrectly at least for cuda 8.
I'm trying to reduce the use of this structure with my patches, because, most probably, it relies on some undocumented features.
Plus, i don't think it is effective. It implements queue to manage the resources, but I think we don't need a queue here, just a hash table. 

  rG LLVM Github Monorepo



More information about the Openmp-commits mailing list