[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Fri Jul 5 18:36:54 PDT 2019
ABataev added inline comments.
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/state-queue.h:29
- ElementType elements[SIZE];
- volatile ElementType *elementQueue[SIZE];
- volatile uint32_t head;
> ABataev wrote:
> > I would like to keep the volatile modifier, at least for CUDA 8. Without volatile it may produce incorrect code. I would keep it until we drop support for CUDA 8.
> > Plus, I would suggest testing it very thoroughly with other versions of CUDA. Does it really work correctly? Ptxas may apply some incorrect optimizations without volatile. Though I like the idea of removing them.
> These members are accessed only through *atomic* accesses. Why would we require volatile in addition?
Without `volatile` they were not initialized properly on CUDA 8. This, again, seems to me like some kind of bug in ptxas for CUDA 8. I am not sure whether the problem also exists in CUDA 9; that requires some additional testing.
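
To make the point under discussion concrete, here is a minimal host-side C++ sketch of counters that are touched only through atomic operations and therefore carry no `volatile` qualifier. It is not the libomptarget implementation: `SketchQueue`, `claimHead`, `claimTail`, and `slot` are hypothetical names, and `std::atomic` stands in for the CUDA atomics used in the device runtime. It shows only what the language guarantees; it says nothing about whether ptxas for CUDA 8 honors that in practice, which is the open question here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the queue under review; all names are
// illustrative and do not come from state-queue.h.
template <typename ElementType, uint32_t SIZE>
class SketchQueue {
  ElementType elements[SIZE];
  // No volatile: every read-modify-write goes through an atomic operation.
  std::atomic<uint32_t> head{0};
  std::atomic<uint32_t> tail{0};

public:
  // Atomically take the next head ticket and map it to a slot index.
  uint32_t claimHead() {
    return head.fetch_add(1, std::memory_order_acq_rel) % SIZE;
  }

  // Atomically take the next tail ticket and map it to a slot index.
  uint32_t claimTail() {
    return tail.fetch_add(1, std::memory_order_acq_rel) % SIZE;
  }

  // Access the element stored in a previously claimed slot.
  ElementType &slot(uint32_t index) {
    assert(index < SIZE);
    return elements[index];
  }
};
```

Each call to `claimHead` or `claimTail` hands out a distinct ticket, so two threads never receive the same slot within one pass over the ring. Any device-side equivalent would still need to be tested against the affected CUDA versions.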
Repository:
  rG LLVM Github Monorepo