[Openmp-commits] [PATCH] D68100: [OpenMP 5.0] declare mapper runtime implementation
George Rokos via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Jul 22 13:56:25 PDT 2020
grokos added a comment.
OK, I suspect there is a race condition involving the CUDA plugin. If I compile the test for `x86_64-pc-linux-gnu`, I always get the correct result, whether or not debug output is printed.
On CUDA, I tried increasing the test size from 1024 to 16M. With debug output off, I always get 16M as the result (instead of 32M). This tells me that the CUDA kernel is launched and the host code proceeds to verify the result before the kernel returns. Because the problem size is large, verification on the host always finishes before the kernel returns and the data is copied back from the device.
With debug output on, I get inconsistent results from run to run, ranging from 16M to 32M. This suggests that the host is kept busy printing output messages, so verification starts later, while the data is already being copied back.
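To make the suspected ordering problem concrete, here is a minimal host-only sketch in C++. It is not the libomptarget or CUDA plugin code: `run_without_and_with_sync` is a hypothetical name, the "kernel" is simulated with `std::async`, and I am assuming every element starts at 1 and the kernel increments it once, which is consistent with the 16M versus 32M sums above. A `std::promise` gate holds the simulated kernel back so the premature read is deterministic instead of timing-dependent.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration only: simulates an asynchronously launched
// kernel on the host. Returns {sum read before synchronizing,
// sum read after synchronizing}.
std::pair<long, long> run_without_and_with_sync(std::size_t n) {
    std::vector<int> a(n, 1);  // assumed initial state: every element is 1

    // The gate keeps the simulated kernel from running until the host has
    // done its premature verification, which models a slow device.
    std::promise<void> gate;
    std::future<void> gate_open = gate.get_future();

    // "Kernel launch": returns immediately, the work happens later.
    std::future<void> kernel =
        std::async(std::launch::async, [&a, &gate_open] {
            gate_open.wait();
            for (int &x : a) ++x;  // assumed kernel body: increment each element
        });

    // Host verifies without waiting for the kernel: sees the old data.
    long early = std::accumulate(a.begin(), a.end(), 0L);

    gate.set_value();  // let the simulated kernel proceed
    kernel.get();      // the synchronization step that appears to be missing

    // Host verifies after synchronizing: sees the updated data.
    long late = std::accumulate(a.begin(), a.end(), 0L);
    return {early, late};
}
```

Calling `run_without_and_with_sync(1024)` yields 1024 for the early read and 2048 for the read after waiting, which mirrors the 16M and 32M results at the larger problem size. In the real plugin the equivalent of `kernel.get()` would be whatever wait the runtime performs on the kernel and the device-to-host copy before the host reads the mapped data.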
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D68100/new/
https://reviews.llvm.org/D68100