[Openmp-commits] [PATCH] D68100: [OpenMP 5.0] declare mapper runtime implementation

George Rokos via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Jul 22 13:56:25 PDT 2020


grokos added a comment.

OK, I suspect there is a race condition involving the CUDA plugin. If I compile the test for `x86_64-pc-linux-gnu`, I always get the correct result, regardless of whether debug output is printed.

On CUDA, I increased the test size from 1024 to 16M. With debug output off, I always get 16M as the result (instead of 32M). This tells me that the CUDA kernel is launched and the host code proceeds to verify the result before the kernel returns. Because the problem size is large, verification on the host always finishes before the kernel returns and the data is copied back from the device.

With debug output on, I get inconsistent results from run to run, ranging from 16M to 32M. This suggests that printing debug messages keeps the host busy, so verification starts later, while the data is still being copied back.
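
To make the reasoning above concrete, here is a minimal sketch of the kind of reproducer being described. It is not the actual test from D68100: the names `run_and_sum`, `classify` and `Outcome` are hypothetical, and it assumes that every element starts at 1 and the target region adds 1, so that a complete run sums to 2N and an untouched buffer sums to N (matching the 32M vs. 16M figures).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reproducer sketch; NOT the test shipped with D68100.
// Assumption: elements start at 1 and the target region adds 1, so a
// complete run sums to 2 * n and an untouched buffer sums to n.
long long run_and_sum(std::size_t n) {
  std::vector<int> buf(n, 1);
  int *a = buf.data();

  // Synchronous target region (no nowait clause): the host should not
  // continue until the kernel is done and a[0:n] has been copied back.
#pragma omp target teams distribute parallel for map(tofrom : a[0 : n])
  for (std::size_t i = 0; i < n; ++i)
    a[i] += 1;

  // Host-side verification. If the runtime returned early, this loop
  // reads stale (or partially updated) host data.
  long long sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

enum class Outcome { Complete, Stale, Partial, Unexpected };

// Classify an observed sum along the lines of the comment above.
Outcome classify(long long sum, std::size_t n) {
  assert(n > 0 && "with n == 0 the stale and complete sums coincide");
  const long long stale = static_cast<long long>(n);
  const long long full = 2 * stale;
  if (sum == full)
    return Outcome::Complete; // kernel finished and data was copied back
  if (sum == stale)
    return Outcome::Stale;    // host verified before any result arrived
  if (sum > stale && sum < full)
    return Outcome::Partial;  // host verified during the copy-back
  return Outcome::Unexpected;
}
```

With a correct runtime, `classify(run_and_sum(n), n)` should always be `Outcome::Complete`; the behaviour reported above would correspond to `Stale` with debug output off and a mix of `Stale`, `Partial` and `Complete` with it on. Built without OpenMP support, the pragma is ignored and the loop simply runs on the host.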


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68100/new/

https://reviews.llvm.org/D68100




