[Openmp-commits] [PATCH] D115279: [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version

Wed Dec 8 08:09:33 PST 2021

ye-luo added a comment.

> I have not tried with memory that has already been locked, but I will. In any case, with this patch, if locking fails, then we revert to malloc+lock+unlock+free. This is not ideal, and this case is added for other reasons, but it should be supporting the case.

It would be better to skip the lock/free if the memory is already known to HSA. I think IBM XL skips its pinned memory optimization when it sees that the pointer is already pinned for CUDA.
I have code that manages lock/unlock via HIP. Even if a lock call from the plugin succeeds and the plugin's subsequent unlock call also succeeds, the user's unlock call then fails.
For this reason, checking the memory info before locking is required.
Falling back to "malloc+lock+unlock+free" is the worst option.
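
To make the failure mode concrete, here is a minimal sketch with a mock runtime. Everything in it is hypothetical (MockRuntime, copyAlwaysLock, copyCheckFirst, userUnlockSucceeds) and none of it is the plugin's code. It assumes that locks are not reference counted, which is how the behavior I see looks from the user side. In the real plugin the "is it already locked" query would presumably go through the HSA pointer info API (hsa_amd_pointer_info), which is not called here.

```cpp
#include <unordered_set>

// Hypothetical stand-in for the runtime's table of pinned host pointers.
// Assumption: locks are NOT reference counted.
struct MockRuntime {
  std::unordered_set<const void *> Locked;

  bool isLocked(const void *P) const { return Locked.count(P) != 0; }
  // Locking an already locked pointer still reports success.
  bool lock(const void *P) { Locked.insert(P); return true; }
  // Unlocking fails if the pointer is not currently locked.
  bool unlock(const void *P) { return Locked.erase(P) != 0; }
};

// Plugin path that always locks and unlocks around the copy.
void copyAlwaysLock(MockRuntime &RT, const void *P) {
  RT.lock(P);
  // ... asynchronous copy would go here ...
  RT.unlock(P);
}

// Plugin path that queries the memory info first and leaves
// user-pinned memory alone.
void copyCheckFirst(MockRuntime &RT, const void *P) {
  const bool AlreadyLocked = RT.isLocked(P);
  if (!AlreadyLocked)
    RT.lock(P);
  // ... asynchronous copy would go here ...
  if (!AlreadyLocked)
    RT.unlock(P);
}

// The user pins a buffer, the plugin copies from it, then the user unpins.
// Returns whether the user's final unlock succeeded.
bool userUnlockSucceeds(bool CheckFirst) {
  MockRuntime RT;
  int Buffer[4] = {0, 1, 2, 3};
  RT.lock(Buffer); // user pins the buffer (e.g. via HIP)
  if (CheckFirst)
    copyCheckFirst(RT, Buffer);
  else
    copyAlwaysLock(RT, Buffer);
  return RT.unlock(Buffer); // user unpins the buffer
}
```

Under that assumption, userUnlockSucceeds(false) reports a failed user unlock, because the plugin's unlock has already removed the user's pin, while userUnlockSucceeds(true) succeeds.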

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115279/new/

https://reviews.llvm.org/D115279