[Openmp-commits] [PATCH] D115279: [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version

Carlo Bertolli via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Dec 8 08:40:26 PST 2021

carlo.bertolli added a comment.

In D115279#3179687 <https://reviews.llvm.org/D115279#3179687>, @ye-luo wrote:

>> I have not tried with memory that has already been locked, but I will. In any case, with this patch, if locking fails, then we revert to malloc+lock+unlock+free. This is not ideal, and this case is added for other reasons, but it should be supporting the case.
> It will be better skipping lock/free if the memory is known to HSA already. I think IBM XL skips its pinned memory optimization when it sees the pointer pinned already for CUDA.
> I have code managing lock/unlock via HIP. Even if a lock call from the plugin succeeds, and then a plugin unlock call succeeds, the user unlock call fails.
> For this reason, check memory info is required. 
> fallback to "malloc+lock+unlock+free" is the worst option.

The following test works for me:

  #include <cstdio>
  #include <hsa/hsa_ext_amd.h>
  #define N 100293
  int main() {
    int n = N;
    int *a = new int[n];
    int *a_locked = nullptr;
    // Lock (pin) the host buffer before the target region uses it.
    hsa_status_t herr = hsa_amd_memory_lock(a, n*sizeof(int), nullptr, 0, (void **)&a_locked);
    if (herr != HSA_STATUS_SUCCESS) {
      printf("Locking failed\n");
      return 1;
    }
    #pragma omp target parallel for map(tofrom:a_locked[:n])
    for(int i = 0; i < n; i++)
      a_locked[i] = i;
    // The user-side unlock must still succeed after the plugin has handled the buffer.
    herr = hsa_amd_memory_unlock(a);
    if (herr != HSA_STATUS_SUCCESS) {
      printf("Unlocking failed\n");
      return 1;
    }
    int err = 0;
    for(int i = 0; i < n; i++)
      if (a[i] != i) {
        err++;
        printf("Err at %d, got %d expected %d\n", i, a[i], i);
        if (err > 10) break;
      }
    delete[] a;
    return err;
  }

@ye-luo can you please share a minimal test that is failing for you? Thanks!
