[Openmp-commits] [openmp] [OpenMP][libomptarget] Enable automatic unified shared memory executi… (PR #75999)

Fri Dec 22 07:37:53 PST 2023

carlobertolli wrote:

> I think this addressed all but one of my main concerns. Last thing is the testing. We should add tests for the different interactions (e.g., verify non-HSA_XNACK will copy, so will omp requires). Further, what happens if this is executed on a non-APU system? Won't it fail?

I will add checks to the current test for the case where auto zero-copy is not triggered and we expect to see a copy, and a new test for unified_shared_memory, where we expect to see the same log as with zero-copy.
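A possible way to tell "copy" from "zero-copy" without depending on the runtime's log text is sketched below. This is an illustrative assumption, not what the PR's tests do, and the function name is hypothetical. The idea: map an array with `map(to:)` only and write to it on the device. With separate device memory the write lands in the device copy and is never copied back, so the host still sees 0. With zero-copy (or when the region falls back to the host) the device writes host memory directly, so the host sees the new value.

```c
/* Hypothetical helper: returns 1 if a device-side write under map(to:)
   is visible on the host afterwards (zero-copy or host fallback),
   0 if the write went to a separate device copy (copy semantics). */
static int target_writes_visible_on_host(void) {
  int data[4] = {0, 0, 0, 0};

  /* map(to:) copies host -> device on entry but never copies back. */
  #pragma omp target map(to: data[0:4])
  {
    data[0] = 42;
  }

  return data[0] == 42;
}
```

Under this sketch, a run where auto zero-copy is not triggered (e.g., a discrete GPU with default settings) would be expected to return 0, while a build with `#pragma omp requires unified_shared_memory` on a system with unified memory enabled would be expected to return 1.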

For non-APU systems, this is my thinking:
- If the GPU supports unified memory (e.g., AMD GPUs with XNACK support, such as MI200, or NVIDIA GPUs with unified memory support), then we want to offer auto zero-copy to users, but in this case it would be hidden behind an environment variable (e.g., OMPX_AUTO_ZERO_COPY or OMPX_APU_MAPS). The reasoning: on discrete GPU systems, the best default is to allocate device memory and perform host-to-device/device-to-host copies. The user has to do something explicit (set the environment variable, turn on XNACK) to trigger auto zero-copy, because we expect it to perform worse on average.
- If the GPU does not support unified memory, then auto zero-copy will not be triggered automatically. If the user sets OMPX_AUTO_ZERO_COPY anyway, I would expect execution to fail at the first GPU access to CPU memory.
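The policy described above can be summarized as a small decision function. This is a hedged sketch, not libomptarget code: the struct, enum, and function names are hypothetical; the opt-in variable's name is still undecided (OMPX_AUTO_ZERO_COPY is only one candidate); treating "XNACK on" (HSA_XNACK=1) as the APU trigger follows the review discussion; and requiring both the opt-in variable and enabled unified memory on discrete GPUs is an assumption.

```c
#include <stdbool.h>

/* Hypothetical summary of the host/device memory configuration. */
struct mem_config {
  bool is_apu;                 /* host and GPU share physical memory */
  bool gpu_unified_memory;     /* GPU can access host memory (e.g., XNACK) */
  bool unified_memory_enabled; /* support switched on (e.g., HSA_XNACK=1) */
  bool env_auto_zero_copy;     /* opt-in env variable set (name undecided) */
};

enum map_strategy {
  MAP_COPY,                 /* device allocations plus H2D/D2H copies */
  MAP_ZERO_COPY,            /* device accesses host memory directly */
  MAP_ZERO_COPY_UNSUPPORTED /* user opted in, hardware cannot do it */
};

static enum map_strategy choose_map_strategy(const struct mem_config *c) {
  /* APU: zero-copy is automatic once unified memory is enabled. */
  if (c->is_apu)
    return c->unified_memory_enabled ? MAP_ZERO_COPY : MAP_COPY;

  /* Discrete GPU with unified memory: zero-copy only on explicit opt-in. */
  if (c->gpu_unified_memory && c->unified_memory_enabled &&
      c->env_auto_zero_copy)
    return MAP_ZERO_COPY;

  /* No unified memory but the user opted in: expected to fail at the
     first GPU access to host memory. */
  if (!c->gpu_unified_memory && c->env_auto_zero_copy)
    return MAP_ZERO_COPY_UNSUPPORTED;

  /* Default for discrete GPUs: copy. */
  return MAP_COPY;
}
```

The design choice this encodes is that copying stays the default everywhere except on APUs, so a discrete-GPU user only gets zero-copy by asking for it.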

I will work on better diagnostics for all of these cases in a subsequent patch. For example, building an application with unified_shared_memory and running it on a system with unified memory disabled should produce a warning that things might not work, along with a suggestion to turn on unified memory support. This applies to AMD GPUs; I am not sure about NVIDIA GPUs.
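The proposed warning could look roughly like the sketch below. The function name and message text are hypothetical; the only real-world detail assumed is that on AMD GPUs unified memory is typically enabled via HSA_XNACK=1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: returns a warning when the program requires
   unified_shared_memory but the system has unified memory disabled,
   or NULL when no warning is needed. */
static const char *usm_warning(bool requires_usm, bool unified_memory_enabled) {
  if (requires_usm && !unified_memory_enabled)
    return "warning: application requires unified_shared_memory but "
           "unified memory support is disabled; execution may fail. "
           "Consider enabling it (e.g., HSA_XNACK=1 on AMD GPUs).";
  return NULL;
}
```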

https://github.com/llvm/llvm-project/pull/75999