[Openmp-commits] [openmp] [OpenMP][OMPT] Add OMPT callback for device data exchange 'Device-to-Device' (PR #81991)

Jan André Reuter via Openmp-commits openmp-commits at lists.llvm.org
Sat Feb 17 02:57:10 PST 2024


Thyre wrote:

I can confirm that this pull request also works for NVIDIA GPUs, though I wasn't able to test it with multiple accelerators due to build issues on our HPC machines.

```console
$ clang --version
clang version 19.0.0git (https://github.com/llvm/llvm-project.git 5fd8e50feff94dac7e741b07c956622b7c25bc6a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/jreuter/Projects/Compilers/llvm-project/_build/_install/bin
$ clang -fopenmp --offload-arch=native reproducer.c
$ ./a.out
Callback Init: device_num=0 type=sm_75 device=0x55d75ee03a40 lookup=0x7fb3518ebb50 doc=(nil)
Allocating memory on device
  Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000001) src=(nil) src_device_num=1 dest=(nil) dest_device_num=0 bytes=4 code=0x55d75ce76853
  Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000001) src=(nil) src_device_num=1 dest=0x7fb325a00000 dest_device_num=0 bytes=4 code=0x55d75ce76853
  Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000002) src=(nil) src_device_num=1 dest=(nil) dest_device_num=0 bytes=4 code=0x55d75ce76864
  Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000002) src=(nil) src_device_num=1 dest=0x7fb325a00200 dest_device_num=0 bytes=4 code=0x55d75ce76864
Testing host to device
  Callback DataOp EMI: endpoint=1 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000003) src=0x55d75f66b200 src_device_num=1 dest=0x7fb325a00000 dest_device_num=0 bytes=4 code=0x55d75ce768ca
  Callback DataOp EMI: endpoint=2 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000003) src=0x55d75f66b200 src_device_num=1 dest=0x7fb325a00000 dest_device_num=0 bytes=4 code=0x55d75ce768ca
Testing device to device
  Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000004) src=0x7fb325a00000 src_device_num=0 dest=0x7fb325a00200 dest_device_num=0 bytes=4 code=0x55d75ce768fc
  Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000004) src=0x7fb325a00000 src_device_num=0 dest=0x7fb325a00200 dest_device_num=0 bytes=4 code=0x55d75ce768fc
Testing device to host
  Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000005) src=0x7fb325a00200 src_device_num=0 dest=0x55d75f66b200 dest_device_num=1 bytes=4 code=0x55d75ce76942
  Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000005) src=0x7fb325a00200 src_device_num=0 dest=0x55d75f66b200 dest_device_num=1 bytes=4 code=0x55d75ce76942
Checking correctness
Freeing memory on device
  Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000006) src=0x7fb325a00000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55d75ce769a4
  Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000006) src=0x7fb325a00000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55d75ce769a4
  Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000007) src=0x7fb325a00200 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55d75ce769b0
  Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3516787d0 (0x0) host_op_id=0x7fb3516787c8 (0x8000000000000007) src=0x7fb325a00200 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55d75ce769b0
Callback Fini: device_num=0
```

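With the x86_64 plugin, the device-to-device copy is still reported as two transfers (device 0 to the host, then the host to device 1), even though `LIBOMPTARGET_DEBUG` shows `omptarget --> copy from device to device`.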

```console
$ clang --version
clang version 19.0.0git (https://github.com/llvm/llvm-project.git 5fd8e50feff94dac7e741b07c956622b7c25bc6a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/jreuter/Projects/Compilers/llvm-project/_build/_install/bin
$ clang -fopenmp -fopenmp-targets=x86_64 reproducer.c
$ ./a.out
Callback Init: device_num=0 type=generic-64bit device=0x5644e0820950 lookup=0x7fb3d245bb50 doc=(nil)
Callback Init: device_num=1 type=generic-64bit device=0x5644e0821380 lookup=0x7fb3d245bb50 doc=(nil)
Callback Init: device_num=2 type=generic-64bit device=0x5644e08219a0 lookup=0x7fb3d245bb50 doc=(nil)
Callback Init: device_num=3 type=generic-64bit device=0x5644e08221d0 lookup=0x7fb3d245bb50 doc=(nil)
Allocating memory on device
  Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000001) src=(nil) src_device_num=4 dest=(nil) dest_device_num=0 bytes=4 code=0x5644dee0c853
  Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000001) src=(nil) src_device_num=4 dest=0x5644e0820790 dest_device_num=0 bytes=4 code=0x5644dee0c853
  Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000002) src=(nil) src_device_num=4 dest=(nil) dest_device_num=1 bytes=4 code=0x5644dee0c864
  Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000002) src=(nil) src_device_num=4 dest=0x5644e07fafb0 dest_device_num=1 bytes=4 code=0x5644dee0c864
Testing host to device
  Callback DataOp EMI: endpoint=1 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000003) src=0x5644e081b990 src_device_num=4 dest=0x5644e0820790 dest_device_num=0 bytes=4 code=0x5644dee0c8ca
  Callback DataOp EMI: endpoint=2 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000003) src=0x5644e081b990 src_device_num=4 dest=0x5644e0820790 dest_device_num=0 bytes=4 code=0x5644dee0c8ca
Testing device to device
  Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000004) src=0x5644e0820790 src_device_num=0 dest=0x5644e0820880 dest_device_num=4 bytes=4 code=0x5644dee0c8fc
  Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000004) src=0x5644e0820790 src_device_num=0 dest=0x5644e0820880 dest_device_num=4 bytes=4 code=0x5644dee0c8fc
  Callback DataOp EMI: endpoint=1 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000005) src=0x5644e0820880 src_device_num=4 dest=0x5644e07fafb0 dest_device_num=1 bytes=4 code=0x5644dee0c8fc
  Callback DataOp EMI: endpoint=2 optype=2 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000005) src=0x5644e0820880 src_device_num=4 dest=0x5644e07fafb0 dest_device_num=1 bytes=4 code=0x5644dee0c8fc
Testing device to host
  Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000006) src=0x5644e07fafb0 src_device_num=1 dest=0x5644e081b990 dest_device_num=4 bytes=4 code=0x5644dee0c942
  Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000006) src=0x5644e07fafb0 src_device_num=1 dest=0x5644e081b990 dest_device_num=4 bytes=4 code=0x5644dee0c942
Checking correctness
Freeing memory on device
  Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000007) src=0x5644e0820790 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x5644dee0c9a4
  Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000007) src=0x5644e0820790 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x5644dee0c9a4
  Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000008) src=0x5644e07fafb0 src_device_num=1 dest=(nil) dest_device_num=-1 bytes=0 code=0x5644dee0c9b0
  Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7fb3d22367d0 (0x0) host_op_id=0x7fb3d22367c8 (0x8000000000000008) src=0x5644e07fafb0 src_device_num=1 dest=(nil) dest_device_num=-1 bytes=0 code=0x5644dee0c9b0
Callback Fini: device_num=0
Callback Fini: device_num=1
Callback Fini: device_num=2
Callback Fini: device_num=3
```

<details>
<summary>Click to expand output with LIBOMPTARGET_DEBUG</summary>

```
omptarget --> Init offload library!
OMPT --> Entering connectLibrary (libomp)
OMPT --> OMPT: Trying to load library libomp.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomp_connect
OMPT --> OMPT: Library connection handle = 0x7efc564dca90
omptarget --> Call to omp_get_num_devices returning 0
OMPT --> Executing initializeLibrary (libomp)
OMPT --> initializeLibrary (libomp) bound lookupCallbackByCode=0x7efc564dd710
OMPT --> initializeLibrary (libomp) bound ompt_get_task_data_fn=0x7efc564de020
OMPT --> initializeLibrary (libomp) bound ompt_get_target_task_data_fn=0x7efc564de060
OMPT --> Exiting connectLibrary (libomp)
omptarget --> Loading RTLs...
omptarget --> Attempting to load library 'libomptarget.rtl.x86_64.so'...
omptarget --> Successfully loaded library 'libomptarget.rtl.x86_64.so'!
OMPT --> OMPT: Entering connectLibrary (libomptarget)
OMPT --> OMPT: Trying to load library libomptarget.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomptarget_connect
OMPT --> OMPT: Library connection handle = 0x7efc5640cac0
OMPT --> Enter ompt_libomptarget_connect
OMPT --> OMPT: Executing initializeLibrary (libomptarget)
OMPT --> OMPT: initializeLibrary (libomptarget) bound lookupCallbackByCode=0x7efc564dd710
OMPT --> Leave ompt_libomptarget_connect
OMPT --> OMPT: Exiting connectLibrary (libomptarget)
omptarget --> Registered 'libomptarget.rtl.x86_64.so' with 4 plugin visible devices!
omptarget --> Attempting to load library 'libomptarget.rtl.cuda.so'...
omptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
TARGET CUDA RTL --> Implementing cuInit with dlsym(cuInit) -> 0x7efc4e8c1ec0
TARGET CUDA RTL --> Implementing cuCtxGetDevice with dlsym(cuCtxGetDevice) -> 0x7efc4e8c9b50
TARGET CUDA RTL --> Implementing cuDeviceGet with dlsym(cuDeviceGet) -> 0x7efc4e8c1f00
TARGET CUDA RTL --> Implementing cuDeviceGetAttribute with dlsym(cuDeviceGetAttribute) -> 0x7efc4e8c2000
TARGET CUDA RTL --> Implementing cuDeviceGetCount with dlsym(cuDeviceGetCount) -> 0x7efc4e8c1f20
TARGET CUDA RTL --> Implementing cuFuncGetAttribute with dlsym(cuFuncGetAttribute) -> 0x7efc4e8f6f90
TARGET CUDA RTL --> Implementing cuDeviceGetName with dlsym(cuDeviceGetName) -> 0x7efc4e8c1f40
TARGET CUDA RTL --> Implementing cuDeviceTotalMem with dlsym(cuDeviceTotalMem) -> 0x7efc4e918b80
TARGET CUDA RTL --> Implementing cuDriverGetVersion with dlsym(cuDriverGetVersion) -> 0x7efc4e8c1ee0
TARGET CUDA RTL --> Implementing cuGetErrorString with dlsym(cuGetErrorString) -> 0x7efc4e8c1e80
TARGET CUDA RTL --> Implementing cuLaunchKernel with dlsym(cuLaunchKernel) -> 0x7efc4e92f100
TARGET CUDA RTL --> Implementing cuMemAlloc with dlsym(cuMemAlloc_v2) -> 0x7efc4e8d4f80
TARGET CUDA RTL --> Implementing cuMemAllocHost with dlsym(cuMemAllocHost) -> 0x7efc4e918c80
TARGET CUDA RTL --> Implementing cuMemAllocManaged with dlsym(cuMemAllocManaged) -> 0x7efc4e8d50a0
TARGET CUDA RTL --> Implementing cuMemAllocAsync with dlsym(cuMemAllocAsync) -> 0x7efc4e93a9c0
TARGET CUDA RTL --> Implementing cuMemcpyDtoDAsync with dlsym(cuMemcpyDtoDAsync_v2) -> 0x7efc4e9241c0
TARGET CUDA RTL --> Implementing cuMemcpyDtoH with dlsym(cuMemcpyDtoH_v2) -> 0x7efc4e924000
TARGET CUDA RTL --> Implementing cuMemcpyDtoHAsync with dlsym(cuMemcpyDtoHAsync_v2) -> 0x7efc4e9241a0
TARGET CUDA RTL --> Implementing cuMemcpyHtoD with dlsym(cuMemcpyHtoD_v2) -> 0x7efc4e923fe0
TARGET CUDA RTL --> Implementing cuMemcpyHtoDAsync with dlsym(cuMemcpyHtoDAsync_v2) -> 0x7efc4e924180
TARGET CUDA RTL --> Implementing cuMemFree with dlsym(cuMemFree_v2) -> 0x7efc4e8d4fc0
TARGET CUDA RTL --> Implementing cuMemFreeHost with dlsym(cuMemFreeHost) -> 0x7efc4e8d5020
TARGET CUDA RTL --> Implementing cuMemFreeAsync with dlsym(cuMemFreeAsync) -> 0x7efc4e93a9a0
TARGET CUDA RTL --> Implementing cuModuleGetFunction with dlsym(cuModuleGetFunction) -> 0x7efc4e8c9e30
TARGET CUDA RTL --> Implementing cuModuleGetGlobal with dlsym(cuModuleGetGlobal_v2) -> 0x7efc4e8c9e50
TARGET CUDA RTL --> Implementing cuModuleUnload with dlsym(cuModuleUnload) -> 0x7efc4e8c9df0
TARGET CUDA RTL --> Implementing cuStreamCreate with dlsym(cuStreamCreate) -> 0x7efc4e8ebc60
TARGET CUDA RTL --> Implementing cuStreamDestroy with dlsym(cuStreamDestroy_v2) -> 0x7efc4e8ebe80
TARGET CUDA RTL --> Implementing cuStreamSynchronize with dlsym(cuStreamSynchronize) -> 0x7efc4e92f0a0
TARGET CUDA RTL --> Implementing cuStreamQuery with dlsym(cuStreamQuery) -> 0x7efc4e924560
TARGET CUDA RTL --> Implementing cuCtxSetCurrent with dlsym(cuCtxSetCurrent) -> 0x7efc4e8c9b10
TARGET CUDA RTL --> Implementing cuDevicePrimaryCtxRelease with dlsym(cuDevicePrimaryCtxRelease_v2) -> 0x7efc4e8c99f0
TARGET CUDA RTL --> Implementing cuDevicePrimaryCtxGetState with dlsym(cuDevicePrimaryCtxGetState) -> 0x7efc4e8c9a30
TARGET CUDA RTL --> Implementing cuDevicePrimaryCtxSetFlags with dlsym(cuDevicePrimaryCtxSetFlags_v2) -> 0x7efc4e8c9a10
TARGET CUDA RTL --> Implementing cuDevicePrimaryCtxRetain with dlsym(cuDevicePrimaryCtxRetain) -> 0x7efc4e8c99d0
TARGET CUDA RTL --> Implementing cuModuleLoadDataEx with dlsym(cuModuleLoadDataEx) -> 0x7efc4e8c9db0
TARGET CUDA RTL --> Implementing cuDeviceCanAccessPeer with dlsym(cuDeviceCanAccessPeer) -> 0x7efc4e90d8c0
TARGET CUDA RTL --> Implementing cuCtxEnablePeerAccess with dlsym(cuCtxEnablePeerAccess) -> 0x7efc4e90d8e0
TARGET CUDA RTL --> Implementing cuMemcpyPeerAsync with dlsym(cuMemcpyPeerAsync) -> 0x7efc4e924350
TARGET CUDA RTL --> Implementing cuCtxGetLimit with dlsym(cuCtxGetLimit) -> 0x7efc4e8c9c10
TARGET CUDA RTL --> Implementing cuCtxSetLimit with dlsym(cuCtxSetLimit) -> 0x7efc4e8c9bf0
TARGET CUDA RTL --> Implementing cuEventCreate with dlsym(cuEventCreate) -> 0x7efc4e8ebf00
TARGET CUDA RTL --> Implementing cuEventRecord with dlsym(cuEventRecord) -> 0x7efc4e92f0c0
TARGET CUDA RTL --> Implementing cuStreamWaitEvent with dlsym(cuStreamWaitEvent) -> 0x7efc4e924500
TARGET CUDA RTL --> Implementing cuEventSynchronize with dlsym(cuEventSynchronize) -> 0x7efc4e8ebf80
TARGET CUDA RTL --> Implementing cuEventDestroy with dlsym(cuEventDestroy) -> 0x7efc4e923f60
OMPT --> OMPT: Entering connectLibrary (libomptarget)
OMPT --> OMPT: Trying to load library libomptarget.so
OMPT --> OMPT: Trying to get address of connection routine ompt_libomptarget_connect
OMPT --> OMPT: Library connection handle = 0x7efc5640cac0
OMPT --> Enter ompt_libomptarget_connect
OMPT --> OMPT: Executing initializeLibrary (libomptarget)
OMPT --> OMPT: initializeLibrary (libomptarget) bound lookupCallbackByCode=0x7efc564dd710
OMPT --> Leave ompt_libomptarget_connect
OMPT --> OMPT: Exiting connectLibrary (libomptarget)
omptarget --> Registered 'libomptarget.rtl.cuda.so' with 1 plugin visible devices!
omptarget --> Attempting to load library 'libomptarget.rtl.amdgpu.so'...
omptarget --> Successfully loaded library 'libomptarget.rtl.amdgpu.so'!
TARGET AMDGPU RTL --> Unable to load library 'libhsa-runtime64.so': libhsa-runtime64.so: cannot open shared object file: No such file or directory!
TARGET AMDGPU RTL --> Failed to initialize AMDGPU's HSA library
omptarget --> No devices supported in this RTL
omptarget --> RTLs loaded!
omptarget --> Image 0x000055c1f4f146e0 is compatible with RTL libomptarget.rtl.x86_64.so!
PluginInterface --> OMPT: class bound ompt_callback_device_initialize=0x55c1f4f133d0
PluginInterface --> OMPT: class bound ompt_callback_device_finalize=0x55c1f4f13420
PluginInterface --> OMPT: class bound ompt_callback_device_load=0x55c1f4f13450
PluginInterface --> OMPT: class bound ompt_callback_device_unload=(nil)
Callback Init: device_num=0 type=generic-64bit device=0x55c1f5997a30 lookup=0x7efc564dcb50 doc=(nil)
PluginInterface --> OMPT: class bound ompt_callback_device_initialize=0x55c1f4f133d0
PluginInterface --> OMPT: class bound ompt_callback_device_finalize=0x55c1f4f13420
PluginInterface --> OMPT: class bound ompt_callback_device_load=0x55c1f4f13450
PluginInterface --> OMPT: class bound ompt_callback_device_unload=(nil)
Callback Init: device_num=1 type=generic-64bit device=0x55c1f5998460 lookup=0x7efc564dcb50 doc=(nil)
PluginInterface --> OMPT: class bound ompt_callback_device_initialize=0x55c1f4f133d0
PluginInterface --> OMPT: class bound ompt_callback_device_finalize=0x55c1f4f13420
PluginInterface --> OMPT: class bound ompt_callback_device_load=0x55c1f4f13450
PluginInterface --> OMPT: class bound ompt_callback_device_unload=(nil)
Callback Init: device_num=2 type=generic-64bit device=0x55c1f5998a80 lookup=0x7efc564dcb50 doc=(nil)
PluginInterface --> OMPT: class bound ompt_callback_device_initialize=0x55c1f4f133d0
PluginInterface --> OMPT: class bound ompt_callback_device_finalize=0x55c1f4f13420
PluginInterface --> OMPT: class bound ompt_callback_device_load=0x55c1f4f13450
PluginInterface --> OMPT: class bound ompt_callback_device_unload=(nil)
Callback Init: device_num=3 type=generic-64bit device=0x55c1f59992b0 lookup=0x7efc564dcb50 doc=(nil)
omptarget --> Plugin adaptor 0x000055c1f59720d0 has index 0, exposes 4 out of 4 devices!
omptarget --> Registering image 0x000055c1f4f146e0 with RTL libomptarget.rtl.x86_64.so!
omptarget --> Done registering entries!
omptarget --> Call to omp_get_num_devices returning 4
Allocating memory on device
omptarget --> Call to omp_target_alloc for device 0 requesting 4 bytes
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
OMPT --> in ompt_target_region_begin (TargetRegionId = 0)
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
  Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000001) src=(nil) src_device_num=4 dest=(nil) dest_device_num=0 bytes=4 code=0x55c1f4f13853
PluginInterface --> MemoryManagerTy::allocate: size 4 with host pointer 0x0000000000000000.
PluginInterface --> findBucket: Size 4 is floored to 4.
PluginInterface --> findBucket: Size 4 goes to bucket 0
PluginInterface --> Cannot find a node in the FreeLists. Allocate on device.
PluginInterface --> Node address 0x000055c1f5997910, target pointer 0x000055c1f59977a0, size 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
  Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000001) src=(nil) src_device_num=4 dest=0x55c1f59977a0 dest_device_num=0 bytes=4 code=0x55c1f4f13853
OMPT --> in ompt_target_region_end (TargetRegionId = 0)
omptarget --> omp_target_alloc returns device ptr 0x000055c1f59977a0
omptarget --> Call to omp_target_alloc for device 1 requesting 4 bytes
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
OMPT --> in ompt_target_region_begin (TargetRegionId = 0)
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
  Callback DataOp EMI: endpoint=1 optype=1 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000002) src=(nil) src_device_num=4 dest=(nil) dest_device_num=1 bytes=4 code=0x55c1f4f13864
PluginInterface --> MemoryManagerTy::allocate: size 4 with host pointer 0x0000000000000000.
PluginInterface --> findBucket: Size 4 is floored to 4.
PluginInterface --> findBucket: Size 4 goes to bucket 0
PluginInterface --> Cannot find a node in the FreeLists. Allocate on device.
PluginInterface --> Node address 0x000055c1f5997940, target pointer 0x000055c1f5971fb0, size 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
  Callback DataOp EMI: endpoint=2 optype=1 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000002) src=(nil) src_device_num=4 dest=0x55c1f5971fb0 dest_device_num=1 bytes=4 code=0x55c1f4f13864
OMPT --> in ompt_target_region_end (TargetRegionId = 0)
omptarget --> omp_target_alloc returns device ptr 0x000055c1f5971fb0
Testing host to device
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_target_memcpy, dst device 0, src device 4, dst addr 0x000055c1f59977a0, src addr 0x000055c1f5992980, dst offset 0, src offset 0, length 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
omptarget --> copy from host to device
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
OMPT --> in ompt_target_region_begin (TargetRegionId = 0)
  Callback DataOp EMI: endpoint=1 optype=2 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000003) src=0x55c1f5992980 src_device_num=4 dest=0x55c1f59977a0 dest_device_num=0 bytes=4 code=0x55c1f4f138ca
  Callback DataOp EMI: endpoint=2 optype=2 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000003) src=0x55c1f5992980 src_device_num=4 dest=0x55c1f59977a0 dest_device_num=0 bytes=4 code=0x55c1f4f138ca
OMPT --> in ompt_target_region_end (TargetRegionId = 0)
omptarget --> omp_target_memcpy returns 0
Testing device to device
omptarget --> Call to omp_target_memcpy, dst device 1, src device 0, dst addr 0x000055c1f5971fb0, src addr 0x000055c1f59977a0, dst offset 0, src offset 0, length 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
omptarget --> copy from device to device
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
OMPT --> in ompt_target_region_begin (TargetRegionId = 0)
  Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000004) src=0x55c1f59977a0 src_device_num=0 dest=0x55c1f5997890 dest_device_num=4 bytes=4 code=0x55c1f4f138fc
  Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000004) src=0x55c1f59977a0 src_device_num=0 dest=0x55c1f5997890 dest_device_num=4 bytes=4 code=0x55c1f4f138fc
OMPT --> in ompt_target_region_end (TargetRegionId = 0)
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
OMPT --> in ompt_target_region_begin (TargetRegionId = 0)
  Callback DataOp EMI: endpoint=1 optype=2 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000005) src=0x55c1f5997890 src_device_num=4 dest=0x55c1f5971fb0 dest_device_num=1 bytes=4 code=0x55c1f4f138fc
  Callback DataOp EMI: endpoint=2 optype=2 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000005) src=0x55c1f5997890 src_device_num=4 dest=0x55c1f5971fb0 dest_device_num=1 bytes=4 code=0x55c1f4f138fc
OMPT --> in ompt_target_region_end (TargetRegionId = 0)
omptarget --> omp_target_memcpy returns 0
Testing device to host
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_target_memcpy, dst device 4, src device 1, dst addr 0x000055c1f5992980, src addr 0x000055c1f5971fb0, dst offset 0, src offset 0, length 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
omptarget --> copy from device to host
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
OMPT --> in ompt_target_region_begin (TargetRegionId = 0)
  Callback DataOp EMI: endpoint=1 optype=3 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000006) src=0x55c1f5971fb0 src_device_num=1 dest=0x55c1f5992980 dest_device_num=4 bytes=4 code=0x55c1f4f13942
  Callback DataOp EMI: endpoint=2 optype=3 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000006) src=0x55c1f5971fb0 src_device_num=1 dest=0x55c1f5992980 dest_device_num=4 bytes=4 code=0x55c1f4f13942
OMPT --> in ompt_target_region_end (TargetRegionId = 0)
omptarget --> omp_target_memcpy returns 0
Checking correctness
Freeing memory on device
omptarget --> Call to omp_target_free for device 0 and address 0x000055c1f59977a0
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
OMPT --> in ompt_target_region_begin (TargetRegionId = 0)
  Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000007) src=0x55c1f59977a0 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55c1f4f139a4
PluginInterface --> MemoryManagerTy::free: target memory 0x000055c1f59977a0.
PluginInterface --> findBucket: Size 4 is floored to 4.
PluginInterface --> findBucket: Size 4 goes to bucket 0
PluginInterface --> Found its node 0x000055c1f5997910. Insert it to bucket 0.
  Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000007) src=0x55c1f59977a0 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x55c1f4f139a4
OMPT --> in ompt_target_region_end (TargetRegionId = 0)
omptarget --> omp_target_free deallocated device ptr
omptarget --> Call to omp_target_free for device 1 and address 0x000055c1f5971fb0
omptarget --> Call to omp_get_num_devices returning 4
omptarget --> Call to omp_get_initial_device returning 4
OMPT --> in ompt_target_region_begin (TargetRegionId = 0)
  Callback DataOp EMI: endpoint=1 optype=4 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000008) src=0x55c1f5971fb0 src_device_num=1 dest=(nil) dest_device_num=-1 bytes=0 code=0x55c1f4f139b0
PluginInterface --> MemoryManagerTy::free: target memory 0x000055c1f5971fb0.
PluginInterface --> findBucket: Size 4 is floored to 4.
PluginInterface --> findBucket: Size 4 goes to bucket 0
PluginInterface --> Found its node 0x000055c1f5997940. Insert it to bucket 0.
  Callback DataOp EMI: endpoint=2 optype=4 target_task_data=(nil) (0x0) target_data=0x7efc562697d0 (0x0) host_op_id=0x7efc562697c8 (0x8000000000000008) src=0x55c1f5971fb0 src_device_num=1 dest=(nil) dest_device_num=-1 bytes=0 code=0x55c1f4f139b0
OMPT --> in ompt_target_region_end (TargetRegionId = 0)
omptarget --> omp_target_free deallocated device ptr
omptarget --> Unloading target library!
omptarget --> Unregistered image 0x000055c1f4f146e0 from RTL 0x000055c1f59720d0!
omptarget --> Done unregistering images!
omptarget --> Removing translation table for descriptor 0x000055c1f4f14660
omptarget --> Done unregistering library!
omptarget --> Deinit offload library!
OMPT --> Executing finalizeLibrary (libomp)
OMPT --> OMPT: Executing finalizeLibrary (libomptarget)
OMPT --> OMPT: Executing finalizeLibrary (libomptarget)
Callback Fini: device_num=0
Callback Fini: device_num=1
Callback Fini: device_num=2
Callback Fini: device_num=3
```

</details>
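For reading the logs above, a small decoding helper may be useful. This is only a sketch: `optype_name` and `transfer_direction` are hypothetical names that are not part of the reproducer or the PR. The numeric values are, to my understanding, those of `ompt_target_data_op_t` in `omp-tools.h`, and the host is assumed to be the device number returned by `omp_get_initial_device` (1 in the NVIDIA run, 4 in the x86_64 run).

```c
/* Map the optype values printed in the logs to readable names.
 * Assumption: values follow ompt_target_data_op_t from omp-tools.h. */
static const char *optype_name(int optype) {
  switch (optype) {
  case 1: return "alloc";
  case 2: return "transfer_to_device";
  case 3: return "transfer_from_device";
  case 4: return "delete";
  default: return "other"; /* associate, disassociate, async variants, ... */
  }
}

/* Hypothetical helper: classify a transfer from the device numbers in a
 * DataOp callback. host_dev is the initial device number. */
static const char *transfer_direction(int src_dev, int dest_dev, int host_dev) {
  if (src_dev == host_dev && dest_dev == host_dev)
    return "host to host";
  if (src_dev == host_dev)
    return "host to device";
  if (dest_dev == host_dev)
    return "device to host";
  return "device to device";
}
```

Applied to the x86_64 run, `transfer_direction(0, 4, 4)` and `transfer_direction(4, 1, 4)` describe the two halves that are reported for the single `omp_target_memcpy` from device 0 to device 1. In the NVIDIA run, `transfer_direction(0, 0, 1)` classifies the one reported callback pair as device to device.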

https://github.com/llvm/llvm-project/pull/81991

