<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/116162>116162</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            OpenMP offloading failure in LLVM 15 or later versions on multiple and different AMD GPUs and a found solution
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          tkojima0107
      </td>
    </tr>
</table>

<pre>
    Since the nextgen driver became the default in LLVM 15,
OpenMP offloading fails if multiple, different AMD GPUs are installed in the host machine, even though the correct GPU architecture is specified at compile time.

I conducted several experiments to confirm the issue and found a solution.

On a host machine with a Radeon RX 6950 XT (gfx1030) and a Radeon RX 7900 XTX (gfx1100),
I compiled the code with the following command to offload to gfx1100.
```
clang++ -O3 --offload-arch=gfx1100 -fopenmp offload.cpp
```

Then I used a Debug build of libomptarget to trace the issue.
Here is the relevant part of the debug output.

```
TARGET AMDGPU RTL --> gfx1100
TARGET AMDGPU RTL --> Compatible: Exact match   [Image: gfx1100]        :       [Env: gfx1100]
TARGET AMDGPU RTL --> gfx1030
TARGET AMDGPU RTL --> Incompatible: Processor mismatch  [Image: gfx1100]        :       [Env: gfx1030]
PluginInterface --> Image is notcompatible with current environment: gfx1100
Libomptarget --> Image 0x000055bcc7a01090 is NOT compatible with RTL libomptarget.rtl.amdgpu.so!
```

Although the image is first reported as an exact match for gfx1100, the subsequent mismatch with gfx1030 causes the whole image to be rejected as incompatible with the environment.

This issue does not occur with ROCm LLVM developed by AMD.

Thus, I compared the libomptarget implementations in the original LLVM and in ROCm LLVM to find the difference.

In LLVM 17.0.6
https://github.com/llvm/llvm-project/blob/6009708b4367171ccdbf4b5905cb6a803753fe18/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp#L2697,
it seems that the image must be compatible with every GPU in the host machine, because the last `if` statement in the loop returns false as soon as one GPU is incompatible.
```
Expected<bool> isImageCompatible(__tgt_image_info *Info) const override {
    for (hsa_agent_t Agent : KernelAgents) {
      std::string Target;
      auto Err = utils::iterateAgentISAs(Agent, [&](hsa_isa_t ISA) {
        uint32_t Length;
        hsa_status_t Status;
        Status = hsa_isa_get_info_alt(ISA, HSA_ISA_INFO_NAME_LENGTH, &Length);
        if (Status != HSA_STATUS_SUCCESS)
          return Status;

        // TODO: This is not allowed by the standard.
        char ISAName[Length];
        Status = hsa_isa_get_info_alt(ISA, HSA_ISA_INFO_NAME, ISAName);
        if (Status != HSA_STATUS_SUCCESS)
          return Status;

        llvm::StringRef TripleTarget(ISAName);
        if (TripleTarget.consume_front("amdgcn-amd-amdhsa"))
          Target = TripleTarget.ltrim('-').str();
        return HSA_STATUS_SUCCESS;
      });
      if (Err)
        return std::move(Err);

      if (!utils::isImageCompatibleWithEnv(Info, Target))
        return false;
    }
    return true;
  }
```

On the other hand,
in ROCm LLVM 6.2.1
https://github.com/ROCm/llvm-project/blob/669db884972e769450470020c06a6f132a8a065b/openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp#L3550,
it seems to check only that at least one GPU in the host machine is compatible.

```
Expected<bool>
  isImageCompatible(__tgt_image_info *Info,
                    __tgt_device_image *TgtImage) const override {

    for (hsa_agent_t Agent : KernelAgents) {
      std::string Target;
      auto Err = utils::iterateAgentISAs(Agent, [&](hsa_isa_t ISA) {
        uint32_t Length;
        hsa_status_t Status;
        Status = hsa_isa_get_info_alt(ISA, HSA_ISA_INFO_NAME_LENGTH, &Length);
        if (Status != HSA_STATUS_SUCCESS)
          return Status;

        // TODO: This is not allowed by the standard.
        char ISAName[Length];
        Status = hsa_isa_get_info_alt(ISA, HSA_ISA_INFO_NAME, ISAName);
        if (Status != HSA_STATUS_SUCCESS)
          return Status;

        llvm::StringRef TripleTarget(ISAName);
        if (TripleTarget.consume_front("amdgcn-amd-amdhsa"))
          Target = TripleTarget.ltrim('-').str();
        return HSA_STATUS_SUCCESS;
      });
      if (Err)
        return std::move(Err);

      if (utils::isImageCompatibleWithEnv(Info, Target))
        return true;
    }
```
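
The difference between the two loops can be reduced to "all agents must match" versus "any agent may match". The following is only a minimal sketch of that contrast, not the actual plugin code: `isCompatibleWithAgent`, `compatibleWithAllAgents`, and `compatibleWithAnyAgent` are hypothetical names, the compatibility check is simplified to a plain string comparison of architecture names, and the ROCm variant is assumed to return false when no agent matches (that part of the function is not shown in the excerpt above).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for utils::isImageCompatibleWithEnv. Assumption: an
// image is compatible only if the architecture names are identical. The real
// check is more involved than this string comparison.
inline bool isCompatibleWithAgent(const std::string &ImageArch,
                                  const std::string &AgentArch) {
  return ImageArch == AgentArch;
}

// Mirrors the LLVM 17.0.6 loop quoted above: one incompatible agent is
// enough to reject the image, so every agent has to match.
inline bool compatibleWithAllAgents(const std::string &ImageArch,
                                    const std::vector<std::string> &Agents) {
  return std::all_of(Agents.begin(), Agents.end(),
                     [&](const std::string &Agent) {
                       return isCompatibleWithAgent(ImageArch, Agent);
                     });
}

// Mirrors the ROCm LLVM 6.2.1 loop quoted above: one compatible agent is
// enough to accept the image. Assumption: no match means false.
inline bool compatibleWithAnyAgent(const std::string &ImageArch,
                                   const std::vector<std::string> &Agents) {
  return std::any_of(Agents.begin(), Agents.end(),
                     [&](const std::string &Agent) {
                       return isCompatibleWithAgent(ImageArch, Agent);
                     });
}
```

With the agents `{"gfx1100", "gfx1030"}` and a gfx1100 image, `compatibleWithAllAgents` returns false and `compatibleWithAnyAgent` returns true, which is consistent with the debug output shown earlier.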

Does the original LLVM implementation assume that all GPUs in the host machine are the same model?

If so, I guess a possible solution is to limit the visible GPUs to the target GPU by setting the `ROCR_VISIBLE_DEVICES` environment variable.

</pre>