<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/156805>156805</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [OpenMP] offload hierarchical parallelism gives wrong results.
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ye-luo
      </td>
    </tr>
</table>

<pre>
Caused by https://github.com/llvm/llvm-project/pull/146404

Tested with OvO (https://github.com/TApplencourt/OvO). Run:
```
CXX="/soft/compilers/llvm/main-patched/bin/clang++" CXXFLAGS="-fopenmp --offload-arch=sm_90 -O3" ./ovo.sh run test_src/cpp/hierarchical_parallelism
```
Results with commit 87db8e9130e49f6fd3b35ef1e22fd71bf55ef027:
```
./ovo.sh report --summary --tablefmt github
>> Overall result for test_result/2025-09-04_00-08_hopper01
|   pass rate(%) |   test(#) |   success(#) |   compilation error(#) |   runtime error(#) |   wrong value(#) |   timeout(#) |
|----------------|-----------|--------------|------------------------|--------------------|------------------|--------------|
|          74.2% |       310 |          230 |                      0 |                  0 |               80 |            0 |

 >> Summary
| language   | category                 | name                         |   pass rate(%) |   test(#) |   success(#) |   compilation error(#) |   runtime error(#) |   wrong value(#) |   timeout(#) |
|------------|--------------------------|------------------------------|----------------|-----------|--------------|------------------------|--------------------|------------------|--------------|
| cpp        | hierarchical_parallelism | memcopy-complex_double       |          64.4% |        45 |           29 |                      0 |                  0 |               16 |            0 |
| cpp        | hierarchical_parallelism | memcopy-float                |          64.4% |        45 |           29 |                      0 |                  0 |               16 |            0 |
| cpp        | hierarchical_parallelism | atomic_add-float             |          77.8% |        72 |           56 |                      0 |                  0 |               16 |            0 |
| cpp        | hierarchical_parallelism | reduction_add-complex_double |          78.4% |        74 |           58 |                      0 |                  0 |               16 |            0 |
| cpp        | hierarchical_parallelism | reduction_add-float          |          78.4% |        74 |           58 |                      0 |                  0 |               16 |            0 |
```

Results before commit 87db8e9130e49f6fd3b35ef1e22fd71bf55ef027:
```
>> Overall result for test_result/2025-09-04_00-24_hopper01
|   pass rate(%) |   test(#) |   success(#) |   compilation error(#) |   runtime error(#) |   wrong value(#) |   timeout(#) |
|----------------|-----------|--------------|------------------------|--------------------|------------------|--------------|
|         100.0% |       310 |          310 |                      0 |                  0 |                0 |            0 |

 >> Summary
| language   | category                 | name                         |   pass rate(%) |   test(#) |   success(#) |   compilation error(#) |   runtime error(#) |   wrong value(#) |   timeout(#) |
|------------|--------------------------|------------------------------|----------------|-----------|--------------|------------------------|--------------------|------------------|--------------|
| cpp        | hierarchical_parallelism | atomic_add-float             |         100.0% |        72 |           72 |                      0 |                  0 |                0 |            0 |
| cpp        | hierarchical_parallelism | memcopy-complex_double       |         100.0% |        45 |           45 |                      0 |                  0 |                0 |            0 |
| cpp        | hierarchical_parallelism | memcopy-float                |         100.0% |        45 |           45 |                      0 |                  0 |                0 |            0 |
| cpp        | hierarchical_parallelism | reduction_add-complex_double |         100.0% |        74 |           74 |                      0 |                  0 |                0 |            0 |
| cpp        | hierarchical_parallelism | reduction_add-float          |         100.0% |        74 |           74 |                      0 |                  0 |                0 |            0 |
```

With LIBOMPTARGET_DEBUG=1, the old code launches the kernels in generic-SPMD mode, whereas the new code launches them in generic mode.
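
For readers unfamiliar with the test category, here is a minimal sketch of the kind of nested teams/threads pattern that "hierarchical parallelism" refers to. It is an illustration only, not the OvO source and not a confirmed reproducer: the function name `hierarchical_sum` and the loop bounds are hypothetical, and whether this exact kernel triggers the wrong values is an assumption that would need checking.

```cpp
// Hypothetical illustration (not taken from OvO): count n0 * n1 iterations
// with teams on the outer loop and threads on the nested inner loop.
float hierarchical_sum(int n0, int n1) {
  float counter = 0.0f;
  // Outer level: distribute iterations across teams on the device.
#pragma omp target teams distribute reduction(+ : counter) map(tofrom : counter)
  for (int i = 0; i < n0; ++i) {
    // Inner level: a separate parallel region nested inside each team.
#pragma omp parallel for reduction(+ : counter)
    for (int j = 0; j < n1; ++j) {
      counter += 1.0f;
    }
  }
  // A correct run returns n0 * n1; a "wrong value" failure would differ.
  return counter;
}
```

Compiled with the flags from the run line above (`-fopenmp --offload-arch=sm_90 -O3`), comparing `hierarchical_sum(64, 64)` against `64 * 64` on both compiler builds would be one way to check whether this pattern is affected.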
</pre>