<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/163335">163335</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Feature request: make openmp target simd for nvptx and amdgpu targets equivalent to acc vector
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
bschulz81
</td>
</tr>
</table>
<pre>
Hi there, GPU APIs for parallel programming often support three nested levels of parallelism.
In OpenACC, these are expressed by the gang, worker, and vector constructs.
OpenMP has the corresponding three parallelization levels: teams distribute, parallel for, and simd.
In gcc-15.2, the simd construct now corresponds to acc vector.
I.e., the construct
`#pragma acc parallel loop gang vector`
is equivalent to
`#pragma omp target teams distribute parallel for simd`
in gcc-15.2.
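For illustration, a minimal self-contained sketch of the two spellings (the array size 256 and the doubling loop are arbitrary placeholders, not taken from the reproducer below; the OpenACC form is shown as a comment since a single translation unit would normally use only one model):
```
#include <stdio.h>

int main(void)
{
    int y[256];
    for (int i = 0; i < 256; i++)
        y[i] = i;

    /* OpenACC spelling of the same loop:
       #pragma acc parallel loop gang vector */

    /* OpenMP spelling that gcc-15.2 treats as equivalent */
    #pragma omp target teams distribute parallel for simd map(tofrom: y[0:256])
    for (int i = 0; i < 256; i++)
        y[i] = 2 * y[i];

    printf("%d\n", y[255]); /* expect 510 */
    return 0;
}
```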
The following code (note the element access through a strides array, which is common in BLAS routines):
```
#include <omp.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int x[600];
    const int stride[2] = {1, 2};

    /* map x and stride, then run the strided store on the default device */
    #pragma omp target data map(tofrom: x[0:600]) map(to: stride[0:2]) device(omp_get_default_device())
    #pragma omp target simd device(omp_get_default_device())
    for (size_t i = 0; i < 200; i++)
        x[i*stride[0] + i*stride[1]] = -1;

    printf("%d", x[597]);
    return 0;
}
```
When compiled with gcc-15.2 and the options
`-fopenmp -foffload=nvptx-none -fno-stack-protector -O3 -Wall`
it compiles without warnings and runs on the device.
When compiled with clang 21.1.3 and the options
`-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -O3 -Wall`
clang emits the following warning:
> warning: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
The CUDA code generated by clang is very efficient compared to gcc's. However, since OpenACC with its vector construct is already being implemented, and if OpenACC is implemented on top of the OpenMP runtime, it would make sense to turn `omp target simd` into the equivalent of an `omp target vector`, and to accept the combined constructs `omp target parallel for simd` and `omp target teams distribute parallel for simd`, as well as the `collapse`, `reduction`, and `tile` clauses, for this.
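As a minimal sketch of what the request would cover, here is the strided loop from the reproducer written with the single-level and the fully combined spelling (the helper function `strided_store_and_sum`, the reduction variable `sum`, and the explicit map clauses are hypothetical additions for illustration):
```
#include <omp.h>
#include <stddef.h>

/* hypothetical helper, not part of the reproducer */
int strided_store_and_sum(int *x, const int *stride)
{
    int sum = 0;

    /* simd alone on target: requested to behave like acc vector */
    #pragma omp target simd map(tofrom: x[0:600]) map(to: stride[0:2])
    for (size_t i = 0; i < 200; i++)
        x[i*stride[0] + i*stride[1]] = -1;

    /* combined construct spanning all three levels; collapse, reduction
       and tile should be accepted on these constructs as well */
    #pragma omp target teams distribute parallel for simd \
            reduction(+:sum) map(to: x[0:600], stride[0:2])
    for (size_t i = 0; i < 200; i++)
        sum += x[i*stride[0] + i*stride[1]];

    return sum;
}
```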
</pre>