<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/124875>124875</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            openmp target: the attached code uses OpenACC. It runs fine with Nvidia's nvc++ but fails at runtime with clang 20git (probably a device mapper problem with structs containing several arrays and methods)
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            clang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          bschulz81
      </td>
    </tr>
</table>

<pre>
    The attached code (remove the .txt extensions from the filenames of main_acc.cpp and main_acc.h, and use CMake to build and run) uploads data to the device with OpenACC.

While it works with Nvidia's nvc++ compiler, it fails with clang:

clang version 20.0.0git97c3a990
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/20/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang++.cfg

It fails with the following error:

[localhost:6103 :0:6103] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f56aa400008)
==== backtrace (tid:   6103) ====
 0 0x0000000000041820 __sigaction()  ???:0
 1 0x0000000000014ee5 gpu_cholesky_decomposition<double>()  ???:0
 2 0x0000000000002d0c main()  ???:0
 3 0x00000000000265ce __libc_init_first() ???:0
 4 0x0000000000026689 __libc_start_main()  ???:0
 5 0x0000000000002355 _start() ???:0
=================================

Process returned -1 (0xFFFFFFFF) execution time : 0.434 s
Press ENTER to continue.

main.cpp calls the function cholesky_decomposition with a parameter that indicates it should run the entire function on the GPU.
It then uses the upload directives, packed in a preprocessor macro (which one can simply replace with the original mapping directives; it does not matter):
#pragma acc enter data copyin(dA)\
 copyin(dA.pdata[0:dA.pdatalength])\
 copyin(dA.pextents[0:dA.prank])\
 copyin(dA.pstrides[0:dA.prank])

and, via the macro CREATE_OUT_STRUCT(dL):
#pragma acc enter data copyin(dL)\
 create(dL.pdata[0:dL.pdatalength])\
 copyin(dL.pextents[0:dL.prank])\
 copyin(dL.pstrides[0:dL.prank])

These directives upload a struct called pdatastruct. Its members are the array data, its length, an array for the extents, an array for the strides, values for the lengths of these arrays, and a flag for whether the data is row- or column-major. It also has a few routines, among them constructors, operators for array access (for vectors, matrices, and general tensors), and routines for matrix transposes and subarrays.
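
For readers without the attachments, the layout described above can be sketched roughly as follows. This is only an illustration under assumptions: the member names pdata, pdatalength, pextents, pstrides and prank are taken from the directives above, while datastruct_sketch, prowmajor, upload_sketch and download_sketch are hypothetical names that do not come from the attached code. The sketch relies on the manual deep copy behaviour of OpenACC 2.6 and later, where copying in a member array while the parent struct is already present should attach the device pointer. A compiler without OpenACC support ignores the directives and runs everything on the host.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the struct in the report. Only pdata, pdatalength,
// pextents, pstrides and prank are names taken from the report.
template <typename T>
struct datastruct_sketch {
    T*           pdata;        // array data
    std::size_t  pdatalength;  // number of elements in pdata
    std::size_t* pextents;     // extent of each dimension
    std::size_t* pstrides;     // stride of each dimension
    std::size_t  prank;        // length of pextents and pstrides
    bool         prowmajor;    // assumed name for the layout flag

    // Element access for the rank-2 (matrix) case.
    T& operator()(std::size_t i, std::size_t j) const {
        return pdata[i * pstrides[0] + j * pstrides[1]];
    }
};

// Copy the struct itself first, then each member array. With the parent
// already present, each member copyin is expected to attach the device
// pointer inside the device copy of the struct.
template <typename T>
void upload_sketch(datastruct_sketch<T>& d) {
    assert(d.pdata && d.pextents && d.pstrides);
    #pragma acc enter data copyin(d)
    #pragma acc enter data copyin(d.pdata[0:d.pdatalength])
    #pragma acc enter data copyin(d.pextents[0:d.prank])
    #pragma acc enter data copyin(d.pstrides[0:d.prank])
}

// Copy the data back and release the device copies, members before parent.
template <typename T>
void download_sketch(datastruct_sketch<T>& d) {
    #pragma acc exit data copyout(d.pdata[0:d.pdatalength])
    #pragma acc exit data delete(d.pextents[0:d.prank])
    #pragma acc exit data delete(d.pstrides[0:d.prank])
    #pragma acc exit data delete(d)
}
```

The order matters in this sketch: if the member arrays were copied in before the struct, the later copyin(d) would carry the host pointer values over to the device copy unchanged.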

Also, acc_malloc is used to create a temporary buffer, and then (unfortunately, OpenACC does not have an equivalent of OpenMP's #pragma omp target) we call:
 
#pragma acc parallel present(dA, dL, step_size) deviceptr(buffer)
{
 gpu_cholesky_decomposition(dA,dL,buffer,step_size);
}
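
A reduced version of this call pattern might look like the sketch below. It is not the attached code: view_sketch, gpu_work_sketch and run_sketch are hypothetical names, the arithmetic is a placeholder for the decomposition, and num_gangs(1) is an assumption added so that the body runs once instead of once per gang. The _OPENACC guard falls back to std::malloc so the same file also builds and runs on the host with a compiler that ignores the directives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef _OPENACC
#include <openacc.h>
#endif

// Hypothetical two-member view; the real struct in the report has more members.
struct view_sketch {
    double*     pdata;
    std::size_t pdatalength;
};

// Placeholder device function standing in for gpu_cholesky_decomposition.
#pragma acc routine seq
inline void gpu_work_sketch(const view_sketch& in, view_sketch& out, double* buffer) {
    for (std::size_t i = 0; i < in.pdatalength; ++i) {
        buffer[i] = 2.0 * in.pdata[i];      // scratch value in the device buffer
        out.pdata[i] = buffer[i] + 1.0;     // result written to the output view
    }
}

inline void run_sketch(view_sketch& dA, view_sketch& dL) {
    assert(dA.pdatalength == dL.pdatalength);
    const std::size_t bytes = dA.pdatalength * sizeof(double);
#ifdef _OPENACC
    double* buffer = static_cast<double*>(acc_malloc(bytes));  // device memory
#else
    double* buffer = static_cast<double*>(std::malloc(bytes)); // host fallback
#endif
    // Struct first, then its array, so the device pointer can be attached.
    #pragma acc enter data copyin(dA)
    #pragma acc enter data copyin(dA.pdata[0:dA.pdatalength])
    #pragma acc enter data copyin(dL)
    #pragma acc enter data create(dL.pdata[0:dL.pdatalength])

    // num_gangs(1) is an assumption of this sketch, not part of the report.
    #pragma acc parallel num_gangs(1) present(dA, dL) deviceptr(buffer)
    {
        gpu_work_sketch(dA, dL, buffer);
    }

    // Fetch the result, then release members before their parent structs.
    #pragma acc exit data copyout(dL.pdata[0:dL.pdatalength])
    #pragma acc exit data delete(dL)
    #pragma acc exit data delete(dA.pdata[0:dA.pdatalength])
    #pragma acc exit data delete(dA)
#ifdef _OPENACC
    acc_free(buffer);
#else
    std::free(buffer);
#endif
}
```

A reproducer of roughly this size may help narrow down whether the failure comes from the struct mapping, the deviceptr buffer, or the generated device function.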

The code in the latter function executes correctly on the device with nvc++. It would also execute on the host, provided the struct datastruct is offloaded correctly and a device function is created properly. The difficulty with this struct offload is that it does not suffice to offload only the vtable and variable names of pdatastruct and the arrays. These variables must be associated with the host variables and, what is more, the pointers in the pdatastruct variable must correctly point to the offloaded member arrays. The function gpu_cholesky_decomposition must then be offloaded correctly too.

Nvidia's nvc++ seems to be able to do so. The code runs there, and if one inserts if (acc_on_device(acc_device_nvidia)) printf("works on device") inside gpu_cholesky_decomposition, it prints that it works on the device. Also, the variable dL must be copied back from the device in order to get the results. This indicates that the code works with nvc++.
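
The on-device check mentioned above can also be written so that the answer comes back to the host instead of being printed from device code. The following is a minimal sketch: ran_on_device_sketch is a hypothetical helper, it uses acc_device_not_host rather than acc_device_nvidia so that it does not depend on the vendor, and num_gangs(1) is an assumption so that only one gang writes the flag.

```cpp
#ifdef _OPENACC
#include <openacc.h>
#endif

// Returns true if the compute region below was executed on an accelerator.
// Without OpenACC support the directive is ignored and the result is false.
inline bool ran_on_device_sketch() {
    int on_device = 0;
    #pragma acc parallel num_gangs(1) copyout(on_device)
    {
#ifdef _OPENACC
        on_device = acc_on_device(acc_device_not_host) ? 1 : 0;
#else
        on_device = 0;
#endif
    }
    return on_device != 0;
}
```

If this returns false in an OpenACC build, the region was probably executed on the host, which would be a different failure from a wrong pointer mapping on the device.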

Apparently, it does not work in clang. 

Perhaps my code is wrong. In that case, I would be grateful if someone could explain to me how to offload such structs or classes with methods, and functions that are not mere loops but do something more complex.

If it is an error in clang (which it may be), then I hope its device offloading gets improved or fixed soon.

Thanks.



[mdspan_acc.h.txt](https://github.com/user-attachments/files/18582049/mdspan_acc.h.txt)

[main_acc.c.txt](https://github.com/user-attachments/files/18582053/main_acc.c.txt)

[CMakeLists.txt](https://github.com/user-attachments/files/18582147/CMakeLists.txt)

[correct-output.txt](https://github.com/user-attachments/files/18582164/correct-output.txt)

[clang-output.txt](https://github.com/user-attachments/files/18582167/clang-output.txt)
</pre>