<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/126342">126342</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            when offloading nested structs, sometimes "PluginInterface" error: Failure to synchronize stream (nil): Error in cuStreamSynchronize: an illegal memory access was encountered happens
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          bschulz81
      </td>
    </tr>
</table>

<pre>
    The following code has no memory problems. Upon investigation, it seems that dQ and dR are not offloaded correctly in the function qr_decomposition, which at line 3792 calls
        create_in_struct(dA);
        create_out_struct(dQ);
        create_out_struct(dR);

which do the mappings with

template<typename T>
void inline create_in_struct(const datastruct<T>& dA)
{
    #pragma omp target enter data map(to: dA,dA.pdata[0:dA.pdatalength],dA.pextents[0:dA.prank],dA.pstrides[0:dA.prank])

}

template<typename T>
void inline create_out_struct(datastruct<T>& dA)
{
    #pragma omp target enter data map(to: dA) map(alloc: dA.pdata[0:dA.pdatalength]) map(to:dA.pextents[0:dA.prank],dA.pstrides[0:dA.prank])
}

qr_decomposition then calls gpu_qr_decomposition.
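
To make the mapping pattern of create_out_struct easier to test in isolation, here is a minimal sketch. It assumes a reduced struct, datastruct_sketch, that only reproduces the member names quoted above; the real datastruct<T> in the attached header may differ. The function fill_on_device is hypothetical and not part of the attached files.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for datastruct<T>; only the member names quoted in
// this report are reproduced, the real type may have more members.
template <typename T>
struct datastruct_sketch {
    T*           pdata;
    std::size_t* pextents;
    std::size_t* pstrides;
    std::size_t  pdatalength;
    std::size_t  prank;
};

// Map the struct as in create_out_struct, fill the payload on the device,
// then copy back only the payload. The struct itself is released rather
// than copied back, so the host pointers in dA are left untouched.
template <typename T>
inline void fill_on_device(datastruct_sketch<T>& dA, T value)
{
    assert(dA.pdata != nullptr && dA.pdatalength > 0);

    #pragma omp target enter data map(to: dA) \
        map(alloc: dA.pdata[0:dA.pdatalength]) \
        map(to: dA.pextents[0:dA.prank], dA.pstrides[0:dA.prank])

    #pragma omp target teams distribute parallel for
    for (std::size_t i = 0; i < dA.pdatalength; i++)
        dA.pdata[i] = value;

    #pragma omp target exit data map(from: dA.pdata[0:dA.pdatalength]) \
        map(release: dA.pextents[0:dA.prank], dA.pstrides[0:dA.prank]) \
        map(release: dA)
}
```

If a reduced case like this runs cleanly for a 3x3 matrix while the full code does not, that would suggest the mapping itself is not the only factor.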

If one removes lines 1863-1997 in gpu_qr_decomposition, especially these lines:

    #pragma omp parallel for
    for (size_t i=0; i<Q.pdatalength; i++)
    {
        Q.pdata[i]=0;
    }
//
//
    #pragma omp parallel for
    for (size_t i=0; i<R.pdatalength; i++)
    {
        R.pdata[i]=0;
    }

then the code suddenly compiles...
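
One way to narrow down whether the zeroing loops above are the trigger might be to run the same zeroing through a plain pointer, so that no struct member is dereferenced inside the target region. The helper zero_on_device below is hypothetical and not taken from the attached files; it is only a sketch of that experiment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: zero n elements on the device through a raw pointer.
// map(from:) copies the zeroed buffer back if it was not already mapped;
// if it is already present on the device, only the device copy is written.
template <typename T>
inline void zero_on_device(T* data, std::size_t n)
{
    assert(data != nullptr || n == 0);

    #pragma omp target teams distribute parallel for map(from: data[0:n])
    for (std::size_t i = 0; i < n; i++)
        data[i] = T(0);
}
```

Calling this with a buffer that is not yet mapped, instead of the loops over Q.pdata and R.pdata, would separate the loop itself from the nested-struct mapping.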

The strange part is that similar code is called earlier for a Cholesky and an LU decomposition.

There, the code works provided I compile it without optimization.

If I compile the code with -O2, clang takes very long to finish, and in the resulting binary the LU decomposition crashes too.

The array sizes are not the problem: I have checked them, and these are just test cases with 9 elements (a 3x3 matrix).
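
The kind of size check meant here can be sketched as follows. The function expected_length is a hypothetical helper, not part of the attached files; it assumes that pdatalength should equal the product of the entries in pextents.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: number of elements implied by the extents,
// to be compared against pdatalength before mapping.
inline std::size_t expected_length(const std::size_t* extents, std::size_t rank)
{
    assert(extents != nullptr || rank == 0);

    std::size_t n = 1;
    for (std::size_t i = 0; i < rank; i++)
        n *= extents[i];
    return n;
}
```

For the 3x3 test matrices, expected_length(extents, 2) gives 9, which matches the 9 elements mentioned above.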

[main_acc.cpp.txt](https://github.com/user-attachments/files/18716923/main_acc.cpp.txt)

[mdspan_acc.h.txt](https://github.com/user-attachments/files/18716924/mdspan_acc.h.txt)

[CMakeLists.txt](https://github.com/user-attachments/files/18716925/CMakeLists.txt)
</pre>