<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/56387>56387</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [OpenMP][Offload] Incorrect behavior with external device code
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ZwFink
      </td>
    </tr>
</table>

<pre>
## Description
When using Clang/LLVM on the `release/14.x` branch, `omp_get_thread_num` returns 0 for every thread when it is called from an external function declared with `omp declare target`. The behavior is as expected on `release/13.x`.

Using git bisect, we have found that the bug was introduced in https://github.com/llvm/llvm-project/commit/423d34f74a10ae122a67a18a76c2ead6e26924eb.

## Minimal Example 
`head.cpp`
```c++
#include "head.h"
#include <omp.h>
#include <stdio.h>


#pragma omp declare target
int external_dev_fn(int a)
{
  int res = a+2;
  printf("I am thread %d\n", omp_get_thread_num());
  return res;
}
#pragma omp end declare target
```

`head.h`
```c++
#ifndef HEAD_HH_INCLUDED
#define HEAD_HH_INCLUDED
#include <omp.h>

#pragma omp declare target
int external_dev_fn(int a);
#pragma omp end declare target

#endif // HEAD_HH_INCLUDED
```

`driver.cpp`
```c++
#include "head.h"
#include <stdio.h>
#include <omp.h>

int main(){
  int N = 10;
  int rs = 0;
#pragma omp target data map(tofrom:N,rs)
  {

#pragma omp target parallel for reduction(+:rs)
  for(int i = 0; i < N; i++)
    {
      printf("I am thread %d\n", omp_get_thread_num());
      rs += external_dev_fn(i);
    }
  }

  printf("End result is %d\n", rs);
}

```

We compile the above example for an NVIDIA V100 GPU with the following:
```
clang++ -Xopenmp-target -march=sm_70 -fopenmp -fopenmp-targets=nvptx64 ./driver.cpp -O1 ./head.cpp
```

The result produced by `release/14.x` is as follows:
```
I am thread 0
I am thread 1
I am thread 2
I am thread 3
I am thread 4
I am thread 5
I am thread 6
I am thread 7
I am thread 8
I am thread 9
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
I am thread 0
End result is 65
```

Whereas the correct result produced by `release/13.x` is:
```
I am thread 0
I am thread 1
I am thread 2
I am thread 3
I am thread 4
I am thread 5
I am thread 6
I am thread 7
I am thread 8
I am thread 9
I am thread 0
I am thread 1
I am thread 2
I am thread 3
I am thread 4
I am thread 5
I am thread 6
I am thread 7
I am thread 8
I am thread 9
End result is 65
```

The issue is that within the function `external_dev_fn`, each thread's call to `omp_get_thread_num` should return the same value as that thread's call in the target region in `main`. On `release/14.x` it returns 0 for every thread instead. If we include `head.cpp` within `driver.cpp`, the program works as expected.


### Environment Details
- Clang/LLVM were built with GCC 8.3.1
- CUDA 10.1 is used 
- We have also verified this behavior on a different machine where Clang/LLVM were built with GCC 9.4.0 and CUDA 11.4.
</pre>