<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/56063">56063</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [CUDA] Performance regression in CUDA Clang for the RSBench mini-app
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            cuda,
            performance
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          jhuber6
      </td>
    </tr>
</table>

<pre>
    The [RSBench](https://github.com/ANL-CESAR/RSBench.git) mini-application shows a performance regression when targeting CUDA on my V100 with CUDA 11.6.2. Previously, Clang's performance was roughly on par with NVCC's, with an execution time of about 2.1 seconds on my machine. After commit 0af3e6a22da2eda5021b5fad656d0b9db7702e0a, performance has regressed by roughly 33%, to about 3.1 seconds. Reverting this commit locally restores the original performance and matches NVCC. The results were produced using the following commands. I can provide the IR differences later.

```
$ cd cuda/
$ clang++  --offload-arch=sm_70 -O3 -c main.cu -o main.o
$ clang++  --offload-arch=sm_70 -O3 -c simulation.cu -o simulation.o
$ clang++  --offload-arch=sm_70 -O3 -c io.cu -o io.o
$ clang++  --offload-arch=sm_70 -O3 -c init.cu -o init.o
$ clang++  --offload-arch=sm_70 -O3 -c material.cu -o material.o
$ clang++  --offload-arch=sm_70 -O3 -c utils.cu -o utils.o
$ clang++  --offload-arch=sm_70 -O3 main.o simulation.o io.o init.o material.o utils.o -o rsbench -lm -lcudart
$ nvprof ./rsbench -m event
```
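
As a quick sanity check on the quoted figures, the rounded timings of 2.1 s and 3.1 s can be compared two ways. This is only a sketch: `slowdown` is a hypothetical helper name, and the inputs are the rounded numbers from the description rather than exact measurements. Read as lost throughput, the result is close to the "roughly 33%" above; read as added runtime, it is noticeably larger.

```shell
# Hypothetical helper: compare a "before" and "after" runtime in seconds.
# Usage: slowdown BEFORE AFTER
slowdown() {
    awk -v b="$1" -v a="$2" 'BEGIN {
        # Fraction of throughput lost relative to the old build.
        printf "throughput loss:  %.1f%%\n", (a - b) / a * 100
        # Extra wall-clock time relative to the old build.
        printf "runtime increase: %.1f%%\n", (a - b) / b * 100
    }'
}

# Rounded timings from the report; exact values may differ slightly.
slowdown 2.1 3.1
```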
</pre>