<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/121813>121813</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [RISCV][AArch64] llvm has 9-12% regression compared to gcc for spec2017/500.perlbench_r

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            backend:AArch64,

            backend:RISC-V

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          michaelmaitland

      </td>

    </tr>

</table>

<pre>

    I've been looking into this for some time now, and I wanted to file an issue in the hope that (a) others are seeing the same problem, (b) we can discuss how to close this gap, and (c) other targets have insights or prior work that may help here.

| Comparison                | Regression (%) |
|---------------------------|----------------|
| LLVM No Vec vs GCC No Vec | 12.57          |
| LLVM No Vec vs GCC Vec    | 12.19          |
| LLVM Vec vs GCC No Vec    | 9.72           |
| LLVM Vec vs GCC Vec       | 9.32           |

It looks like there is a common scalar-related regression. These numbers are at -O3 with LTO enabled. I know that this regression is visible both in the qemu dynamic instruction count and on hardware, and that it impacts both in-order and out-of-order RISC-V cores. As per a talk at the 2021 LLVM Dev Meeting, it looks like this issue also exists on AArch64 [see slide 3](https://llvm.org/devmtg/2021-11/slides/2021-ClangvsGCCForSPECOnAArch64.pdf). I'm not sure if the regression is present on other targets.

The cycle count of the S_regmatch function on LLVM is far behind the cycle count on GCC. In this function, the number of dynamic stack spills and reloads is much higher (over 50% higher) on LLVM than on GCC, while the static number of spills and reloads is relatively similar. The number of dynamic branches is also relatively similar, but there are 34% more dynamic jumps.
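
For anyone who wants to check the static spill/reload numbers on the LLVM side, here is a minimal sketch. It assumes ELF-style assembly text emitted by LLVM, where spill and reload instructions carry comments such as `# 8-byte Folded Spill` or `# 8-byte Folded Reload` and each function ends at a `.Lfunc_end` label. The helper name `count_static_spills` is made up for illustration. It only counts static occurrences, so it says nothing about the dynamic counts discussed above.

```python
import re

# Assumption: LLVM annotates spill/reload instructions with a trailing
# comment ("#" on RISC-V, "//" on AArch64) that contains "Spill" or "Reload".
SPILL_RE = re.compile(r"(?:#|//).*\b(Spill|Reload)\b")


def count_static_spills(asm_text, function):
    """Count annotated spills and reloads inside one function's body."""
    counts = {"Spill": 0, "Reload": 0}
    in_function = False
    for line in asm_text.splitlines():
        stripped = line.strip()
        if not in_function:
            # The function body starts at its label, e.g. "S_regmatch:".
            if stripped.startswith(function + ":"):
                in_function = True
            continue
        # Assumption: LLVM marks the end of each function with .Lfunc_endN.
        if stripped.startswith(".Lfunc_end"):
            break
        match = SPILL_RE.search(stripped)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage would look like `count_static_spills(open("perlbench.s").read(), "S_regmatch")`, where `perlbench.s` is a placeholder for whatever assembly file contains the function.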

The fix in https://github.com/llvm/llvm-project/pull/90819 helps close the performance gap by a few percent, but there is still a significant way to go. I have run many other experiments that ruled out other suspected causes, and I could discuss them on a call or add follow-up comments.

</pre>
