<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/56702>56702</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Possible problems with the AMDGPU backend: Significant performance drops in generated batched matrix multiplication kernels
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ravil-mobile
      </td>
    </tr>
</table>

<pre>
    Hi everyone,

I was testing some batched matrix multiplication kernels using HIP from the ROCm 4.5.2, 5.1.0 and 5.1.3 suites. My target hardware is MI200, i.e., gfx90a. The kernels themselves were generated with a Python script. The multiplications took the following form: $A^{56, k} \cdot B^{k, 9}$, where $k \in [56, 55, 54, 53, 52, 51, 50, 49, 48]$. I used the outer-product-sum approach and unrolled all the loops. Here are the results I obtained:

![performance-comparison](https://user-images.githubusercontent.com/19637079/180739709-5b2bb1e6-7452-49c1-941e-482e745adcc9.png)

The performance drops significantly when $k \in [55, 53, 51, 49]$, i.e., for the odd values of $k$. In theory, $k$ is the contraction length, which in my case determines only the bounds of the outermost loop and thus should not affect performance. However, because all loops are unrolled, it does influence the total number of instructions.
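
To make the role of $k$ concrete, here is a minimal Python sketch of the outer-product-sum formulation. It is only an illustration and not the generated HIP kernel from the attachment; the names `matmul_outer_product` and `fma_count` are hypothetical, and the fixed shape 56 x 9 is taken from the description above.

```python
# Illustrative sketch only: this is NOT the generated HIP kernel.
# M and N are the fixed output dimensions from the report (56 x 9).
M, N = 56, 9


def matmul_outer_product(a, b):
    """Compute C = A * B as a sum of k rank-1 (outer-product) updates.

    a is an M x k matrix and b is a k x N matrix, both as nested lists.
    """
    k = len(b)
    c = [[0.0] * N for _ in range(M)]
    # The outermost loop runs over the contraction index p.
    # A fully unrolled kernel would emit this loop body k times.
    for p in range(k):
        for i in range(M):
            a_ip = a[i][p]
            for j in range(N):
                c[i][j] += a_ip * b[p][j]
    return c


def fma_count(k):
    """Number of multiply-add operations once all loops are unrolled."""
    return M * N * k
```

Under these assumptions, each step in $k$ adds $56 \cdot 9 = 504$ multiply-adds, so the instruction count grows linearly with $k$ and does not by itself distinguish odd from even values.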

Here is my source code: 
[kernels.tar.gz](https://github.com/llvm/llvm-project/files/9179647/kernels.tar.gz)

I suspect that there is something wrong with the instruction scheduling. For convenience, I am also attaching the generated gfx90a assembly:
[kernel-hip-amdgcn-amd-amdhsa-gfx90a.s.txt](https://github.com/llvm/llvm-project/files/9179650/kernel-hip-amdgcn-amd-amdhsa-gfx90a.s.txt)

Thank you in advance!

</pre>