<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/133733>133733</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [MLIR][NVVM] Why not support wgmma.mma_async with A as register fragments, instead of smem

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            mlir

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          deciding

      </td>

    </tr>

</table>

<pre>

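    Currently, wgmma.mma_async only accepts A as a shared-memory descriptor (A_desc). However, in some cases this forces A to be written to smem and then read back, even though the data is already in registers.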

ref: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-warpgroup-level-matrix-instructions-wgmma-mma 
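
For illustration, a minimal CUDA sketch of the register-fragment form described in the PTX reference above. The wrapper name and parameter names are hypothetical, the m64n8k16 f16 shape is just one example, and the snippet assumes compilation for sm_90a. It is a sketch of the PTX form only, not of how the NVVM dialect would expose it.

```cuda
// Hypothetical wrapper (name and signature are illustrative only).
// Issues wgmma.mma_async with A taken from registers and B from a
// shared-memory matrix descriptor. Assumes nvcc -arch=sm_90a.
//
// For m64n8k16 with f16 inputs and f32 accumulation, each thread of the
// warpgroup holds 4 x .b32 registers of A (two packed f16 each) and
// 4 x f32 registers of D.
__device__ inline void wgmma_m64n8k16_f32_f16_a_in_regs(
    float* d,                   // 4 accumulator registers, updated in place
    const unsigned* a,          // 4 packed-f16x2 A fragment registers
    unsigned long long desc_b,  // shared-memory matrix descriptor for B
    int scale_d)                // 0: D = A*B, nonzero: D = A*B + D
{
    asm volatile(
        "{\n"
        ".reg .pred p;\n"
        "setp.ne.b32 p, %9, 0;\n"
        "wgmma.mma_async.sync.aligned.m64n8k16.f32.f16.f16 "
        "{%0, %1, %2, %3}, "    // D fragment (registers)
        "{%4, %5, %6, %7}, "    // A fragment (registers, no A_desc)
        "%8, "                  // B descriptor (smem)
        "p, 1, 1, 0;\n"         // scale-d, imm-scale-a, imm-scale-b, imm-trans-b
        "}\n"
        : "+f"(d[0]), "+f"(d[1]), "+f"(d[2]), "+f"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "l"(desc_b), "r"(scale_d));
}
```

The caller is still expected to bracket this with wgmma.fence, wgmma.commit_group and wgmma.wait_group, as for the descriptor form.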

</pre>
