<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/133733">133733</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [MLIR][NVVM] Why not support wgmma.mma_async with A as register fragments, instead of smem
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            mlir
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          deciding
      </td>
    </tr>
</table>

<pre>
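    Currently, wgmma.mma_async in the NVVM dialect only accepts A as a shared-memory descriptor (A_desc). However, when A is already held in registers, this might force it to be written to smem and read back in some cases.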

ref: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-warpgroup-level-matrix-instructions-wgmma-mma 
</pre>