<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/133733>133733</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[MLIR][NVVM] Why is wgmma.mma_async not supported with A as register fragments instead of smem?
</td>
</tr>
<tr>
<th>Labels</th>
<td>
mlir
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
deciding
</td>
</tr>
</table>
<pre>
Currently, only A_desc can be passed for operand A. However, in some cases this forces A to be written to smem and then reread.
ref: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-warpgroup-level-matrix-instructions-wgmma-mma
</pre>