<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/72368>72368</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
NVVM dialect: NVVM_CpAsyncBulkTensorGlobalToSharedClusterOp is missing the `multicast` operand
</td>
</tr>
<tr>
<th>Labels</th>
<td>
mlir:gpu,
mlir:nvgpu
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
joker-eph
</td>
</tr>
</table>
<pre>
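See the PTX ISA documentation for `cp.async.bulk.tensor`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor
It seems that the op definition at https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td#L1402-L1433 is missing the `multicast` operand, i.e. the `.multicast::cluster` form of the instruction, which takes an extra `ctaMask` operand.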
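
To illustrate what the op cannot currently express, here is a minimal sketch of the two PTX forms (tile mode, global to shared::cluster, mbarrier completion). The helper `build_cp_async_bulk_tensor_g2s` and its `%N` placeholder numbering are hypothetical, assumed for illustration only; this is not MLIR's actual NVVM lowering code. The multicast form differs only by the `.multicast::cluster` qualifier and a trailing 16-bit `ctaMask` operand placed after `[mbar]`.

```python
# Hypothetical helper, NOT the real NVVM/MLIR implementation: assembles the
# PTX string for cp.async.bulk.tensor (global to shared::cluster, tile mode),
# optionally with the .multicast::cluster qualifier and its ctaMask operand.


def build_cp_async_bulk_tensor_g2s(dim, multicast=False):
    # PTX allows 1d..5d tensor copies.
    if dim not in range(1, 6):
        raise ValueError("cp.async.bulk.tensor supports 1d..5d tensors")
    opcode = (
        f"cp.async.bulk.tensor.{dim}d.shared::cluster.global"
        ".mbarrier::complete_tx::bytes"
    )
    if multicast:
        opcode += ".multicast::cluster"
    # Assumed placeholder order: %0 = dstMem, %1 = tensorMap, %2 = mbar,
    # then the optional ctaMask, then the tensor coordinates.
    next_idx = 3
    cta_mask = ""
    if multicast:
        cta_mask = f", %{next_idx}"  # 16-bit mask of destination CTAs
        next_idx += 1
    coords = ", ".join(f"%{i}" for i in range(next_idx, next_idx + dim))
    return f"{opcode} [%0], [%1, {{{coords}}}], [%2]{cta_mask};"


print(build_cp_async_bulk_tensor_g2s(2))
print(build_cp_async_bulk_tensor_g2s(2, multicast=True))
```

Since the only difference is one qualifier plus one trailing operand, the op could presumably gain an optional 16-bit `multicast` operand and emit the qualifier only when that operand is present.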
</pre>