<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/72368>72368</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            NVVM Dialect: NVVM_CpAsyncBulkTensorGlobalToSharedClusterOp is missing the `multicast` operand
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            mlir:gpu,
            mlir:nvgpu
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          joker-eph
      </td>
    </tr>
</table>

<pre>
See the PTX documentation for cp.async.bulk.tensor: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor

The op definition at https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td#L1402-L1433 appears to be missing the `multicast` operand.


</pre>