[llvm] [LLVM][NVPTX] Add NVPTX codegen support for clusterlaunchcontrol instruction (PR #134568)
Alex MacLean via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 7 09:19:58 PDT 2025
================
@@ -7301,3 +7301,67 @@ def INT_NVVM_ST_BULK_SHARED_CTA:
"st.bulk.shared::cta [$dest_addr], $size, 0;",
[(int_nvvm_st_bulk_shared_cta addr:$dest_addr, i64:$size, (i64 0))]>,
Requires<[hasSM<100>, hasPTX<86>]>;
+
+// clusterlaunchcontrol Instructions
+
+multiclass CLUSTERLAUNCHCONTROL_TRY_CANCEL<Intrinsic Intrin, bit Multicast, list<Predicate> Pred> {
+ defvar Intrinsic = !cast<Intrinsic>(Intrin # !if(!eq(Multicast, 1), "_multicast", ""));
+ defvar IntrinsicShared = !cast<Intrinsic>(Intrin # !if(!eq(Multicast, 1), "_multicast", "") # "_shared");
+ defvar MulticastMod = ".multicast::cluster::all";
+
+ def _gen: NVPTXInst<(outs), (ins Int64Regs:$addr, Int64Regs:$mbar),
+ "clusterlaunchcontrol.try_cancel.async.mbarrier::complete_tx::bytes" #
+ !if(!eq(Multicast, 1), MulticastMod, "") #
+ ".b128 [$addr], [$mbar];",
+ [(Intrinsic Int64Regs:$addr, Int64Regs:$mbar)]>, Requires<Pred>;
+
+ def _shared: NVPTXInst<(outs), (ins Int64Regs:$addr, Int64Regs:$mbar),
+ "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
+ !if(!eq(Multicast, 1), MulticastMod, "") #
+ ".b128 [$addr], [$mbar];",
+ [(IntrinsicShared Int64Regs:$addr, Int64Regs:$mbar)]>, Requires<Pred>;
+
+ def _shared32: NVPTXInst<(outs), (ins Int32Regs:$addr, Int32Regs:$mbar),
+ "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
+ !if(!eq(Multicast, 1), MulticastMod, "") #
+ ".b128 [$addr], [$mbar];",
----------------
AlexMaclean wrote:
You can use the `addr` pattern and `ADDR` operand to automatically get address folding and avoid the need for separate 32-bit and 64-bit variants.
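
A rough sketch of what that could look like for the shared variant. This is untested and assumes `ADDR`/`addr` can be used here the same way `addr` is used for `st.bulk.shared::cta` in the context lines above. It reuses the `IntrinsicShared` and `MulticastMod` defvars already defined in the multiclass in this hunk:

```tablegen
// Sketch only: a single shared-space def inside CLUSTERLAUNCHCONTROL_TRY_CANCEL.
// ADDR operands plus the addr pattern stand in for the fixed-width
// Int32Regs/Int64Regs operands used in the quoted hunk.
def _shared: NVPTXInst<(outs), (ins ADDR:$addr, ADDR:$mbar),
          "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
          !if(!eq(Multicast, 1), MulticastMod, "") #
          ".b128 [$addr], [$mbar];",
          [(IntrinsicShared addr:$addr, addr:$mbar)]>, Requires<Pred>;
```

If that works as expected, the separate `_shared32` def should no longer be needed.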
https://github.com/llvm/llvm-project/pull/134568