[llvm] [LLVM][NVPTX] Add NVPTX codegen support for clusterlaunchcontrol instruction (PR #134568)

via llvm-commits llvm-commits at lists.llvm.org
Sun Apr 6 19:37:20 PDT 2025


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-nvptx

Author: Pradeep Kumar (schwarzschild-radius)

<details>
<summary>Changes</summary>

This commit adds NVPTX codegen support for clusterlaunchcontrol instructions with tests under clusterlaunchcontrol.ll and clusterlaunchcontrol-multicast.ll. For more information, Please refer [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel)

---

Patch is 34.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134568.diff


11 Files Affected:

- (modified) llvm/docs/NVPTXUsage.rst (+98) 
- (modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+46) 
- (modified) llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp (+40) 
- (modified) llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h (+1) 
- (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+121-1) 
- (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.h (+3) 
- (modified) llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (+1) 
- (modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+63) 
- (modified) llvm/lib/Target/NVPTX/NVPTXSubtarget.h (+14) 
- (added) llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll (+50) 
- (added) llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll (+140) 


``````````diff
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 621879fc5648b..7bbd18c71fbeb 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -1478,6 +1478,104 @@ similar but the latter uses generic addressing (see `Generic Addressing <https:/
 
 For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk>`__.
 
+
+clusterlaunchcontrol Intrinsics
+-------------------------------
+
+'``llvm.nvvm.clusterlaunchcontrol.try_cancel*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async(ptr %addr, ptr %mbar)
+  declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+  declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast(ptr %addr, ptr %mbar)
+  declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.try_cancel`` intrinsics requests atomically cancelling
+the launch of a cluster that has not started running yet. It asynchronously writes
+a 16-byte opaque response to shared memory, pointed to by ``addr`` indicating whether the
+operation succeeded or failed. ``addr`` and ``mbar`` must be in ``shared::cta``
+otherwise the result is undefined. The completion of the asynchronous operation
+is tracked using the mbarrier completion mechanism at ``.cluster`` scope referenced
+by the shared memory pointer, ``mbar``. On success, the opaque response contains
+the CTA id of the first CTA of the canceled cluster; no other successful response
+from other ``clusterlaunchcontrol.try_cancel`` operations from the same grid will
+contain that id.
+
+The ``multicast`` variant specifies that the response is asynchronously written to
+the corresponding shared memory location of each CTA in the requesting cluster.
+The completion of the write of each local response is tracked by independent
+mbarriers at the corresponding shared memory location of each CTA in the
+cluster.
+
+For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel>`__.
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled`` intrinsic can be
+used to decode the opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+The intrinsic returns false if the request failed. If the request succeeded,
+it returns true. A true result indicates that:
+
+- the thread block cluster whose first CTA id matches that of the response
+  handle will not run
+- no other successful response of another ``try_cancel`` request will contain
+  the first CTA id of that cluster
+
+For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare <4 x i32> @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid(i128 %try_cancel_response)
+  declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %try_cancel_response)
+  declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.y(i128 %try_cancel_response)
+  declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.z(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.query_cancel.get_first_ctaid`` intrinsic can be
+used to decode the opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+If the request succeeded,
+``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.{x,y,z}`` returns
+the coordinate of the first CTA in the canceled cluster, either x, y, or z.
+``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid`` returns a vector
+where the first three components are the x, y, z coordinates of the first CTA.
+The contents of the fourth element are unspecified
+
+If the request failed, the behavior of these intrinsics is undefined.
+
+For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
 Other Intrinsics
 ----------------
 
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 3e9588a515c9e..12d98229eba02 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -5381,4 +5381,50 @@ def int_nvvm_st_bulk_shared_cta : DefaultAttrsIntrinsic<[],
   [IntrArgMemOnly, IntrWriteMem,
     WriteOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<2>>]>;
 
+//
+// Cluster launch control
+//
+
+// clusterlaunchcontrol.try_cancel
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async
+    : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
+                [IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+                "llvm.nvvm.clusterlaunchcontrol.try_cancel.async">;
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_shared
+    : Intrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+                [IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+                "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared">;
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast
+    : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
+                [IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+                "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast">;
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared
+    : Intrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+                [IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+                "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared">;
+
+// clusterlaunchcontrol.query_cancel.is_canceled
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_is_canceled
+    : Intrinsic<[llvm_i1_ty], [llvm_i128_ty], [],
+                "llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled">;
+
+// clusterlaunchcontrol.query_cancel.get_first_ctaid
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid
+    : Intrinsic<[llvm_v4i32_ty], [llvm_i128_ty], [],
+                "llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid">;
+
+foreach dim = ["x", "y", "z"] in {
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_ # dim
+    : Intrinsic<[llvm_i32_ty], [llvm_i128_ty], [],
+                "llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid." # dim>;
+}
+
+
 } // let TargetPrefix = "nvvm"
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
index ec1f969494cd1..36dff04f01413 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
@@ -312,6 +312,37 @@ void NVPTXDAGToDAGISel::SelectTcgen05Ld(SDNode *N, bool hasOffset) {
   }
 }
 
+void NVPTXDAGToDAGISel::SelectClusterLaunchControl(SDNode *N) {
+  SDLoc DL(N);
+  unsigned IID = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
+  unsigned Opcode;
+  switch (IID) {
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+    Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED;
+    break;
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
+    Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID;
+    break;
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+    Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_x;
+    break;
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+    Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_y;
+    break;
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
+    Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_z;
+    break;
+  }
+
+  SDValue Operands[] = {
+      N->getOperand(2), // TryCancelResponse_0
+      N->getOperand(3), // TryCancelResponse_1
+      N->getOperand(0), // Chain
+  };
+
+  ReplaceNode(N, CurDAG->getMachineNode(Opcode, DL, N->getVTList(), Operands));
+}
+
 bool NVPTXDAGToDAGISel::tryIntrinsicChain(SDNode *N) {
   unsigned IID = N->getConstantOperandVal(1);
   switch (IID) {
@@ -366,6 +397,15 @@ bool NVPTXDAGToDAGISel::tryIntrinsicChain(SDNode *N) {
     SelectTcgen05Ld(N, /* hasOffset */ true);
     return true;
   }
+
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z: {
+    SelectClusterLaunchControl(N);
+    return true;
+  }
   }
 }
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
index 23cbd458571a0..f634276c498b7 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
@@ -102,6 +102,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXDAGToDAGISel : public SelectionDAGISel {
                                            bool IsIm2Col = false);
   void SelectTcgen05Ld(SDNode *N, bool hasOffset = false);
   void SelectTcgen05St(SDNode *N, bool hasOffset = false);
+  void SelectClusterLaunchControl(SDNode *N);
 
   inline SDValue getI32Imm(unsigned Imm, const SDLoc &DL) {
     return CurDAG->getTargetConstant(Imm, DL, MVT::i32);
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index b566cdd4b6bfc..cae193e5d8595 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -1017,6 +1017,9 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
                      {MVT::v2i32, MVT::v4i32, MVT::v8i32, MVT::v16i32,
                       MVT::v32i32, MVT::v64i32, MVT::v128i32},
                      Custom);
+
+  // Enable custom lowering for i128 bit type supported in PTX
+  setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i128, Custom);
 }
 
 const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
@@ -1165,6 +1168,54 @@ NVPTXTargetLowering::LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(NVPTXISD::Wrapper, dl, PtrVT, Op);
 }
 
+SDValue NVPTXTargetLowering::LowerIntrinsicWChain(SDValue Op,
+                                                  SelectionDAG &DAG) const {
+  SDNode *N = Op.getNode();
+  SDValue Intrin = N->getOperand(1);
+  SDLoc DL(N);
+
+  // Get the intrinsic ID
+  unsigned IntrinNo = cast<ConstantSDNode>(Intrin.getNode())->getZExtValue();
+  switch (IntrinNo) {
+  default:
+    break;
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z: {
+
+    if (N->getOperand(2).getValueType() != MVT::i128) {
+      // return, if the operand is already lowered
+      return SDValue();
+    }
+
+    SDLoc DL(N);
+    SmallVector<SDValue, 8> Ops;
+
+    Ops.push_back(N->getOperand(0)); // Chain
+    Ops.push_back(N->getOperand(1)); // Intrinsic
+
+    SDValue TryCancelResponse = N->getOperand(2);
+    SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TryCancelResponse);
+    SDValue TryCancelResponse0 =
+        DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+                    DAG.getIntPtrConstant(0, DL));
+    SDValue TryCancelResponse1 =
+        DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+                    DAG.getIntPtrConstant(1, DL));
+
+    Ops.push_back(TryCancelResponse0);
+    Ops.push_back(TryCancelResponse1);
+
+    MemIntrinsicSDNode *MemSD = cast<MemIntrinsicSDNode>(N);
+    return DAG.getMemIntrinsicNode(ISD::INTRINSIC_W_CHAIN, DL, N->getVTList(),
+                                   Ops, MemSD->getMemoryVT(),
+                                   MemSD->getMemOperand());
+  }
+  }
+  return Op;
+}
+
 static bool IsTypePassedAsArray(const Type *Ty) {
   return Ty->isAggregateType() || Ty->isVectorTy() || Ty->isIntegerTy(128) ||
          Ty->isHalfTy() || Ty->isBFloatTy();
@@ -2862,7 +2913,7 @@ NVPTXTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
   case ISD::GlobalAddress:
     return LowerGlobalAddress(Op, DAG);
   case ISD::INTRINSIC_W_CHAIN:
-    return Op;
+    return LowerIntrinsicWChain(Op, DAG);
   case ISD::INTRINSIC_VOID:
     return LowerIntrinsicVoid(Op, DAG);
   case ISD::BUILD_VECTOR:
@@ -4724,6 +4775,21 @@ bool NVPTXTargetLowering::getTgtMemIntrinsic(
     Info.align.reset();
     return true;
   }
+
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z: {
+    auto &DL = I.getModule()->getDataLayout();
+    Info.opc = ISD::INTRINSIC_W_CHAIN;
+    Info.memVT = getValueType(DL, I.getType());
+    Info.ptrVal = nullptr;
+    Info.offset = 0;
+    Info.flags = MachineMemOperand::MOLoad;
+    Info.align.reset();
+    return true;
+  }
   }
   return false;
 }
@@ -6036,6 +6102,60 @@ static void ReplaceINTRINSIC_W_CHAIN(SDNode *N, SelectionDAG &DAG,
   case Intrinsic::nvvm_tcgen05_ld_16x32bx2_x64:
   case Intrinsic::nvvm_tcgen05_ld_16x32bx2_x128:
     return ReplaceTcgen05Ld(N, DAG, Results, /* Offset */ true);
+
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid: {
+    // The intrinsic returns the CTAID of x, y and z dimension as a v4i32 value
+    EVT ResVT = N->getValueType(0);
+    if (!ResVT.isVector())
+      return; // already legalized.
+
+    const unsigned NumElts = ResVT.getVectorNumElements(); // v4i32
+
+    // Create the return type of the instructions
+    SmallVector<EVT, 5> ListVTs;
+    for (unsigned i = 0; i < NumElts; ++i)
+      ListVTs.push_back(MVT::i32);
+    ListVTs.push_back(MVT::Other);
+
+    SDVTList ResVTs = DAG.getVTList(ListVTs);
+
+    SmallVector<SDValue, 8> Ops;
+    // Add Chain and Intrinsic ID
+    Ops.push_back(N->getOperand(0)); // Chain
+    Ops.push_back(N->getOperand(1)); // Intrinsic ID
+
+    SDValue TryCancelResponse = N->getOperand(2); // i128 operand
+    // Cast i128 to v2i64 and split into two i64
+    SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TryCancelResponse);
+    SDValue TryCancelResponse_0 =
+        DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+                    DAG.getIntPtrConstant(0, DL));
+    SDValue TryCancelResponse_1 =
+        DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+                    DAG.getIntPtrConstant(1, DL));
+
+    Ops.push_back(TryCancelResponse_0);
+    Ops.push_back(TryCancelResponse_1);
+
+    MemIntrinsicSDNode *MemSD = cast<MemIntrinsicSDNode>(N);
+    // Create a new Intrinsic Node with 2 x i64 operands
+    SDValue NewNode =
+        DAG.getMemIntrinsicNode(ISD::INTRINSIC_W_CHAIN, DL, ResVTs, Ops,
+                                MemSD->getMemoryVT(), MemSD->getMemOperand());
+
+    // Scalarize the vector results
+    SmallVector<SDValue, 4> ScalarRes;
+    for (unsigned i = 0; i < NumElts; ++i) {
+      SDValue Res = NewNode.getValue(i);
+      ScalarRes.push_back(Res);
+    }
+
+    SDValue Chain = NewNode.getValue(NumElts); // v4i32 value
+    SDValue BuildVector = DAG.getNode(ISD::BUILD_VECTOR, DL, ResVT, ScalarRes);
+    Results.push_back(BuildVector); // Build Vector
+    Results.push_back(Chain);       // Chain
+    return;
+  }
   }
 }
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index 39470be254efa..723e6defa8327 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -329,6 +329,9 @@ class NVPTXTargetLowering : public TargetLowering {
   SDValue LowerVASTART(SDValue Op, SelectionDAG &DAG) const;
 
   SDValue LowerCopyToReg_128(SDValue Op, SelectionDAG &DAG) const;
+
+  SDValue LowerIntrinsicWChain(SDValue Op, SelectionDAG &DAG) const;
+
   unsigned getNumRegisters(LLVMContext &Context, EVT VT,
                            std::optional<MVT> RegisterVT) const override;
   bool
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index 7d0c47fa464c5..29d19c40a6f59 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -160,6 +160,7 @@ def hasHWROT32 : Predicate<"Subtarget->hasHWROT32()">;
 def noHWROT32 : Predicate<"!Subtarget->hasHWROT32()">;
 def hasDotInstructions : Predicate<"Subtarget->hasDotInstructions()">;
 def hasTcgen05Instructions : Predicate<"Subtarget->hasTcgen05Instructions()">;
+def hasBlackwellArch : Predicate<"Subtarget->hasBlackwellArch()">;
 
 def True : Predicate<"true">;
 def False : Predicate<"false">;
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 34cb63e44ca71..3f46c8c1a64ba 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -7301,3 +7301,66 @@ def INT_NVVM_ST_BULK_SHARED_CTA:
             "st.bulk.shared::cta [$dest_addr], $size, 0;",
             [(int_nvvm_st_bulk_shared_cta addr:$dest_addr, i64:$size, (i64 0))]>,
             Requires<[hasSM<100>, hasPTX<86>]>;
+
+// clusterlaunchcontrol Instructions
+
+multiclass CLUSTERLAUNCHCONTROL_TRY_CANCEL<Intrinsic Intrin, bit Multicast, list<Predicate> Pred> {
+  defvar Intrinsic = !cast<Intrinsic>(Intrin # !if(!eq(Multicast, 1), "_multicast", ""));
+  defvar IntrinsicShared = !cast<Intrinsic>(Intrin # !if(!eq(Multicast, 1), "_multicast", "") # "_shared");
+
+  def _gen: NVPTXInst<(outs), (ins Int64Regs:$addr, Int64Regs:$mbar),
+                      "clusterlaunchcontrol.try_cancel.async.mbarrier::complete_tx::bytes" #
+                      !if(!eq(Multicast, 1), ".multicast::cluster::all", "") #
+                      ".b128 [$addr], [$mbar];",
+                      [(Intrinsic Int64Regs:$addr, Int64Regs:$mbar)]>, Requires<Pred>;
+
+  def _shared: NVPTXInst<(outs), (ins Int64Regs:$addr, Int64Regs:$mbar),
+                         "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
+                         !if(!eq(Multicast, 1), ".multicast::cluster::all", "") #
+                         ".b128 [$addr], [$mbar];",
+                         [(IntrinsicShared Int64Regs:$addr, Int64Regs:$mbar)]>, Requires<Pred>;
+
+  def _shared32: NVPTXInst<(outs), (ins Int32Regs:$addr, Int32Regs:$mbar),
+                           "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
+                           !if(!eq(Multicast, 1), ".multicast::cluster::all", "") #
+                           ".b128 [$addr], [$mbar];",
+                           [(IntrinsicShared Int32Regs:$addr, Int32Regs:$mbar)]>, Requires<Pred>;
+}
+
+defm CLUSTERLAUNCHCONTRL_TRY_CANCEL :
+      CLUSTERLAUNCHCONTROL_TRY_CANCEL<
+        int_nvvm_clusterlaunchcontrol_try_cancel_async, 0, [hasSM<100>, hasPTX<86>]>;
+
+defm CLUSTERLAUNCHCONTRL_TRY_CANCEL_MULTICAST :
+      CLUSTERLAUNCHCONTROL_TRY_CANCEL<
+        int_nvvm_clusterlaunchcontrol_try_cance...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/134568


More information about the llvm-commits mailing list