[llvm] [LLVM][NVPTX] Add NVPTX codegen support for clusterlaunchcontrol instruction (PR #134568)
Pradeep Kumar via llvm-commits
llvm-commits at lists.llvm.org
Sun Apr 6 19:36:49 PDT 2025
https://github.com/schwarzschild-radius created https://github.com/llvm/llvm-project/pull/134568
This commit adds NVPTX codegen support for clusterlaunchcontrol instructions, with tests in clusterlaunchcontrol.ll and clusterlaunchcontrol-multicast.ll. For more information, please refer to the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel).
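As a quick orientation, below is a rough, hypothetical IR sketch of how the new intrinsics are meant to compose. The shared-memory globals, the function name, and the elided mbarrier wait are illustrative placeholders; only the intrinsic signatures come from this patch.

```llvm
; Hypothetical usage sketch; @response and @mbar stand in for real shared-memory objects.
@response = internal addrspace(3) global i128 0, align 16
@mbar = internal addrspace(3) global i64 0, align 8

define void @worker() {
entry:
  ; Request cancellation of a cluster that has not started yet; completion is signaled through @mbar.
  call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) @response, ptr addrspace(3) @mbar)
  ; ... wait on the mbarrier until the 16-byte opaque response has been written ...
  %resp = load i128, ptr addrspace(3) @response, align 16
  %ok = call i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %resp)
  br i1 %ok, label %canceled, label %done

canceled:
  ; Decode the first CTA id of the canceled cluster and take over its work here.
  %x = call i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %resp)
  br label %done

done:
  ret void
}

declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3), ptr addrspace(3))
declare i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128)
declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128)
```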
From 87e3d74ae66e5ae0fd6630cd2bd5639e18bbf11e Mon Sep 17 00:00:00 2001
From: pradeepku <pradeepku at nvidia.com>
Date: Thu, 3 Apr 2025 16:14:17 +0530
Subject: [PATCH] [LLVM][NVPTX] Add NVPTX codegen support for
clusterlaunchcontrol instruction
This commit adds NVPTX codegen support for clusterlaunchcontrol instructions, with tests in clusterlaunchcontrol.ll and clusterlaunchcontrol-multicast.ll. For more information, please refer to the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel).
---
llvm/docs/NVPTXUsage.rst | 98 ++++++++++++
llvm/include/llvm/IR/IntrinsicsNVVM.td | 46 ++++++
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp | 40 +++++
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h | 1 +
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 122 ++++++++++++++-
llvm/lib/Target/NVPTX/NVPTXISelLowering.h | 3 +
llvm/lib/Target/NVPTX/NVPTXInstrInfo.td | 1 +
llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 63 ++++++++
llvm/lib/Target/NVPTX/NVPTXSubtarget.h | 14 ++
.../NVPTX/clusterlaunchcontrol-multicast.ll | 50 +++++++
.../CodeGen/NVPTX/clusterlaunchcontrol.ll | 140 ++++++++++++++++++
11 files changed, 577 insertions(+), 1 deletion(-)
create mode 100644 llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll
create mode 100644 llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 621879fc5648b..7bbd18c71fbeb 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -1478,6 +1478,104 @@ similar but the latter uses generic addressing (see `Generic Addressing <https:/
For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk>`__.
+
+clusterlaunchcontrol Intrinsics
+-------------------------------
+
+'``llvm.nvvm.clusterlaunchcontrol.try_cancel*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async(ptr %addr, ptr %mbar)
+ declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+ declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast(ptr %addr, ptr %mbar)
+ declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.try_cancel`` intrinsics request to atomically cancel
+the launch of a cluster that has not started running yet. They asynchronously
+write a 16-byte opaque response to the shared memory pointed to by ``addr``,
+indicating whether the operation succeeded or failed. ``addr`` and ``mbar`` must
+be in ``shared::cta`` space; otherwise, the result is undefined. Completion of
+the asynchronous operation is tracked by the mbarrier completion mechanism at
+``.cluster`` scope, referenced by the shared memory pointer ``mbar``. On
+success, the opaque response contains the CTA id of the first CTA of the
+canceled cluster; no successful response of any other
+``clusterlaunchcontrol.try_cancel`` operation in the same grid will contain
+that id.
+
+The ``multicast`` variant specifies that the response is asynchronously written to
+the corresponding shared memory location of each CTA in the requesting cluster.
+The completion of the write of each local response is tracked by independent
+mbarriers at the corresponding shared memory location of each CTA in the
+cluster.
+
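+For illustration, a single thread might issue the ``shared`` variant with a
+16-byte response buffer and an mbarrier object in ``shared::cta`` memory
+(``%resp_buf`` and ``%mbar`` below are placeholder names):
+
+.. code-block:: llvm
+
+ call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) %resp_buf, ptr addrspace(3) %mbar)
+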
+For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel>`__.
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled`` intrinsic can be
+used to decode the opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+The intrinsic returns ``true`` if the request succeeded and ``false`` if it
+failed. A ``true`` result indicates that:
+
+- the thread block cluster whose first CTA id matches that of the response
+ handle will not run
+- no other successful response of another ``try_cancel`` request will contain
+ the first CTA id of that cluster
+
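+For illustration, once the 16-byte response has been read back from shared
+memory as an ``i128`` value (``%response`` below is a placeholder name), a
+kernel can branch on the decoded result:
+
+.. code-block:: llvm
+
+ %is_canceled = call i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %response)
+ br i1 %is_canceled, label %take_over_work, label %exit
+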
+For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare <4 x i32> @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid(i128 %try_cancel_response)
+ declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %try_cancel_response)
+ declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.y(i128 %try_cancel_response)
+ declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.z(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.query_cancel.get_first_ctaid.*`` intrinsics can be
+used to decode the opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+If the request succeeded,
+``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.{x,y,z}`` returns
+the x, y, or z coordinate of the first CTA of the canceled cluster, and
+``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid`` returns a vector
+whose first three elements are the x, y, and z coordinates of the first CTA.
+The contents of the fourth element are unspecified.
+
+If the request failed, the behavior of these intrinsics is undefined.
+
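+For illustration, the coordinates of the canceled cluster's first CTA can be
+read either component-wise or as a vector (``%response`` below is a placeholder
+for the loaded ``i128`` response):
+
+.. code-block:: llvm
+
+ %ctaid_x = call i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %response)
+ %ctaid_v = call <4 x i32> @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid(i128 %response)
+ %ctaid_y = extractelement <4 x i32> %ctaid_v, i32 1
+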
+For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
Other Intrinsics
----------------
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 3e9588a515c9e..12d98229eba02 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -5381,4 +5381,50 @@ def int_nvvm_st_bulk_shared_cta : DefaultAttrsIntrinsic<[],
[IntrArgMemOnly, IntrWriteMem,
WriteOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<2>>]>;
+//
+// Cluster launch control
+//
+
+// clusterlaunchcontrol.try_cancel
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async
+ : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
+ [IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+ "llvm.nvvm.clusterlaunchcontrol.try_cancel.async">;
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_shared
+ : Intrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+ [IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+ "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared">;
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast
+ : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
+ [IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+ "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast">;
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared
+ : Intrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+ [IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+ "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared">;
+
+// clusterlaunchcontrol.query_cancel.is_canceled
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_is_canceled
+ : Intrinsic<[llvm_i1_ty], [llvm_i128_ty], [],
+ "llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled">;
+
+// clusterlaunchcontrol.query_cancel.get_first_ctaid
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid
+ : Intrinsic<[llvm_v4i32_ty], [llvm_i128_ty], [],
+ "llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid">;
+
+foreach dim = ["x", "y", "z"] in {
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_ # dim
+ : Intrinsic<[llvm_i32_ty], [llvm_i128_ty], [],
+ "llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid." # dim>;
+}
+
+
} // let TargetPrefix = "nvvm"
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
index ec1f969494cd1..36dff04f01413 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
@@ -312,6 +312,37 @@ void NVPTXDAGToDAGISel::SelectTcgen05Ld(SDNode *N, bool hasOffset) {
}
}
+void NVPTXDAGToDAGISel::SelectClusterLaunchControl(SDNode *N) {
+ SDLoc DL(N);
+ unsigned IID = cast<ConstantSDNode>(N->getOperand(1))->getZExtValue();
+ unsigned Opcode;
+ switch (IID) {
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+ Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED;
+ break;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
+ Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID;
+ break;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+ Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_x;
+ break;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+ Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_y;
+ break;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
+ Opcode = NVPTX::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_z;
+ break;
+ }
+
+ SDValue Operands[] = {
+ N->getOperand(2), // TryCancelResponse_0
+ N->getOperand(3), // TryCancelResponse_1
+ N->getOperand(0), // Chain
+ };
+
+ ReplaceNode(N, CurDAG->getMachineNode(Opcode, DL, N->getVTList(), Operands));
+}
+
bool NVPTXDAGToDAGISel::tryIntrinsicChain(SDNode *N) {
unsigned IID = N->getConstantOperandVal(1);
switch (IID) {
@@ -366,6 +397,15 @@ bool NVPTXDAGToDAGISel::tryIntrinsicChain(SDNode *N) {
SelectTcgen05Ld(N, /* hasOffset */ true);
return true;
}
+
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z: {
+ SelectClusterLaunchControl(N);
+ return true;
+ }
}
}
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
index 23cbd458571a0..f634276c498b7 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
@@ -102,6 +102,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXDAGToDAGISel : public SelectionDAGISel {
bool IsIm2Col = false);
void SelectTcgen05Ld(SDNode *N, bool hasOffset = false);
void SelectTcgen05St(SDNode *N, bool hasOffset = false);
+ void SelectClusterLaunchControl(SDNode *N);
inline SDValue getI32Imm(unsigned Imm, const SDLoc &DL) {
return CurDAG->getTargetConstant(Imm, DL, MVT::i32);
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index b566cdd4b6bfc..cae193e5d8595 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -1017,6 +1017,9 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
{MVT::v2i32, MVT::v4i32, MVT::v8i32, MVT::v16i32,
MVT::v32i32, MVT::v64i32, MVT::v128i32},
Custom);
+
+ // Enable custom lowering of INTRINSIC_W_CHAIN for the i128 type supported in PTX
+ setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i128, Custom);
}
const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
@@ -1165,6 +1168,54 @@ NVPTXTargetLowering::LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const {
return DAG.getNode(NVPTXISD::Wrapper, dl, PtrVT, Op);
}
+SDValue NVPTXTargetLowering::LowerIntrinsicWChain(SDValue Op,
+ SelectionDAG &DAG) const {
+ SDNode *N = Op.getNode();
+ SDValue Intrin = N->getOperand(1);
+ SDLoc DL(N);
+
+ // Get the intrinsic ID
+ unsigned IntrinNo = cast<ConstantSDNode>(Intrin.getNode())->getZExtValue();
+ switch (IntrinNo) {
+ default:
+ break;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z: {
+
+ if (N->getOperand(2).getValueType() != MVT::i128) {
+ // Return if the operand has already been lowered
+ return SDValue();
+ }
+
+ SDLoc DL(N);
+ SmallVector<SDValue, 8> Ops;
+
+ Ops.push_back(N->getOperand(0)); // Chain
+ Ops.push_back(N->getOperand(1)); // Intrinsic
+
+ SDValue TryCancelResponse = N->getOperand(2);
+ SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TryCancelResponse);
+ SDValue TryCancelResponse0 =
+ DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+ DAG.getIntPtrConstant(0, DL));
+ SDValue TryCancelResponse1 =
+ DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+ DAG.getIntPtrConstant(1, DL));
+
+ Ops.push_back(TryCancelResponse0);
+ Ops.push_back(TryCancelResponse1);
+
+ MemIntrinsicSDNode *MemSD = cast<MemIntrinsicSDNode>(N);
+ return DAG.getMemIntrinsicNode(ISD::INTRINSIC_W_CHAIN, DL, N->getVTList(),
+ Ops, MemSD->getMemoryVT(),
+ MemSD->getMemOperand());
+ }
+ }
+ return Op;
+}
+
static bool IsTypePassedAsArray(const Type *Ty) {
return Ty->isAggregateType() || Ty->isVectorTy() || Ty->isIntegerTy(128) ||
Ty->isHalfTy() || Ty->isBFloatTy();
@@ -2862,7 +2913,7 @@ NVPTXTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::GlobalAddress:
return LowerGlobalAddress(Op, DAG);
case ISD::INTRINSIC_W_CHAIN:
- return Op;
+ return LowerIntrinsicWChain(Op, DAG);
case ISD::INTRINSIC_VOID:
return LowerIntrinsicVoid(Op, DAG);
case ISD::BUILD_VECTOR:
@@ -4724,6 +4775,21 @@ bool NVPTXTargetLowering::getTgtMemIntrinsic(
Info.align.reset();
return true;
}
+
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z: {
+ auto &DL = I.getModule()->getDataLayout();
+ Info.opc = ISD::INTRINSIC_W_CHAIN;
+ Info.memVT = getValueType(DL, I.getType());
+ Info.ptrVal = nullptr;
+ Info.offset = 0;
+ Info.flags = MachineMemOperand::MOLoad;
+ Info.align.reset();
+ return true;
+ }
}
return false;
}
@@ -6036,6 +6102,60 @@ static void ReplaceINTRINSIC_W_CHAIN(SDNode *N, SelectionDAG &DAG,
case Intrinsic::nvvm_tcgen05_ld_16x32bx2_x64:
case Intrinsic::nvvm_tcgen05_ld_16x32bx2_x128:
return ReplaceTcgen05Ld(N, DAG, Results, /* Offset */ true);
+
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid: {
+ // The intrinsic returns the x, y, and z coordinates of the first CTA id as a v4i32 value
+ EVT ResVT = N->getValueType(0);
+ if (!ResVT.isVector())
+ return; // already legalized.
+
+ const unsigned NumElts = ResVT.getVectorNumElements(); // v4i32
+
+ // Create the return type of the instructions
+ SmallVector<EVT, 5> ListVTs;
+ for (unsigned i = 0; i < NumElts; ++i)
+ ListVTs.push_back(MVT::i32);
+ ListVTs.push_back(MVT::Other);
+
+ SDVTList ResVTs = DAG.getVTList(ListVTs);
+
+ SmallVector<SDValue, 8> Ops;
+ // Add Chain and Intrinsic ID
+ Ops.push_back(N->getOperand(0)); // Chain
+ Ops.push_back(N->getOperand(1)); // Intrinsic ID
+
+ SDValue TryCancelResponse = N->getOperand(2); // i128 operand
+ // Cast i128 to v2i64 and split into two i64
+ SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TryCancelResponse);
+ SDValue TryCancelResponse_0 =
+ DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+ DAG.getIntPtrConstant(0, DL));
+ SDValue TryCancelResponse_1 =
+ DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+ DAG.getIntPtrConstant(1, DL));
+
+ Ops.push_back(TryCancelResponse_0);
+ Ops.push_back(TryCancelResponse_1);
+
+ MemIntrinsicSDNode *MemSD = cast<MemIntrinsicSDNode>(N);
+ // Create a new Intrinsic Node with 2 x i64 operands
+ SDValue NewNode =
+ DAG.getMemIntrinsicNode(ISD::INTRINSIC_W_CHAIN, DL, ResVTs, Ops,
+ MemSD->getMemoryVT(), MemSD->getMemOperand());
+
+ // Scalarize the vector results
+ SmallVector<SDValue, 4> ScalarRes;
+ for (unsigned i = 0; i < NumElts; ++i) {
+ SDValue Res = NewNode.getValue(i);
+ ScalarRes.push_back(Res);
+ }
+
+ SDValue Chain = NewNode.getValue(NumElts); // Chain output
+ SDValue BuildVector = DAG.getNode(ISD::BUILD_VECTOR, DL, ResVT, ScalarRes);
+ Results.push_back(BuildVector); // Build Vector
+ Results.push_back(Chain); // Chain
+ return;
+ }
}
}
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index 39470be254efa..723e6defa8327 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -329,6 +329,9 @@ class NVPTXTargetLowering : public TargetLowering {
SDValue LowerVASTART(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCopyToReg_128(SDValue Op, SelectionDAG &DAG) const;
+
+ SDValue LowerIntrinsicWChain(SDValue Op, SelectionDAG &DAG) const;
+
unsigned getNumRegisters(LLVMContext &Context, EVT VT,
std::optional<MVT> RegisterVT) const override;
bool
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index 7d0c47fa464c5..29d19c40a6f59 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -160,6 +160,7 @@ def hasHWROT32 : Predicate<"Subtarget->hasHWROT32()">;
def noHWROT32 : Predicate<"!Subtarget->hasHWROT32()">;
def hasDotInstructions : Predicate<"Subtarget->hasDotInstructions()">;
def hasTcgen05Instructions : Predicate<"Subtarget->hasTcgen05Instructions()">;
+def hasBlackwellArch : Predicate<"Subtarget->hasBlackwellArch()">;
def True : Predicate<"true">;
def False : Predicate<"false">;
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 34cb63e44ca71..3f46c8c1a64ba 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -7301,3 +7301,66 @@ def INT_NVVM_ST_BULK_SHARED_CTA:
"st.bulk.shared::cta [$dest_addr], $size, 0;",
[(int_nvvm_st_bulk_shared_cta addr:$dest_addr, i64:$size, (i64 0))]>,
Requires<[hasSM<100>, hasPTX<86>]>;
+
+// clusterlaunchcontrol Instructions
+
+multiclass CLUSTERLAUNCHCONTROL_TRY_CANCEL<Intrinsic Intrin, bit Multicast, list<Predicate> Pred> {
+ defvar Intrinsic = !cast<Intrinsic>(Intrin # !if(!eq(Multicast, 1), "_multicast", ""));
+ defvar IntrinsicShared = !cast<Intrinsic>(Intrin # !if(!eq(Multicast, 1), "_multicast", "") # "_shared");
+
+ def _gen: NVPTXInst<(outs), (ins Int64Regs:$addr, Int64Regs:$mbar),
+ "clusterlaunchcontrol.try_cancel.async.mbarrier::complete_tx::bytes" #
+ !if(!eq(Multicast, 1), ".multicast::cluster::all", "") #
+ ".b128 [$addr], [$mbar];",
+ [(Intrinsic Int64Regs:$addr, Int64Regs:$mbar)]>, Requires<Pred>;
+
+ def _shared: NVPTXInst<(outs), (ins Int64Regs:$addr, Int64Regs:$mbar),
+ "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
+ !if(!eq(Multicast, 1), ".multicast::cluster::all", "") #
+ ".b128 [$addr], [$mbar];",
+ [(IntrinsicShared Int64Regs:$addr, Int64Regs:$mbar)]>, Requires<Pred>;
+
+ def _shared32: NVPTXInst<(outs), (ins Int32Regs:$addr, Int32Regs:$mbar),
+ "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
+ !if(!eq(Multicast, 1), ".multicast::cluster::all", "") #
+ ".b128 [$addr], [$mbar];",
+ [(IntrinsicShared Int32Regs:$addr, Int32Regs:$mbar)]>, Requires<Pred>;
+}
+
+defm CLUSTERLAUNCHCONTROL_TRY_CANCEL :
+ CLUSTERLAUNCHCONTROL_TRY_CANCEL<
+ int_nvvm_clusterlaunchcontrol_try_cancel_async, 0, [hasSM<100>, hasPTX<86>]>;
+
+defm CLUSTERLAUNCHCONTROL_TRY_CANCEL_MULTICAST :
+ CLUSTERLAUNCHCONTROL_TRY_CANCEL<
+ int_nvvm_clusterlaunchcontrol_try_cancel_async, 1, [hasBlackwellArch]>;
+
+def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED :
+ NVPTXInst<(outs Int1Regs:$pred), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+ !strconcat("{{\n\t",
+ !strconcat(".reg .b128 %handle;\n\t",
+ !strconcat("mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t",
+ !strconcat("clusterlaunchcontrol.query_cancel.is_canceled.pred.b128 $pred, %handle;\n\t", "}}")))), []>,
+ Requires<[hasSM<100>, hasPTX<86>]>;
+
+def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID :
+ NVPTXInst<(outs Int32Regs:$r1, Int32Regs:$r2, Int32Regs:$r3, Int32Regs:$r4),
+ (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+ !strconcat("{{\n\t",
+ !strconcat(".reg .b128 %handle;\n\t",
+ !strconcat("mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t",
+ !strconcat("clusterlaunchcontrol.query_cancel.get_first_ctaid.v4.b32.b128 {$r1, $r2, $r3, $r4}, %handle;\n\t", "}}")))), []>,
+ Requires<[hasSM<100>, hasPTX<86>]>;
+
+class CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<string Dim> :
+ NVPTXInst<(outs Int32Regs:$reg), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+ !strconcat("{{\n\t",
+ !strconcat(".reg .b128 %handle;\n\t",
+ !strconcat("mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t",
+ !strconcat("clusterlaunchcontrol.query_cancel.get_first_ctaid::" # Dim # ".b32.b128 $reg, %handle;\n\t", "}}")))), []>,
+ Requires<[hasSM<100>, hasPTX<86>]>;
+
+foreach dim = ["x", "y", "z"] in {
+ def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_ # dim :
+ CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<dim>;
+}
diff --git a/llvm/lib/Target/NVPTX/NVPTXSubtarget.h b/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
index 0a4fc8d1435be..a82b809142e65 100644
--- a/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
+++ b/llvm/lib/Target/NVPTX/NVPTXSubtarget.h
@@ -112,6 +112,20 @@ class NVPTXSubtarget : public NVPTXGenSubtargetInfo {
return HasTcgen05 && PTXVersion >= 86;
}
+ bool hasBlackwellArch() const {
+ auto HasSM = [&]() {
+ if (FullSmVersion == 1001)
+ return true;
+ if (FullSmVersion == 1011)
+ return true;
+ if (FullSmVersion == 1201)
+ return true;
+ return false;
+ };
+
+ return HasSM() && PTXVersion >= 86;
+ }
+
// Prior to CUDA 12.3 ptxas did not recognize that the trap instruction
// terminates a basic block. Instead, it would assume that control flow
// continued to the next instruction. The next instruction could be in the
diff --git a/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll
new file mode 100644
index 0000000000000..8f0f9a6e27367
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll
@@ -0,0 +1,50 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -o - -mcpu=sm_100a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 | %ptxas-verify -arch=sm_100a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_100a %}
+; RUN: llc -o - -mcpu=sm_101a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 | %ptxas-verify -arch=sm_101a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_101a %}
+; RUN: llc -o - -mcpu=sm_120a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 | %ptxas-verify -arch=sm_120a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_120a %}
+
+define void @nvvm_clusterlaunchcontrol_try_cancel_multicast(ptr %addr, ptr %mbar,
+; CHECK-PTX-SHARED64-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
+; CHECK-PTX-SHARED64: {
+; CHECK-PTX-SHARED64-NEXT: .reg .b64 %rd<5>;
+; CHECK-PTX-SHARED64-EMPTY:
+; CHECK-PTX-SHARED64-NEXT: // %bb.0:
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_0];
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_1];
+; CHECK-PTX-SHARED64-NEXT: clusterlaunchcontrol.try_cancel.async.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%rd1], [%rd2];
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd3, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_2];
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd4, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_3];
+; CHECK-PTX-SHARED64-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%rd3], [%rd4];
+; CHECK-PTX-SHARED64-NEXT: ret;
+;
+; CHECK-PTX-SHARED32-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
+; CHECK-PTX-SHARED32: {
+; CHECK-PTX-SHARED32-NEXT: .reg .b32 %r<3>;
+; CHECK-PTX-SHARED32-NEXT: .reg .b64 %rd<3>;
+; CHECK-PTX-SHARED32-EMPTY:
+; CHECK-PTX-SHARED32-NEXT: // %bb.0:
+; CHECK-PTX-SHARED32-NEXT: ld.param.u64 %rd1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_0];
+; CHECK-PTX-SHARED32-NEXT: ld.param.u64 %rd2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_1];
+; CHECK-PTX-SHARED32-NEXT: clusterlaunchcontrol.try_cancel.async.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%rd1], [%rd2];
+; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_2];
+; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_3];
+; CHECK-PTX-SHARED32-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%r1], [%r2];
+; CHECK-PTX-SHARED32-NEXT: ret;
+ ptr addrspace(3) %saddr, ptr addrspace(3) %smbar,
+ i128 %try_cancel_response) {
+ tail call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast(ptr %addr, ptr %mbar)
+
+ tail call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %saddr, ptr addrspace(3) %smbar)
+ ret void;
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
diff --git a/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll
new file mode 100644
index 0000000000000..24596b7a98935
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll
@@ -0,0 +1,140 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_100 -mattr=+ptx86 | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_100 -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100 -mattr=+ptx86 | %ptxas-verify -arch=sm_100 %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100 -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_100 %}
+
+define void @nvvm_clusterlaunchcontrol_try_cancel(ptr %addr, ptr %mbar,
+; CHECK-PTX-SHARED64-LABEL: nvvm_clusterlaunchcontrol_try_cancel(
+; CHECK-PTX-SHARED64: {
+; CHECK-PTX-SHARED64-NEXT: .reg .b64 %rd<5>;
+; CHECK-PTX-SHARED64-EMPTY:
+; CHECK-PTX-SHARED64-NEXT: // %bb.0:
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd1, [nvvm_clusterlaunchcontrol_try_cancel_param_0];
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd2, [nvvm_clusterlaunchcontrol_try_cancel_param_1];
+; CHECK-PTX-SHARED64-NEXT: clusterlaunchcontrol.try_cancel.async.mbarrier::complete_tx::bytes.b128 [%rd1], [%rd2];
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd3, [nvvm_clusterlaunchcontrol_try_cancel_param_2];
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd4, [nvvm_clusterlaunchcontrol_try_cancel_param_3];
+; CHECK-PTX-SHARED64-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.b128 [%rd3], [%rd4];
+; CHECK-PTX-SHARED64-NEXT: ret;
+;
+; CHECK-PTX-SHARED32-LABEL: nvvm_clusterlaunchcontrol_try_cancel(
+; CHECK-PTX-SHARED32: {
+; CHECK-PTX-SHARED32-NEXT: .reg .b32 %r<3>;
+; CHECK-PTX-SHARED32-NEXT: .reg .b64 %rd<3>;
+; CHECK-PTX-SHARED32-EMPTY:
+; CHECK-PTX-SHARED32-NEXT: // %bb.0:
+; CHECK-PTX-SHARED32-NEXT: ld.param.u64 %rd1, [nvvm_clusterlaunchcontrol_try_cancel_param_0];
+; CHECK-PTX-SHARED32-NEXT: ld.param.u64 %rd2, [nvvm_clusterlaunchcontrol_try_cancel_param_1];
+; CHECK-PTX-SHARED32-NEXT: clusterlaunchcontrol.try_cancel.async.mbarrier::complete_tx::bytes.b128 [%rd1], [%rd2];
+; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r1, [nvvm_clusterlaunchcontrol_try_cancel_param_2];
+; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r2, [nvvm_clusterlaunchcontrol_try_cancel_param_3];
+; CHECK-PTX-SHARED32-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.b128 [%r1], [%r2];
+; CHECK-PTX-SHARED32-NEXT: ret;
+ ptr addrspace(3) %saddr, ptr addrspace(3) %smbar,
+ i128 %try_cancel_response) {
+
+ tail call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async(ptr %addr, ptr %mbar)
+
+ tail call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) %saddr, ptr addrspace(3) %smbar)
+ ret void;
+}
+
+define i32 @nvvm_clusterlaunchcontrol_query_cancel_is_canceled(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_is_canceled(
+; CHECK: {
+; CHECK-NEXT: .reg .pred %p<2>;
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_is_canceled_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {%rd1, %rd2};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.is_canceled.pred.b128 %p1, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: selp.u32 %r1, 1, 0, %p1;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %v0 = call i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %try_cancel_response)
+ %v2 = zext i1 %v0 to i32
+ ret i32 %v2;
+}
+
+
+define <4 x i32> @nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<5>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {%rd1, %rd2};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.get_first_ctaid.v4.b32.b128 {%r1, %r2, %r3, %r4}, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: st.param.v4.b32 [func_retval0], {%r1, %r2, %r3, %r4};
+; CHECK-NEXT: ret;
+ %v0 = call <4 x i32> @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid(i128 %try_cancel_response)
+ ret <4 x i32> %v0;
+}
+
+define i32 @nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {%rd1, %rd2};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.get_first_ctaid::x.b32.b128 %r1, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %v0 = call i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %try_cancel_response)
+ ret i32 %v0;
+}
+
+define i32 @nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {%rd1, %rd2};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.get_first_ctaid::y.b32.b128 %r1, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %v0 = call i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.y(i128 %try_cancel_response)
+ ret i32 %v0;
+}
+
+define i32 @nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {%rd1, %rd2};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.get_first_ctaid::z.b32.b128 %r1, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %v0 = call i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.z(i128 %try_cancel_response)
+ ret i32 %v0;
+}