[llvm] [LLVM][NVPTX] Add NVPTX codegen support for clusterlaunchcontrol instruction (PR #134568)
Pradeep Kumar via llvm-commits
llvm-commits at lists.llvm.org
Thu May 8 17:09:25 PDT 2025
https://github.com/schwarzschild-radius updated https://github.com/llvm/llvm-project/pull/134568
>From b84c1be648fa7a9c257ef59e5bf9966f44378859 Mon Sep 17 00:00:00 2001
From: pradeepku <pradeepku at nvidia.com>
Date: Thu, 3 Apr 2025 16:14:17 +0530
Subject: [PATCH] [LLVM][NVPTX] Add NVPTX codegen support for
clusterlaunchcontrol instruction
This commit adds NVPTX codegen support for clusterlaunchcontrol instructions with tests under clusterlaunchcontrol.ll and clusterlaunchcontrol-multicast.ll. For more information, Please refer [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel)
---
llvm/docs/NVPTXUsage.rst | 96 +++++++++++++
llvm/include/llvm/IR/IntrinsicsNVVM.td | 34 +++++
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 71 +++++++++-
llvm/lib/Target/NVPTX/NVPTXISelLowering.h | 7 +-
llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 81 +++++++++++
.../NVPTX/clusterlaunchcontrol-multicast.ll | 42 ++++++
.../CodeGen/NVPTX/clusterlaunchcontrol.ll | 132 ++++++++++++++++++
7 files changed, 461 insertions(+), 2 deletions(-)
create mode 100644 llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll
create mode 100644 llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index c1426823d87af..a2074e8d0c809 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -1558,6 +1558,102 @@ similar but the latter uses generic addressing (see `Generic Addressing <https:/
For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk>`__.
+
+clusterlaunchcontrol Intrinsics
+-------------------------------
+
+'``llvm.nvvm.clusterlaunchcontrol.try_cancel*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+ declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.try_cancel`` intrinsics requests atomically cancelling
+the launch of a cluster that has not started running yet. It asynchronously non-atomically writes
+a 16-byte opaque response to shared memory, pointed to by 16-byte-aligned ``addr`` indicating whether the
+operation succeeded or failed. ``addr`` and 8-byte-aligned ``mbar`` must refer to ``shared::cta``
+otherwise the behavior is undefined. The completion of the asynchronous operation
+is tracked using the mbarrier completion mechanism at ``.cluster`` scope referenced
+by the shared memory pointer, ``mbar``. On success, the opaque response contains
+the CTA id of the first CTA of the canceled cluster; no other successful response
+from other ``clusterlaunchcontrol.try_cancel`` operations from the same grid will
+contain that id.
+
+The ``multicast`` variant specifies that the response is asynchronously non-atomically written to
+the corresponding shared memory location of each CTA in the requesting cluster.
+The completion of the write of each local response is tracked by independent
+mbarriers at the corresponding shared memory location of each CTA in the
+cluster.
+
+For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel>`__.
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled`` intrinsic decodes the opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+The intrinsic returns ``0`` (false) if the request failed. If the request succeeded,
+it returns ``1`` (true). A true result indicates that:
+
+- the thread block cluster whose first CTA id matches that of the response
+ handle will not run, and
+- no other successful response of another ``try_cancel`` request in the grid will contain
+ the first CTA id of that cluster
+
+For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare {i32, i32, i32} @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid(i128 %try_cancel_response)
+ declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %try_cancel_response)
+ declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.y(i128 %try_cancel_response)
+ declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.z(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.query_cancel.get_first_ctaid`` intrinsic can be
+used to decode the successful opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+If the request succeeded:
+
+- ``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.{x,y,z}`` returns
+ the coordinate of the first CTA in the canceled cluster, either x, y, or z.
+
+- ``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid`` returns a struct
+ of three elements which correspond to the x, y, z coordinates of the first CTA.
+
+If the request failed, the behavior of these intrinsics is undefined.
+
+For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
Other Intrinsics
----------------
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 8b87822d3fdda..0b3cfddf1814b 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -5472,4 +5472,38 @@ def int_nvvm_st_bulk_shared_cta : DefaultAttrsIntrinsic<[],
[IntrArgMemOnly, IntrWriteMem,
WriteOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<2>>]>;
+//
+// clusterlaunchcontorl Intrinsics
+//
+
+// clusterlaunchcontrol.try_cancel
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_shared
+ : DefaultAttrsIntrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+ [IntrHasSideEffects, IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+ "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared">;
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared
+ : DefaultAttrsIntrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+ [IntrHasSideEffects, IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+ "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared">;
+
+// clusterlaunchcontrol.query_cancel.is_canceled
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_is_canceled
+ : DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_i128_ty], [],
+ "llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled">;
+
+// clusterlaunchcontrol.query_cancel.get_first_ctaid*
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid
+ : DefaultAttrsIntrinsic<[llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], [llvm_i128_ty], [],
+ "llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid">;
+
+foreach dim = ["x", "y", "z"] in {
+def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_ # dim
+ : DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i128_ty], [],
+ "llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid." # dim>;
+}
+
} // let TargetPrefix = "nvvm"
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 3769aae7b620f..8c3297447fdea 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -1015,6 +1015,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
+ // Enable custom lowering for the i128 bit operand with clusterlaunchcontrol
+ setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i128, Custom);
}
const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
@@ -1091,6 +1093,11 @@ const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
MAKE_CASE(NVPTXISD::BrxEnd)
MAKE_CASE(NVPTXISD::BrxItem)
MAKE_CASE(NVPTXISD::BrxStart)
+ MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED)
+ MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID)
+ MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X)
+ MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y)
+ MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z)
}
return nullptr;
@@ -1163,6 +1170,68 @@ NVPTXTargetLowering::LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const {
return DAG.getNode(NVPTXISD::Wrapper, dl, PtrVT, Op);
}
+static SDValue LowerClusterLauncControl(SDValue Op, SelectionDAG &DAG) {
+
+ SDNode *N = Op.getNode();
+ if (N->getOperand(2).getValueType() != MVT::i128) {
+ // return, if the operand is already lowered
+ return SDValue();
+ }
+
+ unsigned IID =
+ cast<ConstantSDNode>(N->getOperand(1).getNode())->getZExtValue();
+ auto Opcode = [&]() {
+ switch (IID) {
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+ return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
+ return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+ return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+ return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
+ return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z;
+ default:
+ llvm_unreachable("unsupported/unhandled intrinsic");
+ }
+ }();
+
+ SDLoc DL(N);
+ SDValue TryCancelResponse = N->getOperand(2);
+ SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TryCancelResponse);
+ SDValue TryCancelResponse0 =
+ DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+ DAG.getIntPtrConstant(0, DL));
+ SDValue TryCancelResponse1 =
+ DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+ DAG.getIntPtrConstant(1, DL));
+
+ return DAG.getNode(Opcode, DL, N->getVTList(),
+ {N->getOperand(0), N->getOperand(1), TryCancelResponse0,
+ TryCancelResponse1});
+}
+
+static SDValue LowerIntrinsicWChain(SDValue Op, SelectionDAG &DAG) {
+ SDNode *N = Op.getNode();
+ SDValue Intrin = N->getOperand(1);
+ SDLoc DL(N);
+
+ // Get the intrinsic ID
+ unsigned IntrinNo = cast<ConstantSDNode>(Intrin.getNode())->getZExtValue();
+ switch (IntrinNo) {
+ default:
+ break;
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+ case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
+ return LowerClusterLauncControl(Op, DAG);
+ }
+ return Op;
+}
+
std::string NVPTXTargetLowering::getPrototype(
const DataLayout &DL, Type *retTy, const ArgListTy &Args,
const SmallVectorImpl<ISD::OutputArg> &Outs, MaybeAlign retAlignment,
@@ -2908,7 +2977,7 @@ NVPTXTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::GlobalAddress:
return LowerGlobalAddress(Op, DAG);
case ISD::INTRINSIC_W_CHAIN:
- return Op;
+ return LowerIntrinsicWChain(Op, DAG);
case ISD::INTRINSIC_WO_CHAIN:
return lowerIntrinsicWOChain(Op, DAG);
case ISD::INTRINSIC_VOID:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index 7a8bf3bf33a94..6b9b8704f5660 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -99,7 +99,12 @@ enum NodeType : unsigned {
StoreRetval,
StoreRetvalV2,
StoreRetvalV4,
- LAST_MEMORY_OPCODE = StoreRetvalV4,
+ CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED,
+ CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID,
+ CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X,
+ CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y,
+ CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z,
+ LAST_MEMORY_OPCODE = CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z,
};
}
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 3eedb43e4c81a..c354184bc1ec9 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -7354,3 +7354,84 @@ def INT_NVVM_ST_BULK_SHARED_CTA:
"st.bulk.shared::cta [$dest_addr], $size, 0;",
[(int_nvvm_st_bulk_shared_cta addr:$dest_addr, i64:$size, (i64 0))]>,
Requires<[hasSM<100>, hasPTX<86>]>;
+
+//
+// clusterlaunchcontorl Instructions
+//
+
+def CLUSTERLAUNCHCONTRL_TRY_CANCEL:
+ NVPTXInst<(outs), (ins ADDR:$addr, ADDR:$mbar),
+ "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.b128 " #
+ "[$addr], [$mbar];",
+ [(int_nvvm_clusterlaunchcontrol_try_cancel_async_shared addr:$addr, addr:$mbar)]>,
+ Requires<[hasSM<100>, hasPTX<86>]>;
+
+def CLUSTERLAUNCHCONTRL_TRY_CANCEL_MULTICAST:
+ NVPTXInst<(outs), (ins ADDR:$addr, ADDR:$mbar),
+ "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
+ ".multicast::cluster::all.b128 " #
+ "[$addr], [$mbar];",
+ [(int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared addr:$addr, addr:$mbar)]>,
+ Requires<[hasSM<100>, hasArchAccelFeatures, hasPTX<86>]>;
+
+def SDTClusterLaunchControlQueryCancelIsCanceled: SDTypeProfile<1, 2, []>;
+def clusterlaunchcontrol_query_cancel_is_canceled:
+ SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED",
+ SDTClusterLaunchControlQueryCancelIsCanceled, [SDNPHasChain, SDNPSideEffect]>;
+
+def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED:
+ NVPTXInst<(outs Int1Regs:$pred), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+ "{{\n\t" #
+ ".reg .b128 %handle;\n\t" #
+ "mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
+ "clusterlaunchcontrol.query_cancel.is_canceled.pred.b128 $pred, %handle;\n\t" #
+ "}}", [(set i1:$pred,
+ (clusterlaunchcontrol_query_cancel_is_canceled i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
+ Requires<[hasSM<100>, hasPTX<86>]>;
+
+def SDTClusterLaunchControlQueryCancelGetFirstCtaId: SDTypeProfile<3, 2, []>;
+def clusterlaunchcontrol_query_cancel_first_cta_id:
+ SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID",
+ SDTClusterLaunchControlQueryCancelGetFirstCtaId, [SDNPHasChain, SDNPSideEffect]>;
+
+def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID:
+ NVPTXInst<(outs Int32Regs:$r1, Int32Regs:$r2, Int32Regs:$r3),
+ (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+ "{{\n\t" #
+ ".reg .b128 %handle;\n\t" #
+ "mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
+ "clusterlaunchcontrol.query_cancel.get_first_ctaid.v4.b32.b128 {$r1, $r2, $r3, _}, %handle;\n\t" #
+ "}}", [(set i32:$r1, i32:$r2, i32:$r3,
+ (clusterlaunchcontrol_query_cancel_first_cta_id i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
+ Requires<[hasSM<100>, hasPTX<86>]>;
+
+def SDTClusterLaunchControlQueryCancelGetFirstCtaIdX: SDTypeProfile<1, 2, []>;
+def clusterlaunchcontrol_query_cancel_first_cta_id_x :
+ SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X",
+ SDTClusterLaunchControlQueryCancelGetFirstCtaIdX, [SDNPHasChain, SDNPSideEffect]>;
+
+def SDTClusterLaunchControlQueryCancelGetFirstCtaIdY: SDTypeProfile<1, 2, []>;
+def clusterlaunchcontrol_query_cancel_first_cta_id_y:
+ SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y",
+ SDTClusterLaunchControlQueryCancelGetFirstCtaIdY, [SDNPHasChain, SDNPSideEffect]>;
+
+def SDTClusterLaunchControlQueryCancelGetFirstCtaIdZ: SDTypeProfile<1, 2, []>;
+def clusterlaunchcontrol_query_cancel_first_cta_id_z:
+ SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z",
+ SDTClusterLaunchControlQueryCancelGetFirstCtaIdZ, [SDNPHasChain, SDNPSideEffect]>;
+
+class CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<string Dim>:
+ NVPTXInst<(outs Int32Regs:$reg), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+ "{{\n\t" #
+ ".reg .b128 %handle;\n\t" #
+ "mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
+ "clusterlaunchcontrol.query_cancel.get_first_ctaid::" # Dim # ".b32.b128 $reg, %handle;\n\t" #
+ "}}", [(set i32:$reg,
+ (!cast<SDNode>("clusterlaunchcontrol_query_cancel_first_cta_id_" # Dim)
+ i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
+ Requires<[hasSM<100>, hasPTX<86>]>;
+
+foreach dim = ["x", "y", "z"] in {
+ def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_ # dim:
+ CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<dim>;
+}
diff --git a/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll
new file mode 100644
index 0000000000000..58d9dc05f9bdd
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol-multicast.ll
@@ -0,0 +1,42 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -o - -mcpu=sm_100a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 | %ptxas-verify -arch=sm_100a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_100a %}
+; RUN: llc -o - -mcpu=sm_101a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 | %ptxas-verify -arch=sm_101a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_101a %}
+; RUN: llc -o - -mcpu=sm_120a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 | %ptxas-verify -arch=sm_120a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_120a %}
+
+define void @nvvm_clusterlaunchcontrol_try_cancel_multicast(ptr %addr, ptr %mbar,
+; CHECK-PTX-SHARED64-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
+; CHECK-PTX-SHARED64: {
+; CHECK-PTX-SHARED64-NEXT: .reg .b64 %rd<3>;
+; CHECK-PTX-SHARED64-EMPTY:
+; CHECK-PTX-SHARED64-NEXT: // %bb.0:
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_2];
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_3];
+; CHECK-PTX-SHARED64-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%rd1], [%rd2];
+; CHECK-PTX-SHARED64-NEXT: ret;
+;
+; CHECK-PTX-SHARED32-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
+; CHECK-PTX-SHARED32: {
+; CHECK-PTX-SHARED32-NEXT: .reg .b32 %r<3>;
+; CHECK-PTX-SHARED32-EMPTY:
+; CHECK-PTX-SHARED32-NEXT: // %bb.0:
+; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_2];
+; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_3];
+; CHECK-PTX-SHARED32-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%r1], [%r2];
+; CHECK-PTX-SHARED32-NEXT: ret;
+ ptr addrspace(3) %saddr, ptr addrspace(3) %smbar,
+ i128 %try_cancel_response) {
+
+ tail call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %saddr, ptr addrspace(3) %smbar)
+ ret void;
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
diff --git a/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll
new file mode 100644
index 0000000000000..f29907c469bec
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/clusterlaunchcontrol.ll
@@ -0,0 +1,132 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_100 -mattr=+ptx86 | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_100 -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100 -mattr=+ptx86 | %ptxas-verify -arch=sm_100 %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100 -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_100 %}
+
+define void @nvvm_clusterlaunchcontrol_try_cancel(ptr %addr, ptr %mbar,
+; CHECK-PTX-SHARED64-LABEL: nvvm_clusterlaunchcontrol_try_cancel(
+; CHECK-PTX-SHARED64: {
+; CHECK-PTX-SHARED64-NEXT: .reg .b64 %rd<3>;
+; CHECK-PTX-SHARED64-EMPTY:
+; CHECK-PTX-SHARED64-NEXT: // %bb.0:
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd1, [nvvm_clusterlaunchcontrol_try_cancel_param_2];
+; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd2, [nvvm_clusterlaunchcontrol_try_cancel_param_3];
+; CHECK-PTX-SHARED64-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.b128 [%rd1], [%rd2];
+; CHECK-PTX-SHARED64-NEXT: ret;
+;
+; CHECK-PTX-SHARED32-LABEL: nvvm_clusterlaunchcontrol_try_cancel(
+; CHECK-PTX-SHARED32: {
+; CHECK-PTX-SHARED32-NEXT: .reg .b32 %r<3>;
+; CHECK-PTX-SHARED32-EMPTY:
+; CHECK-PTX-SHARED32-NEXT: // %bb.0:
+; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r1, [nvvm_clusterlaunchcontrol_try_cancel_param_2];
+; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r2, [nvvm_clusterlaunchcontrol_try_cancel_param_3];
+; CHECK-PTX-SHARED32-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.b128 [%r1], [%r2];
+; CHECK-PTX-SHARED32-NEXT: ret;
+ ptr addrspace(3) %saddr, ptr addrspace(3) %smbar,
+ i128 %try_cancel_response) {
+
+ tail call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) %saddr, ptr addrspace(3) %smbar)
+ ret void;
+}
+
+define i32 @nvvm_clusterlaunchcontrol_query_cancel_is_canceled(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_is_canceled(
+; CHECK: {
+; CHECK-NEXT: .reg .pred %p<2>;
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_is_canceled_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {8294, %rd1};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.is_canceled.pred.b128 %p1, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: selp.b32 %r1, 1, 0, %p1;
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %v0 = call i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %try_cancel_response)
+ %v2 = zext i1 %v0 to i32
+ ret i32 %v2;
+}
+
+define {i32, i32, i32} @nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<4>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {8290, %rd1};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.get_first_ctaid.v4.b32.b128 {%r1, %r2, %r3, _}, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: st.param.b32 [func_retval0+4], %r2;
+; CHECK-NEXT: st.param.b32 [func_retval0+8], %r3;
+; CHECK-NEXT: ret;
+ %v0 = call {i32, i32, i32} @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid(i128 %try_cancel_response)
+ ret {i32, i32, i32} %v0;
+}
+
+define i32 @nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {8291, %rd1};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.get_first_ctaid::x.b32.b128 %r1, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %v0 = call i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %try_cancel_response)
+ ret i32 %v0;
+}
+
+define i32 @nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {8292, %rd1};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.get_first_ctaid::y.b32.b128 %r1, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %v0 = call i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.y(i128 %try_cancel_response)
+ ret i32 %v0;
+}
+
+define i32 @nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z(i128 %try_cancel_response) local_unnamed_addr #0 {
+; CHECK-LABEL: nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z(
+; CHECK: {
+; CHECK-NEXT: .reg .b32 %r<2>;
+; CHECK-NEXT: .reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.v2.u64 {%rd1, %rd2}, [nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z_param_0];
+; CHECK-NEXT: {
+; CHECK-NEXT: .reg .b128 %handle;
+; CHECK-NEXT: mov.b128 %handle, {8293, %rd1};
+; CHECK-NEXT: clusterlaunchcontrol.query_cancel.get_first_ctaid::z.b32.b128 %r1, %handle;
+; CHECK-NEXT: }
+; CHECK-NEXT: st.param.b32 [func_retval0], %r1;
+; CHECK-NEXT: ret;
+ %v0 = call i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.z(i128 %try_cancel_response)
+ ret i32 %v0;
+}
More information about the llvm-commits
mailing list