[llvm] [LLVM][NVPTX] Add support for griddepcontrol instruction (PR #123511)
Pradeep Kumar via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 23 09:34:29 PST 2025
https://github.com/schwarzschild-radius updated https://github.com/llvm/llvm-project/pull/123511
>From 036e5f57d3a4162ea206b35ab2a692dfbcd8614b Mon Sep 17 00:00:00 2001
From: pradeepku <pradeepku at nvidia.com>
Date: Thu, 2 Jan 2025 17:36:14 +0530
Subject: [PATCH] [LLVM][NVPTX] Add support for griddepcontrol instruction
This commit adds support for griddepcontrol PTX instruction with tests under griddepcontrol.ll
---
llvm/docs/NVPTXUsage.rst | 23 +++++++++++++++++++++++
llvm/include/llvm/IR/IntrinsicsNVVM.td | 3 +++
llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 12 ++++++++++++
llvm/test/CodeGen/NVPTX/griddepcontrol.ll | 17 +++++++++++++++++
4 files changed, 55 insertions(+)
create mode 100644 llvm/test/CodeGen/NVPTX/griddepcontrol.ll
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index a5a78a2882eec3..64dd2b84a1763e 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -939,6 +939,29 @@ including that ``wgmma.mma_async`` instruction is undefined behavior.
For more information, refer PTX ISA
`<https://docs.nvidia.com/cuda/parallel-thread-execution/#asynchronous-warpgroup-level-matrix-instructions-wgmma-wait-group>`_.
+'``llvm.nvvm.griddepcontrol.*``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare void @llvm.nvvm.griddepcontrol.launch_dependents()
+ declare void @llvm.nvvm.griddepcontrol.wait()
+
+Overview:
+"""""""""
+
+The ``griddepcontrol`` intrinsics allows the dependent grids and prerequisite grids as defined by the runtime, to control execution in the following way:
+
+``griddepcontrol.launch_dependents`` intrinsic signals that the dependents can be scheduled, before the current grid completes. The intrinsic can be invoked by multiple threads in the current CTA and repeated invocations of the intrinsic will have no additional side effects past that of the first invocation.
+
+``griddepcontrol.wait`` intrinsic causes the executing thread to wait until all prerequisite grids in flight have completed and all the memory operations from the prerequisite grids are performed and made visible to the current grid.
+
+For more information, refer
+`PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol>`__.
+
Other Intrinsics
----------------
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index 00c441920bfa1c..68c2373a1a4541 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -5044,4 +5044,7 @@ def int_nvvm_cp_async_bulk_prefetch_L2
NoCapture<ArgIndex<0>>, ReadOnly<ArgIndex<0>>,
ImmArg<ArgIndex<3>>]>;
+def int_nvvm_griddepcontrol_launch_dependents: Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects]>;
+def int_nvvm_griddepcontrol_wait: Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects]>;
+
} // let TargetPrefix = "nvvm"
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 6198c4aa9b94cb..56d8b734bf01df 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -7569,4 +7569,16 @@ def INT_NVVM_WGMMA_WAIT_GROUP_SYNC_ALIGNED : NVPTXInst<(outs), (ins i64imm:$n),
[(int_nvvm_wgmma_wait_group_sync_aligned timm:$n)]>, Requires<[hasSM90a, hasPTX<80>]>;
} // isConvergent = true
+def GRIDDEPCONTROL_LAUNCH_DEPENDENTS :
+ NVPTXInst<(outs), (ins),
+ "griddepcontrol.launch_dependents;",
+ [(int_nvvm_griddepcontrol_launch_dependents)]>,
+ Requires<[hasSM<90>, hasPTX<78>]>;
+
+def GRIDDEPCONTROL_WAIT :
+ NVPTXInst<(outs), (ins),
+ "griddepcontrol.wait;",
+ [(int_nvvm_griddepcontrol_wait)]>,
+ Requires<[hasSM<90>, hasPTX<78>]>;
+
def INT_EXIT : NVPTXInst<(outs), (ins), "exit;", [(int_nvvm_exit)]>;
diff --git a/llvm/test/CodeGen/NVPTX/griddepcontrol.ll b/llvm/test/CodeGen/NVPTX/griddepcontrol.ll
new file mode 100644
index 00000000000000..fe15b3fe4afbd9
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/griddepcontrol.ll
@@ -0,0 +1,17 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mcpu=sm_90 -march=nvptx64 | FileCheck %s
+; RUN: %if ptxas-11.8 %{ llc < %s -mcpu=sm_90 -march=nvptx64 | %ptxas-verify %}
+
+define void @griddepcontrol() {
+; CHECK-LABEL: griddepcontrol(
+; CHECK: {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: griddepcontrol.launch_dependents;
+; CHECK-NEXT: griddepcontrol.wait;
+; CHECK-NEXT: ret;
+ call void @llvm.nvvm.griddepcontrol.launch.dependents()
+ call void @llvm.nvvm.griddepcontrol.wait()
+ ret void
+}
More information about the llvm-commits
mailing list