[llvm] [LLVM][NVPTX] Add NVPTX codegen support for clusterlaunchcontrol instruction (PR #134568)

via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 7 12:16:15 PDT 2025


================
@@ -5381,4 +5381,50 @@ def int_nvvm_st_bulk_shared_cta : DefaultAttrsIntrinsic<[],
   [IntrArgMemOnly, IntrWriteMem,
     WriteOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<2>>]>;
 
+//
+// Cluster launch control
+//
+
+// clusterlaunchcontrol.try_cancel
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async
+    : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
+                [IntrHasSideEffects, IntrArgMemOnly, NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>],
+                "llvm.nvvm.clusterlaunchcontrol.try_cancel.async">;
----------------
gonzalobg wrote:

In general, for every intrinsic that accepts only "addresses in one state space, or generic addresses that point into that same state space", LLVM should probably be doing what you propose (@akshayrdeodhar for visibility, since this sounds worth exploring as a way to get better optimizations for intrinsics).

Some intrinsics that accept generic addresses offload the address space detection + conversion to HW.
Performing that detection + conversion in SW is pretty much always worse than leaving it to HW.
For these, it would be good if we could express that a generic address, e.g.,
- can only narrow to some state spaces (e.g. either "global" or "shared").
- cannot narrow to some state spaces (e.g. "local").

That way, if address space narrowing proves that a generic address does not point to, e.g., "shared", and that address is then passed to an intrinsic that only supports shared | global, the intrinsic's constraint lets address space narrowing narrow it further to global.
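
To make the idea concrete, here is a rough sketch of the set-based narrowing described above. It is not existing LLVM code: the `StateSpace` bitmask and the helpers `exclude`, `refine`, and `isSingleSpace` are hypothetical names made up for illustration, and a real implementation would presumably live in the address space inference pass and read the supported set from an intrinsic attribute.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bitmask of state spaces a generic pointer might point into.
// These names are illustrative only and are not LLVM or NVPTX definitions.
enum StateSpace : std::uint8_t {
  Global = 1u << 0,
  Shared = 1u << 1,
  Local = 1u << 2,
  Const = 1u << 3,
};
using SpaceSet = std::uint8_t;

constexpr SpaceSet AllSpaces = Global | Shared | Local | Const;

// Drop the spaces that analysis has proven the pointer cannot point into.
constexpr SpaceSet exclude(SpaceSet Known, SpaceSet Disproved) {
  return Known & ~Disproved;
}

// Intersect what is known with the set an intrinsic accepts (assumed to be
// declared on the intrinsic, e.g. "shared | global").
constexpr SpaceSet refine(SpaceSet Known, SpaceSet SupportedByIntrinsic) {
  return Known & SupportedByIntrinsic;
}

// True when exactly one space remains, so the pointer can be narrowed to it.
constexpr bool isSingleSpace(SpaceSet S) {
  return S != 0 && (S & (S - 1)) == 0;
}

// The example from the comment: "not shared" plus an intrinsic that only
// supports shared | global leaves global as the only candidate.
constexpr SpaceSet AfterAnalysis = exclude(AllSpaces, Shared);
constexpr SpaceSet AfterIntrinsic = refine(AfterAnalysis, Global | Shared);
static_assert(AfterIntrinsic == Global, "narrows to global");
static_assert(isSingleSpace(AfterIntrinsic), "exactly one space remains");
```

Neither fact alone is enough here: "not shared" still leaves global, local, and const, and "shared | global" still leaves two candidates. Only the intersection is a single space.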

https://github.com/llvm/llvm-project/pull/134568

