[llvm] [LLVM][NVPTX] Add NVPTX codegen support for fence.proxy.tensormap (PR #100748)

Tue Jul 30 11:02:30 PDT 2024

================
@@ -1418,6 +1418,20 @@ let TargetPrefix = "nvvm" in {
   def int_nvvm_fence_sc_cluster:
       Intrinsic<[], [], [IntrNoCallback]>;
 
+// Proxy fence (uni-directional)
+foreach scope = ["cta", "cluster", "gpu", "sys"] in {
+
+  def int_nvvm_fence_proxy_tensormap_release_ # scope:
+        Intrinsic<[], [], [IntrNoCallback],
+                  "llvm.nvvm.fence.proxy.tensormap.release." # scope>;
+
+  def int_nvvm_fence_proxy_tensormap_acquire_ # scope:
+        Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty],
+                  [IntrNoCallback, ImmArg<ArgIndex<1>>],
----------------
Artem-B wrote:

Intrinsic constraints may be underspecified here.

I'm not familiar enough with the new synchronization instructions to say what the right way to specify their behavior in LLVM is, but it's not uncommon for barrier instructions to carry `IntrHasSideEffects` to make sure that LLVM does not move them around.

These instruction variants seem to be narrower in scope and apply only to the area specified by the pointer and size, which suggests that we could give LLVM a bit more freedom and get by with `IntrArgMemOnly`, `IntrReadMem`, or `IntrWriteMem`.

Also, are there any concerns about replicating or merging these instructions across different control flow paths? Do they need `IntrConvergent`?
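
For illustration, the attribute options raised above could be written roughly as follows. This is only a sketch, not the PR's definition: the record name `int_nvvm_fence_proxy_tensormap_acquire_sketch` is hypothetical, the fragment is assumed to sit inside the existing `let TargetPrefix = "nvvm" in { ... }` block, and whether these attributes actually match the semantics of the fence is exactly the open question.

```tablegen
// Hypothetical variant for discussion only; the record name is made up.
// Assumes the enclosing `let TargetPrefix = "nvvm" in { ... }` block.
def int_nvvm_fence_proxy_tensormap_acquire_sketch:
      Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty],
                [IntrNoCallback,
                 // Accesses only memory reachable through the pointer argument.
                 IntrArgMemOnly,
                 // Keep only if duplicating/merging across control flow is unsafe.
                 IntrConvergent,
                 // The size operand must be an immediate.
                 ImmArg<ArgIndex<1>>]>;
```

Leaving out `IntrReadMem` and `IntrWriteMem` keeps the default assumption that the intrinsic may both read and write that memory. The more conservative alternative would be to drop `IntrArgMemOnly` and use `IntrHasSideEffects` instead.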


https://github.com/llvm/llvm-project/pull/100748