[llvm] [LLVM][NVPTX] Add NVPTX codegen support for fence.proxy.tensormap (PR #100748)

Tue Jul 30 11:02:29 PDT 2024

================
@@ -0,0 +1,36 @@
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_90 -mattr=+ptx83 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-12.5 %{ llc < %s -march=nvptx64 -mcpu=sm_90 -mattr=+ptx83 | %ptxas-verify -arch=sm_90 %}
+
+; CHECK-LABEL: test_fence_proxy_tensormap_release
+define void @test_fence_proxy_tensormap_release() {
+  ; CHECK: fence.proxy.tensormap::generic.release.cta;
+  call void @llvm.nvvm.fence.proxy.tensormap.release.cta();
+
+  ; CHECK: fence.proxy.tensormap::generic.release.cluster;
+  call void @llvm.nvvm.fence.proxy.tensormap.release.cluster();
+
+  ; CHECK: fence.proxy.tensormap::generic.release.gpu;
+  call void @llvm.nvvm.fence.proxy.tensormap.release.gpu();
+
+  ; CHECK: fence.proxy.tensormap::generic.release.sys;
+  call void @llvm.nvvm.fence.proxy.tensormap.release.sys();
+
+  ret void
+}
+
+; CHECK-LABEL: test_fence_proxy_tensormap_acquire
+define void @test_fence_proxy_tensormap_acquire(ptr addrspace(0) %addr) {
+  ; CHECK: fence.proxy.tensormap::generic.acquire.cta [%rd{{[0-9]+}}], 128;
+  call void @llvm.nvvm.fence.proxy.tensormap.acquire.cta(ptr addrspace(0) %addr, i32 128);
----------------
Artem-B wrote:

What is supposed to happen when a user passes a constant other than 128? 

AFAICT we'll happily pass it along into the generated PTX, and ptxas will fail to compile it. That's not ideal. I suppose it may be OK in this case, since it's the PTX spec that defines size as a parameter while allowing only one specific value at the moment. We could diagnose the wrong value a bit earlier, but it could be argued that it's not LLVM's job here.

One thing we could do is remove the pattern that automatically matches the intrinsic and instead use a separate pattern that matches it only when the size argument is the constant integer 128. Any use of the intrinsic with a different argument would then fail in LLVM with a failure to lower the intrinsic, which would let us diagnose such errors earlier in the compilation pipeline.
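If the pattern were restricted that way, a negative test roughly like the following could lock in the behavior. This is a hypothetical sketch, not part of the PR: it assumes the acquire instructions are selected only by a pattern matching the literal `(i32 128)` operand, that there is no IR verifier check on the value, and that the failure surfaces as the usual SelectionDAG "Cannot select" fatal error. The test function name is made up, and the exact diagnostic wording is an assumption.

```llvm
; Hypothetical negative test: assumes only a size operand of (i32 128) has a
; matching ISel pattern, so any other value has nothing to select and llc
; aborts with a fatal error. The exact error text checked here is an assumption.
; RUN: not --crash llc < %s -march=nvptx64 -mcpu=sm_90 -mattr=+ptx83 2>&1 | FileCheck %s

declare void @llvm.nvvm.fence.proxy.tensormap.acquire.cta(ptr, i32)

; CHECK: LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.fence.proxy.tensormap.acquire.cta
define void @test_fence_proxy_tensormap_acquire_bad_size(ptr addrspace(0) %addr) {
  ; 64 is not a size the PTX spec currently allows, so no pattern should match.
  call void @llvm.nvvm.fence.proxy.tensormap.acquire.cta(ptr addrspace(0) %addr, i32 64)
  ret void
}
```

The point of such a test is that the bad value is rejected by llc itself rather than slipping through to ptxas.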


https://github.com/llvm/llvm-project/pull/100748