[llvm] [NVPTX] Add Volta Atomic SequentiallyConsistent Load and Store Operations (PR #98551)

Wed Jul 31 11:15:19 PDT 2024

================
@@ -82,45 +153,139 @@ define void @generic_volatile(ptr %a, ptr %b, ptr %c, ptr %d) local_unnamed_addr
   ; CHECK: st.volatile.f64 [%rd{{[0-9]+}}], %fd{{[0-9]+}}
   store volatile double %f.add, ptr %c
 
+  ; TODO: volatile, atomic, and volatile atomic memory operations on vector types.
+  ; Currently, LLVM:
+  ; - does not allow atomic operations on vectors.
+  ; - allows volatile operations on vectors, but it is not clear what that means.
+  ; Both of the following semantics make sense in general, and PTX supports both:
+  ; - volatile/atomic/volatile atomic applies to the whole vector
+  ; - volatile/atomic/volatile atomic applies elementwise
+  ; Actions required:
+  ; - clarify LLVM semantics for volatile on vectors and align the NVPTX backend with those
+  ;   The tests below show that the current implementation picks the semantics inconsistently:
+  ;   * volatile <2 x i8> lowers to "elementwise volatile"
+  ;   * <4 x i8> lowers to "full vector volatile"
+  ; - provide support for vector atomics, e.g., by extending LLVM IR or via intrinsics
+  ; - update tests in load-store-sm70.ll as well.
+
+  ; TODO: make this operation consistent with the one for <4 x i8>
+  ; This operation lowers to an "elementwise volatile PTX operation".
+  ; CHECK: ld.volatile.v2.u8 {%rs{{[0-9]+}}, %rs{{[0-9]+}}}, [%rd{{[0-9]+}}]
+  %h.load = load volatile <2 x i8>, ptr %b
+  %h.add = add <2 x i8> %h.load, <i8 1, i8 1>
+  ; CHECK: st.volatile.v2.u8 [%rd{{[0-9]+}}], {%rs{{[0-9]+}}, %rs{{[0-9]+}}}
+  store volatile <2 x i8> %h.add, ptr %b
+
+  ; TODO: make this operation consistent with the one for <2 x i8>
+  ; This operation lowers to a "full vector volatile PTX operation".
+  ; CHECK: ld.volatile.u32 %r{{[0-9]+}}, [%rd{{[0-9]+}}]
+  %i.load = load volatile <4 x i8>, ptr %c
----------------
Artem-B wrote:

We probably want to test `<8 x i8>` and, maybe, `<6 x i8>` too. `i8` in PTX is a really odd type.

Speaking of odd types, we should also test `i1`.
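
A minimal sketch of what those extra cases might look like, assuming they follow the same load/modify/store pattern as the existing tests. The function name `@generic_volatile_odd_types` and the value names are hypothetical, and the `CHECK` lines are deliberately left out because the expected PTX for these types is not established in this thread:

```llvm
; Hypothetical test function; name and layout are illustrative only.
define void @generic_volatile_odd_types(ptr %a, ptr %b, ptr %c) {
  ; <8 x i8>: twice the width of the <4 x i8> case in the hunk above.
  %v8.load = load volatile <8 x i8>, ptr %a
  %v8.add = add <8 x i8> %v8.load, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  store volatile <8 x i8> %v8.add, ptr %a

  ; <6 x i8>: element count is not a power of two.
  %v6.load = load volatile <6 x i8>, ptr %b
  %v6.add = add <6 x i8> %v6.load, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  store volatile <6 x i8> %v6.add, ptr %b

  ; i1: scalar type narrower than a byte.
  %b1.load = load volatile i1, ptr %c
  %b1.not = xor i1 %b1.load, true
  store volatile i1 %b1.not, ptr %c

  ret void
}
```

The `CHECK` lines would need to be filled in from the actual backend output once it is clear which semantics (whole-vector or elementwise) each case should get.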

https://github.com/llvm/llvm-project/pull/98551