[llvm] [NVPTX] Lower 16xi8 and 8xi8 stores efficiently (PR #73646)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 5 11:23:42 PST 2023


================
@@ -37,3 +37,19 @@ define void @v16i8(ptr %a, ptr %b) {
   store <16 x i8> %v, ptr %b
   ret void
 }
+
+; CHECK-LABEL: .visible .func v16i8_store
+define void @v16i8_store(ptr %a, <16 x i8> %v) {
+  ; CHECK:      ld.param.u64   %rd1, [v16i8_store_param_0];
+  ; CHECK-NEXT: ld.param.v4.u32   {%r1, %r2, %r3, %r4}, [v16i8_store_param_1];
+  ; CHECK-NEXT: st.v4.u32   [%rd1], {%r1, %r2, %r3, %r4};
+  store <16 x i8> %v, ptr %a
+  ret void
+}
+
+; CHECK-LABEL: .visible .func v8i8_store
+define void @v8i8_store(ptr %a, <8 x i8> %v) {
+  ; CHECK: st.v2.u32
+  store <8 x i8> %v, ptr %a
----------------
Artem-B wrote:

You're right. Loads/stores that use larger types must be aligned appropriately.

We do use `allowsMemoryAccessForAlignment` in other places.
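
To illustrate the kind of gating meant here, below is a minimal standalone sketch, not code from this patch and not an LLVM API. `pickStoreVectorWidth` is a hypothetical helper, and the rule that a `vN.u32` store needs alignment at least equal to its total size is an assumption of the sketch; in the backend the decision would presumably go through `allowsMemoryAccessForAlignment` instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): given the size in bytes of an
// i8 vector and the known alignment of the destination, return how many
// 32-bit elements a single vector store may use. Returns 0 when the
// wide store is not allowed and the generic lowering should be kept.
unsigned pickStoreVectorWidth(unsigned numBytes, uint64_t alignBytes) {
  // Alignments are expected to be non-zero powers of two.
  assert(alignBytes != 0 && (alignBytes & (alignBytes - 1)) == 0 &&
         "alignment must be a power of two");

  // Only the 16 x i8 and 8 x i8 cases are modelled in this sketch.
  if (numBytes != 16 && numBytes != 8)
    return 0;

  // Assumption: the wide store touches numBytes bytes at once, so the
  // address must be aligned to at least numBytes.
  if (alignBytes >= numBytes)
    return numBytes / 4; // 4 -> st.v4.u32, 2 -> st.v2.u32

  return 0; // under-aligned: do not widen
}
```

Under these assumptions, a 16-byte store with 16-byte alignment yields 4 (the `st.v4.u32` case in the test above), while the same store with only 4-byte alignment yields 0 and is left to the existing lowering.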


https://github.com/llvm/llvm-project/pull/73646

