[llvm] [NVPTX] Preserve v16i8 vector loads when legalizing (PR #67322)

Thu Nov 23 02:11:50 PST 2023

bondhugula wrote:

@pasaulais @Artem-B These are all really useful changes for v16i8 - thanks! While experimenting, I found a related scenario that is not yet covered. Current trunk ensures that v4.u32 is used for <16 x i8> loads, but this isn't the case for <16 x i8> stores that don't have a load feeding into them. The simplest test case is below. llc generates four st.u32 instructions here, whereas in theory a single st.v4.b32 or st.v4.u32 could have been used. I can contribute a patch to make this efficient as well, using the same approach, if that makes sense: a store of <16 x i8> would become a bitcast of v16i8 to v4i32 (each i32 packing four i8 values) followed by a store of <4 x i32>.

```
; RUN: llc < %s -march=nvptx -mcpu=sm_20 | FileCheck %s

; CHECK-LABEL: foo3
define void @foo3(ptr %a, ptr %b, <16 x i8> %v) {
; CHECK: st.v4.b32
  store <16 x i8> %v, ptr %b
  ret void
}
```

```
$ bin/llc -march=nvptx -mcpu=sm_80 i8_store_vec.ll -o -
//
// Generated by LLVM NVPTX Back-End
//

.version 7.0
.target sm_80
.address_size 32

	// .globl	foo3                    // -- Begin function foo3
                                        // @foo3
.visible .func foo3(
	.param .b32 foo3_param_0,
	.param .b32 foo3_param_1,
	.param .align 16 .b8 foo3_param_2[16]
)
{
	.reg .b32 	%r<6>;

// %bb.0:
	ld.param.u32 	%r1, [foo3_param_1];
	ld.param.v4.u32 	{%r2, %r3, %r4, %r5}, [foo3_param_2];
	st.u32 	[%r1+12], %r5;
	st.u32 	[%r1+8], %r4;
	st.u32 	[%r1+4], %r3;
	st.u32 	[%r1], %r2;
	ret;
                                        // -- End function
```
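
To make the proposed rewrite concrete, here is a rough IR-level sketch of what the store would be treated as. This is only an illustration: the real change would live in the NVPTX legalization code rather than in the input IR, the function name @foo3_bitcast is hypothetical, and the explicit `align 16` is an assumption about the natural alignment of a 128-bit vector on this target.

```llvm
; Hypothetical hand-written equivalent of @foo3 above.
; The bitcast reinterprets the same 128 bits as four i32 lanes,
; so the bytes written to %b are the same as in the original store.
define void @foo3_bitcast(ptr %a, ptr %b, <16 x i8> %v) {
  %w = bitcast <16 x i8> %v to <4 x i32>
  ; Assumption: 16-byte alignment, which a single v4 store would require.
  store <4 x i32> %w, ptr %b, align 16
  ret void
}
```

If the backend handled the v16i8 store this way, I would expect the four st.u32 instructions above to collapse into one st.v4 store, presumably matching what the existing foo4 test in vector-stores.ll checks for <4 x i32>.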

On a related note, the following patch adds a much smaller unit test for this PR:
```

--- a/llvm/test/CodeGen/NVPTX/vector-stores.ll
+++ b/llvm/test/CodeGen/NVPTX/vector-stores.ll
@@ -29,3 +29,11 @@ define void @foo4(<4 x i32> %val, ptr %ptr) {
   ret void
 }
 
+; CHECK-LABEL: @v16i8
+define void @v16i8(ptr %a, ptr %b) {
+; CHECK: ld.v4.u32
+; CHECK: st.v4.u32
+  %v = load <16 x i8>, ptr %a
+  store <16 x i8> %v, ptr %b
+  ret void
+}
```

https://github.com/llvm/llvm-project/pull/67322