[llvm] [NVPTX] Preserve v16i8 vector loads when legalizing (PR #67322)
Uday Bondhugula via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 23 02:11:50 PST 2023
bondhugula wrote:
@pasaulais @Artem-B These are all really useful changes for v16i8 - thanks! While experimenting, I found a related case that is not covered. Current trunk ensures ld.v4.u32 is used for 16 x i8 loads, but this isn't the case for 16 x i8 stores that don't have a load feeding into them. The simplest test case is below. llc generates four st.u32 instructions here, while in theory a single st.v4.b32 or st.v4.u32 could have been used. I can contribute a patch to make this efficient as well, using the same approach, if that makes sense: store <16 x i8> -> bitcast <16 x i8> to <4 x i32> (each i32 packing 4 x i8) + store <4 x i32>.
```
; RUN: llc < %s -march=nvptx -mcpu=sm_20 | FileCheck %s
; CHECK-LABEL: foo3
define void @foo3(ptr %a, ptr %b, <16 x i8> %v) {
; CHECK: st.v4.b32
  store <16 x i8> %v, ptr %b
  ret void
}
```
```
bin/llc -march=nvptx -mcpu=sm_80 i8_store_vec.ll -o -
//
// Generated by LLVM NVPTX Back-End
//
.version 7.0
.target sm_80
.address_size 32
// .globl foo3 // -- Begin function foo3
// @foo3
.visible .func foo3(
.param .b32 foo3_param_0,
.param .b32 foo3_param_1,
.param .align 16 .b8 foo3_param_2[16]
)
{
.reg .b32 %r<6>;
// %bb.0:
ld.param.u32 %r1, [foo3_param_1];
ld.param.v4.u32 {%r2, %r3, %r4, %r5}, [foo3_param_2];
st.u32 [%r1+12], %r5;
st.u32 [%r1+8], %r4;
st.u32 [%r1+4], %r3;
st.u32 [%r1], %r2;
ret;
// -- End function
```
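For illustration, the proposed rewrite can be written out by hand in IR. This is only a sketch of the intended transformation, not the patch itself: the function name `@foo3_bitcast` is made up for this example, and whether llc actually selects a single st.v4 for it has not been checked here. The expectation rests on the existing <4 x i32> store tests in vector-stores.ll.
```
; Hypothetical hand-written equivalent of what the legalizer would do:
; reinterpret the 16 bytes as four 32-bit lanes, then store them at once.
define void @foo3_bitcast(ptr %b, <16 x i8> %v) {
  ; Same 128 bits, viewed as <4 x i32> instead of <16 x i8>.
  %w = bitcast <16 x i8> %v to <4 x i32>
  ; A <4 x i32> store is the form that can map to one st.v4 instruction.
  store <4 x i32> %w, ptr %b
  ret void
}
```
The bitcast does not change any bits, so the bytes written to memory are the same as in @foo3.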
On a related note, the following patch is a much smaller unit test case for this PR:
```
--- a/llvm/test/CodeGen/NVPTX/vector-stores.ll
+++ b/llvm/test/CodeGen/NVPTX/vector-stores.ll
@@ -29,3 +29,11 @@ define void @foo4(<4 x i32> %val, ptr %ptr) {
   ret void
 }
+; CHECK-LABEL: @v16i8
+define void @v16i8(ptr %a, ptr %b) {
+; CHECK: ld.v4.u32
+; CHECK: st.v4.u32
+ %v = load <16 x i8>, ptr %a
+ store <16 x i8> %v, ptr %b
+ ret void
+}
```
https://github.com/llvm/llvm-project/pull/67322