[PATCH] D145614: [AARCH64] Enable STORE of v4i8 to help more vectorization opportunities

Wed Apr 5 09:57:36 PDT 2023

Carrot added a comment.

In D145614#4224190 <https://reviews.llvm.org/D145614#4224190>, @dmgreen wrote:

> In D145614#4214202 <https://reviews.llvm.org/D145614#4214202>, @Carrot wrote:
>
>> The X86 backend also sets store of v4i8 as custom. We have a similar capability for v4i8.
>>
>> I can also add a real custom lowering for store of v4i8, as follows. It is simpler and more natural than storing v4i16.
>>
>>   t2: i32 = bitcast t1:v4i8
>>   t3: ch = store<store (s32) into %somewhere> t10, t2, address
>>
>> But v4i8 is not a legal type, so we never see a store v4i8 DAG node, so it does not look necessary.
>
> I believe that X86 will treat vector lanes differently to Arm/AArch64. For smaller types the vector will be widened by adding more elements (v4i8 [a,b,c,d] will become v8i8 [a,b,c,d,u,u,u,u]) as opposed to being promoted under AArch64 to larger sizes (v4i8 is promoted to v4i16, with the top half of each lane unused). They can both have their advantages and disadvantages. With SVE having t/b instructions the promotion can make more sense, and it is good to keep SVE and NEON in line. The same is true under MVE, which only has 128-bit vectors so more types are promoted, but this plays nicely into how the t/b instructions operate.

Why does promoting v4i8 -> v4i16 make more sense with the SVE t/b instructions than widening v4i8 -> v8i8?

As I understand it, widening v4i8 -> v8i8 is usually a nop, whereas promoting v4i8 -> v4i16 needs a real instruction.
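
To make the two layouts concrete, here is a rough scalar model in plain C++. It is only a sketch, not LLVM code: the type aliases and function names are hypothetical, and the undefined lanes of the widened vector are modelled as zero just to keep the example deterministic.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the three vector types under discussion.
using V4i8 = std::array<std::uint8_t, 4>;
using V8i8 = std::array<std::uint8_t, 8>;
using V4i16 = std::array<std::uint16_t, 4>;

// Widening: same lane width, more lanes. The low four bytes are unchanged;
// the upper four lanes are undefined (modelled as 0 here).
V8i8 widenToV8i8(const V4i8 &v) {
  V8i8 out{};
  std::memcpy(out.data(), v.data(), v.size());
  return out;
}

// Promotion: same lane count, wider lanes. Each i8 moves into the low half
// of an i16 lane, so the byte layout changes.
V4i16 promoteToV4i16(const V4i8 &v) {
  V4i16 out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = v[i];
  return out;
}

// Storing the original v4i8 from the widened form is a copy of the low
// 32 bits, which matches the bitcast-to-i32 store quoted above.
void storeFromWidened(const V8i8 &v, std::uint8_t *dst) {
  std::memcpy(dst, v.data(), 4);
}

// Storing from the promoted form has to narrow each lane back to i8 first.
void storeFromPromoted(const V4i16 &v, std::uint8_t *dst) {
  for (std::size_t i = 0; i < v.size(); ++i)
    dst[i] = static_cast<std::uint8_t>(v[i]);
}
```

Both store helpers write the same four bytes; the difference is that the widened form already has them contiguous, while the promoted form needs a per-lane narrowing step.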

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D145614