[PATCH] D145614: [AARCH64] Enable STORE of v4i8 to help more vectorization opportunities

Mon Mar 27 07:16:50 PDT 2023

dmgreen added a comment.

In D145614#4214202 <https://reviews.llvm.org/D145614#4214202>, @Carrot wrote:

> The X86 backend also sets the store of v4i8 as custom. We have a similar capability for v4i8.
>
> I can also add a real custom lowering for the store of v4i8, as follows. It is simpler and more natural than storing v4i16.
>
>   t2: i32 = bitcast t1:v4i8
>   t3: ch = store<store (s32) into %somewhere> t10, t2, address
>
> But v4i8 is not a legal type, so we never see a store v4i8 DAG node, and this does not look necessary.

I believe that X86 treats small vector types differently to Arm/AArch64. On X86 a smaller type is widened by adding more elements (v4i8 [a,b,c,d] becomes v8i8 [a,b,c,d,u,u,u,u]), whereas on AArch64 it is promoted to larger element sizes (v4i8 is promoted to v4i16, with the top half of each lane unused). Both approaches have their advantages and disadvantages. With SVE having t/b instructions the promotion can make more sense, and it is good to keep SVE and NEON in line. The same is true under MVE, which only has 128-bit vectors so more types are promoted, but this plays nicely into how the t/b instructions operate.
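
To make the difference concrete, here is a rough Python model of the two legalization strategies. It is only a sketch and not how SelectionDAG represents anything: lanes are plain integers, an undefined lane is modelled as None, and the helper names (widen, promote, narrow_store, truncating_store_i8) are made up for illustration. The `high` argument stands in for whatever happens to sit in the unused top half of a promoted lane.

```python
UNDEF = None  # stand-in for an undefined ("u") lane


def widen(lanes, total_lanes):
    """Widening: keep the element size, pad with undefined lanes."""
    return list(lanes) + [UNDEF] * (total_lanes - len(lanes))


def promote(lanes, old_bits, high=0):
    """Promotion: keep the lane count, double each lane's width.

    The original value sits in the low `old_bits` bits of each lane;
    `high` is arbitrary filler for the unused top half.
    """
    low_mask = (1 << old_bits) - 1
    return [(high << old_bits) | (x & low_mask) for x in lanes]


def narrow_store(widened, count):
    """Store of a widened vector: write only the first `count` lanes."""
    return bytes(widened[:count])


def truncating_store_i8(promoted):
    """Store of a promoted vector: write the low 8 bits of every lane."""
    return bytes(x & 0xFF for x in promoted)


v4i8 = [0x0A, 0x0B, 0x0C, 0x0D]
as_v8i8 = widen(v4i8, 8)                 # [a, b, c, d, u, u, u, u]
as_v4i16 = promote(v4i8, 8, high=0xFF)   # top half of each lane is filler

# Either way, the four bytes that reach memory are the same.
same_bytes = narrow_store(as_v8i8, 4) == truncating_store_i8(as_v4i16)
```

In this model the widened form needs a store that ignores the padding lanes, while the promoted form needs a truncating store that drops the top half of each lane; both end up writing the same 32 bits.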

I think it is worth separating the cost model controls for the SLP vectorizer from the codegen issues in the code it produces.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D145614