[PATCH] D48332: [AArch64] Add custom lowering for v4i8 trunc store
Adhemerval Zanella via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 20 11:50:04 PDT 2018
zatrazz added a comment.
In https://reviews.llvm.org/D48332#1136937, @efriedma wrote:
> I wonder if we should prefer to widen `<2 x i8>` and `<4 x i8>` to `<8 x i8>` instead of promoting to `<4 x i16>`. It would make stores like this a bit cheaper. Maybe an interesting experiment at some point (mostly just modifying AArch64TargetLowering::getPreferredVectorAction, I think, and seeing what happens to the generated code).
I tried your suggestion, but without further tuning of the vector lowering it does not yield much gain for a vector store. The operation:
%0 = trunc <4 x i32> %a to <4 x i8>
store <4 x i8> %0, <4 x i8>* %p, align 4, !tbaa !2
is scalarized because LowerBUILD_VECTOR can't really see a good pattern to use on it:
Custom lowering: t49: v8i8 = BUILD_VECTOR t37, t40, t43, t46, undef:i32, undef:i32, undef:i32, undef:i32
AArch64TargetLowering::ReconstructShuffle
Reshuffle failed: span too large for a VEXT to cope
LowerBUILD_VECTOR: alternatives failed, creating sequence of INSERT_VECTOR_ELT
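To make the promote-versus-widen distinction concrete, here is a minimal scalar C++ model of the trunc store above. It is only a sketch of the data layouts, not LLVM code: all type and function names (`V4I32`, `truncStoreReference`, `toPromotedV4I16`, `storePromoted`, `toWidenedV8I8`, `storeWidened`) are hypothetical, and the undef upper lanes of the v8i8 are modelled as zero purely for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V4I32 = std::array<std::uint32_t, 4>;

// Reference semantics of: trunc <4 x i32> to <4 x i8>, then a 4-byte store.
inline void truncStoreReference(const V4I32 &a, std::uint8_t *p) {
  for (std::size_t i = 0; i < 4; ++i)
    p[i] = static_cast<std::uint8_t>(a[i]); // keep the low 8 bits of each lane
}

// "Promote" model: each i8 element is held in a 16-bit lane (v4i16).
inline std::array<std::uint16_t, 4> toPromotedV4I16(const V4I32 &a) {
  std::array<std::uint16_t, 4> v{};
  for (std::size_t i = 0; i < 4; ++i)
    v[i] = static_cast<std::uint16_t>(a[i]);
  return v;
}

// Storing from the promoted form needs a per-lane narrowing step first.
inline void storePromoted(const std::array<std::uint16_t, 4> &v,
                          std::uint8_t *p) {
  for (std::size_t i = 0; i < 4; ++i)
    p[i] = static_cast<std::uint8_t>(v[i]);
}

// "Widen" model: the four i8 elements sit in the low half of a v8i8.
// Assumption: the upper four lanes are undef in the DAG; zero is used here.
inline std::array<std::uint8_t, 8> toWidenedV8I8(const V4I32 &a) {
  std::array<std::uint8_t, 8> v{};
  for (std::size_t i = 0; i < 4; ++i)
    v[i] = static_cast<std::uint8_t>(a[i]);
  return v;
}

// Storing from the widened form is a plain copy of the low 4 bytes.
inline void storeWidened(const std::array<std::uint8_t, 8> &v,
                         std::uint8_t *p) {
  std::memcpy(p, v.data(), 4);
}
```

In this model the widened store is a single 4-byte copy, while the promoted store has to narrow each lane. The cost moves to building the v8i8 from four scalars, which is the BUILD_VECTOR step that falls back to INSERT_VECTOR_ELT in the log above.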
Maybe if we handled v4i8 as v4i32 we could get better code generation, but that would also require more tuning in generic code. I do see better code generation for the trunc store of v2i32 to v2i8, but I am not convinced that this vector type should be tuned.
> Do we need similar handling to this patch for `<2 x i16>` or `<2 x i8>`?
The trunc stores for v2i16 to v2i8 and v4i32 to v4i8 can indeed be optimized, but I think that is orthogonal to this optimization.
Repository:
rL LLVM
https://reviews.llvm.org/D48332