[PATCH] D103629: [AArch64] Cost-model i8 vector loads/stores
Sjoerd Meijer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 4 09:00:27 PDT 2021
SjoerdMeijer added a comment.
In D103629#2797291 <https://reviews.llvm.org/D103629#2797291>, @efriedma wrote:
>> And while we don't have a load instruction that supports this
>
> If `<4 x i8>` loads matter, we should probably convert them to a 32-bit load followed by a zip1, which should have a cost of 2. (Or possibly 3 on big-endian, I guess.) Basically the inverse of LowerTruncateVectorStore.
Question about this. I will keep looking a bit longer, because my zip1-fu is not so strong, but I am struggling to see what the codegen would look like for an example like this:
  define <4 x i32> @f(<4 x i8>* %a, <4 x i32> %b) {
    %x = load <4 x i8>, <4 x i8>* %a
    %y = sext <4 x i8> %x to <4 x i32>
    %z = add <4 x i32> %y, %b
    ret <4 x i32> %z
  }
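For reference, here is a small Python model of what @f computes. It pins down the semantics any lowering has to reproduce: each of the four loaded bytes is sign-extended to i32 before the add, and the i32 add wraps modulo 2^32. This is only a sketch; the name `f_reference` and the little-endian byte layout are assumptions for illustration and do not come from the review.

```python
import struct


def f_reference(mem: bytes, b: list[int]) -> list[int]:
    """Model of @f: load <4 x i8>, sext to <4 x i32>, add %b, return the sum."""
    # Format "<4b" reads four *signed* bytes, i.e. the sext from i8 happens here.
    x = struct.unpack("<4b", mem[:4])
    result = []
    for xi, bi in zip(x, b):
        s = (xi + bi) & 0xFFFFFFFF  # i32 add wraps modulo 2^32
        if s >= 0x80000000:         # reinterpret the 32-bit pattern as signed
            s -= 0x100000000
        result.append(s)
    return result
```

For example, loading the bytes `01 FF 80 7F` and adding a splat of 10 gives `[11, 9, -118, 137]`: the 0xFF and 0x80 bytes must arrive in their 32-bit lanes as -1 and -128, not as 255 and 128.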
I am failing to see how, with something like

  fmov s0, w0
  zip1.8d v0, v0, v0

I would get the bytes sign-extended and in the right lanes for the 128-bit add.
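To make the question concrete, the sketch below models the AArch64 `ZIP1 Vd.8B, Vn.8B, Vm.8B` semantics (interleave the low halves of the two sources) on the bytes of a 32-bit load. It assumes a little-endian lane layout and that the upper four bytes of the register are zero after the 32-bit load. All helper names are hypothetical. The `sshr_4h` step is an illustrative extra operation added here for the model and was not proposed in the review.

```python
def zip1_8b(vn: list[int], vm: list[int]) -> list[int]:
    """Model of ZIP1 Vd.8B, Vn.8B, Vm.8B: interleave the low 4 bytes of vn and vm."""
    assert len(vn) == len(vm) == 8
    out = []
    for p in range(4):  # 8 elements / 2 = 4 pairs taken from the low halves
        out += [vn[p], vm[p]]
    return out


def as_4h(bytes8: list[int]) -> list[int]:
    """Reinterpret 8 bytes as four little-endian unsigned 16-bit lanes."""
    return [bytes8[2 * i] | (bytes8[2 * i + 1] << 8) for i in range(4)]


def sshr_4h(lanes: list[int], shift: int) -> list[int]:
    """Arithmetic shift right of each 16-bit lane, treated as signed i16."""
    out = []
    for h in lanes:
        s = h - 0x10000 if h & 0x8000 else h  # reinterpret as signed i16
        out.append(s >> shift)                 # Python >> on ints is arithmetic
    return out


# Bytes 01 FF 80 7F from the 32-bit load, upper half of the D register zero.
v = [0x01, 0xFF, 0x80, 0x7F, 0, 0, 0, 0]
lanes = as_4h(zip1_8b(v, v))
```

Under this model, zipping the register with itself duplicates each byte into a 16-bit lane (`0x0101, 0xFFFF, 0x8080, 0x7F7F`). The low byte of every lane is correct, which amounts to an any-extend to `<4 x i16>` rather than a sign extension. A further arithmetic shift right by 8 would recover `[1, -1, -128, 127]`, but the values are still 16-bit lanes, so feeding the `<4 x i32>` add would need yet another widening step. That gap is the part of the suggested zip1 lowering the question above is asking about.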
https://reviews.llvm.org/D103629