[PATCH] D103629: [AArch64] Cost-model i8 vector loads/stores
Sjoerd Meijer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 4 00:07:11 PDT 2021
SjoerdMeijer added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1254
+ // a vector with:
+ // ld1 {v0.b}[0], [x0]
+ // followed by some offset calculation like:
----------------
dmgreen wrote:
> Prior to D102938, this wasn't true and seems to still not be very true in general:
> https://godbolt.org/z/7KMrEqcMW
> Although the adds can be removed. The serialized ld1s won't be very cheap though, on many CPUs.
>
> A factor of two might be enough to show they are expensive, but there would probably be some cases where performance was worse.
> As Eli says, optimizing the 4 x i8 case at least using a 32-bit load and a shuffle sounds like a good idea.
> Prior to D102938, this wasn't true and seems to still not be very true in general:
> https://godbolt.org/z/7KMrEqcMW
While there is some variation, the trend is still roughly 2 * #elements.
> A factor of two might be enough to show they are expensive.
Yep, so that's what this patch does. Correct me if I am wrong, but it looks like we agree on that.
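For illustration only, the "factor of two" heuristic discussed above could be sketched as below. The function name and signature are hypothetical and are not the code from D103629 or from AArch64TargetTransformInfo.cpp; the sketch assumes each i8 element costs one lane-wise ld1/st1 plus one offset calculation.

```cpp
#include <cassert>

// Hypothetical helper, NOT the actual patch: models the rough
// "2 * #elements" cost for an i8 vector load or store.
// Assumption: every element needs one lane-wise memory access
// (e.g. ld1 {v0.b}[N], [x0]) and one address/offset computation.
unsigned i8VectorMemOpCost(unsigned NumElements) {
  assert(NumElements > 0 && "expected a non-empty vector");
  const unsigned CostPerElement = 2; // lane access + offset calculation
  return CostPerElement * NumElements;
}
```

Under these assumptions a 4 x i8 access is costed at 8 and an 8 x i8 access at 16. As noted in the review, this may overestimate cases where the offset calculations can be removed.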
> As Eli says, optimizing the 4 x i8 case at least using a 32-bit load and a shuffle sounds like a good idea.
Yep, that's a nice suggestion, will look into that first.
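As a rough picture of the 4 x i8 idea, the sketch below reads four adjacent bytes with a single 32-bit load and then splits the word into lanes. This is a scalar emulation in portable C++, not NEON code and not the lowering LLVM would emit; the helper name is hypothetical, and the second memcpy merely stands in for the shuffle.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fetch a 4 x i8 vector with one 32-bit load
// instead of four serialized byte loads.
std::array<std::uint8_t, 4> load4xi8(const std::uint8_t *Ptr) {
  assert(Ptr != nullptr && "expected a valid pointer");
  std::uint32_t Word;
  std::memcpy(&Word, Ptr, sizeof(Word)); // the single 32-bit load
  std::array<std::uint8_t, 4> Lanes;
  // Stand-in for the shuffle: split the word into its four byte lanes
  // in memory order, independent of host endianness.
  std::memcpy(Lanes.data(), &Word, sizeof(Word));
  return Lanes;
}
```

The point of the design is that the memory access happens once; distributing the bytes into lanes is then a register-only operation.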
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D103629/new/
https://reviews.llvm.org/D103629