[PATCH] D120912: [AArch64][SVE] Convert gather/scatter with a stride of 2 to contiguous loads/stores
Kerry McLaughlin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 7 07:37:38 PST 2022
kmclaughlin added a comment.
In D120912#3357559 <https://reviews.llvm.org/D120912#3357559>, @efriedma wrote:
> Did you try using ld2/st2? I guess the problem is that it accesses too many bytes?
Hi @efriedma, I did investigate whether I could use ld2/st2 instead and found that the performance was very similar compared with contiguous loads & stores. The problem with using ld2/st2 was that we could end up accessing too many bytes as you suggested. It's for this reason that we're instead using contiguous loads/stores and always interleave the predicate with pfalse, to ensure that we don't attempt to access beyond the last element which the gather/scatter would have accessed.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16628
+ if (Index->getOpcode() == ISD::STEP_VECTOR &&
+ Mask->getOpcode() != ISD::EXTRACT_SUBVECTOR &&
+ IndexType == ISD::SIGNED_SCALED) {
----------------
efriedma wrote:
> Why the restriction on the mask opcode?
This restriction was added downstream as a workaround to an issue which I have not been able to reproduce. Removing as I believe it is no longer required.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16636
+ unsigned ShiftAmt;
+ switch (MemVT.getSimpleVT().SimpleTy) {
+ case MVT::nxv2i64:
----------------
david-arm wrote:
> Are you assuming this is a legal/simple type here? For example, the type could be <vscale x 12 x i32>, which will be an extended type and `getSimpleVT` will assert. Could you add a test case for this? I think you can avoid this either by only doing this after legalisation or by rewriting as a sequence of if-else statements, i.e.
>
> if (MemVT == MVT::nxv2i64 || MemVT == MVT::nxv2f64
> ...
>
Hi @david-arm, I have changed this to be a sequence of if-else statements checking MemVT as suggested. I tried adding a test using types such as nxv12i32, but I found that this crashes when the stepvector has an illegal type with "Do not know how to widen the result of this operator!"
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16646
+ default:
+ return SDValue();
+ }
----------------
efriedma wrote:
> I guess you're not handling i16/i8 because you'd have to worry about extension?
This patch was only for handling legal types, though I think i8/i16 types could be supported in future if this would be beneficial.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D120912/new/
https://reviews.llvm.org/D120912
More information about the llvm-commits
mailing list