[PATCH] D120912: [AArch64][SVE] Convert gather/scatter with a stride of 2 to contiguous loads/stores

Mon Mar 7 07:37:38 PST 2022

kmclaughlin added a comment.

In D120912#3357559 <https://reviews.llvm.org/D120912#3357559>, @efriedma wrote:

> Did you try using ld2/st2?  I guess the problem is that it accesses too many bytes?

Hi @efriedma, I did investigate whether I could use ld2/st2 instead and found that the performance was very similar compared with contiguous loads & stores. The problem with using ld2/st2 was that we could end up accessing too many bytes as you suggested. It's for this reason that we're instead using contiguous loads/stores and always interleave the predicate with pfalse, to ensure that we don't attempt to access beyond the last element which the gather/scatter would have accessed.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16628
+  if (Index->getOpcode() == ISD::STEP_VECTOR &&
+      Mask->getOpcode() != ISD::EXTRACT_SUBVECTOR &&
+      IndexType == ISD::SIGNED_SCALED) {
----------------
efriedma wrote:
> Why the restriction on the mask opcode?
This restriction was added downstream as a workaround to an issue which I have not been able to reproduce. Removing as I believe it is no longer required.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16636
+    unsigned ShiftAmt;
+    switch (MemVT.getSimpleVT().SimpleTy) {
+    case MVT::nxv2i64:
----------------
david-arm wrote:
> Are you assuming this is a legal/simple type here? For example, the type could be <vscale x 12 x i32>, which will be an extended type and `getSimpleVT` will assert. Could you add a test case for this? I think you can avoid this either by only doing this after legalisation or by rewriting as a sequence of if-else statements, i.e.
> 
>   if (MemVT == MVT::nxv2i64 || MemVT == MVT::nxv2f64
>   ...
> 
Hi @david-arm, I have changed this to be a sequence of if-else statements checking MemVT as suggested. I tried adding a test using types such as nxv12i32, but I found that this crashes when the stepvector has an illegal type with "Do not know how to widen the result of this operator!"

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16646
+    default:
+      return SDValue();
+    }
----------------
efriedma wrote:
> I guess you're not handling i16/i8 because you'd have to worry about extension?
This patch was only for handling legal types, though I think i8/i16 types could be supported in future if this would be beneficial.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120912/new/

https://reviews.llvm.org/D120912