[PATCH] D136153: [AArch64] Allow cost computation for interleaved accesses

Thu Dec 1 07:24:35 PST 2022

mgabka added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13761-13762
+  unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
+  unsigned VecSize = (UseScalable || EC.isScalable())
+                         ? Subtarget->getMinSVEVectorSizeInBits()
+                         : 128;
----------------
paulwalker-arm wrote:
> The usage of `getMinSVEVectorSizeInBits` here is only relevant for fixed length vector types.  When working with scalable types we already know the size of the legal types (i.e. vscale * 128) and so the `: 128` part should be fine because `Scalable_EC/(vscale * 128)` ==> `EC.getKnownMinValue() / 128`).
> 
> Fixing this should mean you no longer require `vscale_range` attributes for your scalable vector tests.
ok, I will simplify it

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13782-13783
+
+  if (EC.isScalable())
+    return true;

----------------
paulwalker-arm wrote:
> Is this correct? Do you really want to say all scalable vector types are legal?
my understanding is that at the moment we do not know which  scalable vector types are legal or not for this type of accesses, it depends how this is going to be implemented, right?

getInterleavedMemoryOpCost is just calling isLegalInterleavedAccessType, but later there is a check and returns invalid cost for all scalable VectorTypes.
I probably should add a TODO comment here.

I thought that this might be enough for now, we will need to add some extra tests later to check if the right types are legal or not.

At the moment the isLegalInterleavedAccessType is only used for the fixed width SVE so I think I should add extra assert here as well.

I might also try to check the scalable types here assuming legal element sizes etc, but at the moment we aren't able to test it.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2415-2416
+          Factor * TLI->getNumInterleavedAccesses(SubVecTy, DL, UseScalable);
+      if (isa<ScalableVectorType>(VecTy))
+        return InstructionCost::getInvalid();
+      return Cost;
----------------
paulwalker-arm wrote:
> This is worth a comment because the naive optimisation is to simply move the check higher up whereas you deliberately want to exercise as much of the costing code as possible, even though we'll currently ignore the result.
> 
ok will add a comment, fair point

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2420-2422
+  // Calling the base implementation should only happen
+  // when we can scalarize the operation, what is not possible for
+  // scalable vectors.
----------------
paulwalker-arm wrote:
> Perhaps simplify to just "All other uses of scalable vectors are not legal."? With that said, looking at the base implementation of getInterleavedMemoryOpCost suggests it does the right thing so you could remove this code and just fall into the base version as before.
yeah you are right, it has an early check at the beginning, I will change it

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll:1-6
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -mtriple=aarch64-none-linux-gnu -S -loop-vectorize -instcombine -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses=true -mattr=+sve -scalable-vectorization=on -runtime-memory-check-threshold=24 < %s | FileCheck %s
+
+; At the moment LLVM is not capable to vectorize interleaved accesses operating
+; on scalable vectors. This test is checking if the LV decides to use
+; gather/satter for such cases.
----------------
paulwalker-arm wrote:
> Does the code affect the output of these tests? Perhaps in this instance it is better to commit the tests first in isolation and then it's clear to see what effect this patch has (or rather, to show this patch does not change the existing behaviour).
My patch does not affect the output for this test.

This test is also a copy of llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
but uses aarch64 target triple and sve feature.
Is it ok to do it like that?

I wanted to demonstrate that despite removing the early exit from  LoopVectorizationCostModel::getInterleaveGroupCost for scalable VF we still do not attempt to vectorize interleaved groups for AArch64.

In that instance yes It might be worth to pre commit this test.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136153/new/

https://reviews.llvm.org/D136153