[PATCH] D120880: [AArch64] Insert subvector costs

Tue Mar 22 02:27:17 PDT 2022

dmgreen added a comment.

In D120880#3363347 <https://reviews.llvm.org/D120880#3363347>, @labrinea wrote:

> Do we care about vector alignment? I am just looking at the x86 code, which does. Also, does this change apply to subvector extract too?

By vector alignment - do you mean the alignment of the subvector inserted into the larger vector? If so - yep. We can insert aligned elements (a v4i8 inserted into a v16i8 is just an f32 lane mov providing the lane % 4 == 0), but not unaligned values.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2721
+        TLI->getTypeLegalizationCost(DL, SubTp);
+    if (SubLT.second.isVector()) {
+      int NumSubElts = SubLT.second.getVectorNumElements();
----------------
sdesmalen wrote:
> I think this code silently assumes fixed-width == NEON, so I think it's worth adding a check that vector type being inserted into is less than 128bits. (and something similar for the value of the index being within a certain range). For fixed-length SVE vectors where wider vectors are legal, these ranges may go beyond 128bits.
> 
> It would also be useful to have a test for that case.
It was intending to work on larger-than-legal types. As in a v16i8 inserted (aligned) into a v32i8 isn't really an insert after all, it's just legalized to being a different register, and should be cheap as a result. llvm shuffles can always be larger than legal types. I can add a check that the legal type is at most 128bits.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120880/new/

https://reviews.llvm.org/D120880