[PATCH] D109445: [SVE][LoopVectorize] Optimise code generated by widenPHIInstruction

Thu Sep 9 08:10:11 PDT 2021

sdesmalen added a comment.

Just a minor nit on the commit message: this patch is not really specific to AArch64 SVE, but rather to scalable vectors.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4769-4771
+          // extraction of any lane. However, to generate better code, we still
+          // need to calculate values for the first n lanes since these could be
+          // required later (e.g. by a load instruction).
----------------
Hi @RosieSumpter, I think it's worth elaborating a little bit more on the 'generate better code' in the comment.

[(too) long explanation here]

From what I understand, the code is better because the `extractelement` instruction that is otherwise generated (for scalar uses of this vector) may not always be folded away if the stepvector has multiple uses. For element 0 this leads to a redundant move (vector element 0 -> GPR); for elements > 0 it leads to possibly expensive `extractelement` instructions (extracting a fixed-width lane from a scalable vector).

In the former case, the value for element 0 is freely available because it is the start value of the stepvector.
In the latter case, there will be a cost regardless: either the additional `add/gep` generated below to offset from the start value of the stepvector, or the extract from the stepvector itself. The expectation is simply that the scalar code will be cheaper.

Can you maybe capture some of that in the comment? (albeit more succinctly)
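
For illustration only, here is a minimal standalone sketch of the equivalence the explanation relies on. It is plain C++ and not LLVM code: the stepvector is modelled as a `std::vector`, and the names `buildStepVector`, `laneViaExtract` and `laneViaScalarAdd` are hypothetical helpers invented for this example. It only shows that lane `i` of the stepvector can be recomputed from the start value with a scalar add, and says nothing about actual instruction costs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a widened induction value: lane I holds Start + I * Step.
std::vector<int64_t> buildStepVector(int64_t Start, int64_t Step,
                                     unsigned NumLanes) {
  std::vector<int64_t> Lanes(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Lanes[I] = Start + static_cast<int64_t>(I) * Step;
  return Lanes;
}

// Vector route: read the lane back out of the widened value.
// This stands in for an extractelement on the stepvector.
int64_t laneViaExtract(const std::vector<int64_t> &Lanes, unsigned Lane) {
  assert(Lane < Lanes.size() && "lane out of range");
  return Lanes[Lane];
}

// Scalar route: recompute the lane from the start value.
// This stands in for the additional add/gep; for Lane == 0 it is just Start.
int64_t laneViaScalarAdd(int64_t Start, int64_t Step, unsigned Lane) {
  return Start + static_cast<int64_t>(Lane) * Step;
}
```

Both routes yield the same value for every lane, so the choice between them is purely a question of which is expected to be cheaper; lane 0 needs no arithmetic at all in the scalar route.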

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109445/new/

https://reviews.llvm.org/D109445