[PATCH] D109445: [SVE][LoopVectorize] Optimise code generated by widenPHIInstruction
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 9 08:10:11 PDT 2021
sdesmalen added a comment.
Just a minor nit on the commit message: this patch is not really specific to AArch64 SVE, but rather applies to scalable vectors in general.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4769-4771
+ // extraction of any lane. However, to generate better code, we still
+ // need to calculate values for the first n lanes since these could be
+ // required later (e.g. by a load instruction).
----------------
Hi @RosieSumpter, I think it's worth elaborating a little bit more on the 'generate better code' in the comment.
[(too) long explanation here]
From what I understand, the code is better because the `extractelement` instruction that is otherwise generated (for scalar uses of this vector) may not always be folded away if the stepvector has multiple uses. That leads either to a redundant move (for element 0, the vector-element-0 -> gpr move) or to possibly expensive `extractelement` instructions (to extract a fixed-width lane from a scalable vector) for elements > 0.
In the former case, the value for element 0 is freely available because it is the start value of the stepvector.
In the latter case, there will be a cost regardless: either the additional `add/gep` generated below to offset from the start value of the stepvector, or the extract from the stepvector itself. The scalar code is simply expected to be the cheaper of the two.
Can you maybe capture some of that in the comment? (albeit more succinctly)
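To make the trade-off above concrete, here is a minimal sketch in plain C++. It is only a model and not LLVM code: the names `makeStepVector`, `laneViaExtract` and `laneViaScalar` are hypothetical, and a `std::vector` stands in for the (scalable) stepvector. It shows that a scalar use of lane k can be satisfied either by extracting from the vector or by offsetting the start value with scalar arithmetic, and that both give the same value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a stepvector-based induction: lane i holds
// start + i * step. In real IR the lane count may be scalable.
std::vector<int64_t> makeStepVector(int64_t start, int64_t step,
                                    unsigned lanes) {
  std::vector<int64_t> v(lanes);
  for (unsigned i = 0; i < lanes; ++i)
    v[i] = start + static_cast<int64_t>(i) * step;
  return v;
}

// Option A: materialise the vector, then pull out one lane.
// This models the extractelement that may not be folded away.
int64_t laneViaExtract(int64_t start, int64_t step, unsigned lanes,
                       unsigned lane) {
  assert(lane < lanes && "lane index out of range");
  return makeStepVector(start, step, lanes)[lane];
}

// Option B: compute the lane directly with scalar arithmetic.
// This models the extra add/gep offset from the start value.
// For lane 0 the result is just the start value, with no extra work.
int64_t laneViaScalar(int64_t start, int64_t step, unsigned lane) {
  return start + static_cast<int64_t>(lane) * step;
}
```

For example, `laneViaScalar(100, 4, 0)` is simply the start value `100`, while `laneViaScalar(100, 4, 3)` costs one multiply-add and matches `laneViaExtract(100, 4, 8, 3)`. Which option is actually cheaper in generated code depends on the target; the sketch only demonstrates that the two are equivalent.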
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109445/new/
https://reviews.llvm.org/D109445