[PATCH] D145163: Add support for vectorization of interleaved memory accesses for scalable VF
Philip Reames via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 14 08:36:35 PDT 2023
reames added inline comments.
================
Comment at: llvm/include/llvm/Analysis/VectorUtils.h:596
+/// Return an anonymous struct value containing multiple smaller vectors
+/// deinterleaved from the larger input vector.
----------------
Unless you have plans to reuse these, this is just an implementation detail of the vectorizer. As such, these would be better as static functions in LoopVectorize.cpp
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2761
for (unsigned Part = 0; Part < UF; Part++) {
- Value *StridedVec = Builder.CreateShuffleVector(
- NewLoads[Part], StrideMask, "strided.vec");
+ Value *StridedVec = Builder.CreateExtractValue(NewLoads[Part], I);
----------------
The interface here feels really awkward for fixed length vectors. We have to create this dummy struct type, construct it, destruct it, and we loose the ability to slice out the inactive lanes.
I almost wonder if this code would be clearer without the helper function at all. With an explicit version based on scalable type here, we could do a simplified version of this loop with an early return and leave the fixed length codegen unchanged.
I'd be tempted to try that and see if the overall code quality looked reasonable.
You could also try a lambda which enumerate the active lanes (i.e. doing the shuffle or extract as required), and move the handling of the bitcast and reverse to a callback. This might be too much complexity though.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2713
Group->addMetadata(NewLoad);
- NewLoads.push_back(NewLoad);
+ NewLoads.push_back(Builder.CreateVectorDeinterleave(
+ NewLoad, InterleaveFactor, "strided.vec"));
----------------
huntergr wrote:
> reames wrote:
> > Having this be only in the normal load path seems unlikely to be correct. Surely we must also handle masked loads as well?
> This does handle masked loads -- 'NewLoad = Builder.CreateAligned....' is a standalone statement on the else with no opening brace. I've added a blank line to perhaps make that a little more obvious.
>
> Unless there's something else I've missed?
Yeah, I got confused by the brace style in the code above.
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
; RUN: opt -opaque-pointers=0 < %s -passes=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -prefer-predicate-over-epilogue=scalar-epilogue \
----------------
Please submit a separate change to autogen this file, and then rebase. Same with the other file you switched to autogen.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D145163/new/
https://reviews.llvm.org/D145163
More information about the llvm-commits
mailing list