[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree
Anton Afanasyev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 7 03:20:52 PDT 2021
anton-afanasyev marked 3 inline comments as done.
anton-afanasyev added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4670-4671
+ // Create shuffle to resize vector
+ for (unsigned I = 0; I < NumElts; I++)
+ Mask[I] = (I < NumScalars) ? I : UndefMaskElem;
+ V = Builder.CreateShuffleVector(V, UndefValue::get(V->getType()), Mask);
----------------
ABataev wrote:
> anton-afanasyev wrote:
> > ABataev wrote:
> > > anton-afanasyev wrote:
> > > > ABataev wrote:
> > > > > ```
> > > > > SmallVector<int, 16> Mask(NumElts, UndefMaskElem);
> > > > > std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);
> > > > > ```
> > > > >
> > > > > Also, I think just `std::iota(Mask.begin(), Mask.end(), 0); ` shall work too
> > > > Thanks, changed to `std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);`
> > > >
> > > > Yes, `std::iota(Mask.begin(), Mask.end(), 0);` gives the same result, but wouldn't it lead to redundant code lowered?
> > > `std::iota(Mask.begin(), Mask.end(), 0);` will produce the same code as `std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);` but only if `NumScalars <= Mask.size() * 2`. Otherwise the compiler may crash in some cases. So, better to keep `std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);`
> > `Mask.size()` is just `NumElts` and `NumScalars <= NumElts` (is it worth to add assert for this?), so `NumScalars <= Mask.size() * 2` for all cases.
> >
> > What I did mean by "the same result" is that shuffle
> > ```
> > %a = shufflevector <2 x float> %b, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
> > ```
> > is equivalent to
> > ```
> > %a = shufflevector <2 x float> %b, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
> > ```
> > (here `NumScalars == 2` and `NumElts == 4`).
> We may have a situation say NumScalars is 2 and NumElts is 8, if insert the same scalars several times.
Do you mean "NumScalars is 8 and NumElts is 2"?
This case is excluded, we can get only unique inserts (inserts with unique index) from `findBuildAggregate()` for now. Also, we even assert
```
assert(ReuseShuffleIndicies.empty() && "All inserts should be unique");
```
at first line of `InsertElement` at the `buildTree_rec()` function.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D98714/new/
https://reviews.llvm.org/D98714
More information about the llvm-commits
mailing list