[PATCH] D77881: [VectorUtils] add IR-level analysis for widening of shuffle mask

Sat Apr 11 06:23:27 PDT 2020

spatel marked 2 inline comments as done.
spatel added inline comments.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:401

 void llvm::scaleShuffleMask(size_t Scale, ArrayRef<int> Mask,
                             SmallVectorImpl<int> &ScaledMask) {
----------------
lebedev.ri wrote:
> As a preliminary step, rename this to be `narrowShuffleMask()` ?
Yes, that would make more sense if we have both of these utils. Another possibility would be to make a single function with a ScaleUpOrDown bool param, but that's probably less readable in calling code.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:432-466
+  // Step through the input mask by splitting into Scale-sized subsections.
+  ScaledMask.clear();
+  for (int i = 0, e = NumElts; i != e; i += Scale) {
+    // Check if the elements that map to a widened mask element are consecutive.
+    int OutputElt;
+    for (int j = 0; j != Scale; ++j) {
+      int MaskElt = Mask[i + j];
----------------
lebedev.ri wrote:
> Ok, in light of not accepting partial sentinel values, what are your thoughts on the following then:
> ```
>   // Step through the input mask by splitting into Scale-sized subsections.
>   ScaledMask.clear();
>   ScaledMask.reserve(NumElts / Scale);
> 
>   for (ArrayRef<int> MaskSlice = Mask.take_front(Scale),
>                      RemainingMaskElts = Mask.take_back(Mask.size() - Scale);
>        !MaskSlice.empty(); MaskSlice = RemainingMaskElts.take_front(Scale),
>                      RemainingMaskElts = RemainingMaskElts.take_back(
>                          RemainingMaskElts.size() - Scale)) {
>     assert((int)MaskSlice.size() == Scale && "Expected Scale-sized slice.");
> 
>     // The slice must be homogeneous.
>     int OutputElt;
> 
>     if (MaskSlice.front() < 0) {
>       // Negative values (undef or other "sentinel" values) must be equal across
>       // the entire subsection.
>       if (!is_splat(MaskSlice))
>         return false;
>       OutputElt = MaskSlice.front();
>     } else {
>       // A positive mask element must be cleanly divisible.
>       if (MaskSlice.front() % Scale != 0)
>         return false;
>       // The elements of the subsection must be consecutive.
>       auto ExpectedSlice =
>           llvm::seq(MaskSlice.front(), MaskSlice.front() + Scale);
>       assert(llvm::size(ExpectedSlice) == Scale && "Got wrong sequence.");
>       if (!std::equal(adl_begin(MaskSlice), adl_end(MaskSlice),
>                       adl_begin(ExpectedSlice)))
>         return false;
>       OutputElt = MaskSlice.front() / Scale;
>     }
> 
>     // All narrow elements in this subsection map to the same wider element.
>     ScaledMask.push_back(OutputElt);
>   }
> ```
> https://godbolt.org/z/yoLNmG
> 
> This results in ~Scale less divisions.
Very C++. :)
I'm good with that...especially since you already wrote it and checked the optimization!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77881/new/

https://reviews.llvm.org/D77881