[PATCH] D77881: [VectorUtils] add IR-level analysis for widening of shuffle mask
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Apr 11 06:23:27 PDT 2020
spatel marked 2 inline comments as done.
spatel added inline comments.
================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:401
void llvm::scaleShuffleMask(size_t Scale, ArrayRef<int> Mask,
SmallVectorImpl<int> &ScaledMask) {
----------------
lebedev.ri wrote:
> As a preliminary step, rename this to be `narrowShuffleMask()` ?
Yes, that would make more sense if we have both of these utils. Another possibility would be to make a single function with a ScaleUpOrDown bool param, but that's probably less readable in calling code.
================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:432-466
+ // Step through the input mask by splitting into Scale-sized subsections.
+ ScaledMask.clear();
+ for (int i = 0, e = NumElts; i != e; i += Scale) {
+ // Check if the elements that map to a widened mask element are consecutive.
+ int OutputElt;
+ for (int j = 0; j != Scale; ++j) {
+ int MaskElt = Mask[i + j];
----------------
lebedev.ri wrote:
> Ok, in light of not accepting partial sentinel values, what are your thoughts on the following then:
> ```
> // Step through the input mask by splitting into Scale-sized subsections.
> ScaledMask.clear();
> ScaledMask.reserve(NumElts / Scale);
>
> for (ArrayRef<int> MaskSlice = Mask.take_front(Scale),
> RemainingMaskElts = Mask.take_back(Mask.size() - Scale);
> !MaskSlice.empty(); MaskSlice = RemainingMaskElts.take_front(Scale),
> RemainingMaskElts = RemainingMaskElts.take_back(
> RemainingMaskElts.size() - Scale)) {
> assert((int)MaskSlice.size() == Scale && "Expected Scale-sized slice.");
>
> // The slice must be homogeneous.
> int OutputElt;
>
> if (MaskSlice.front() < 0) {
> // Negative values (undef or other "sentinel" values) must be equal across
> // the entire subsection.
> if (!is_splat(MaskSlice))
> return false;
> OutputElt = MaskSlice.front();
> } else {
> // A positive mask element must be cleanly divisible.
> if (MaskSlice.front() % Scale != 0)
> return false;
> // The elements of the subsection must be consecutive.
> auto ExpectedSlice =
> llvm::seq(MaskSlice.front(), MaskSlice.front() + Scale);
> assert(llvm::size(ExpectedSlice) == Scale && "Got wrong sequence.");
> if (!std::equal(adl_begin(MaskSlice), adl_end(MaskSlice),
> adl_begin(ExpectedSlice)))
> return false;
> OutputElt = MaskSlice.front() / Scale;
> }
>
> // All narrow elements in this subsection map to the same wider element.
> ScaledMask.push_back(OutputElt);
> }
> ```
> https://godbolt.org/z/yoLNmG
>
> This results in ~Scale less divisions.
Very C++. :)
I'm good with that...especially since you already wrote it and checked the optimization!
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D77881/new/
https://reviews.llvm.org/D77881
More information about the llvm-commits
mailing list