[PATCH] D53668: [LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 30 01:58:37 PDT 2018
Ayal accepted this revision.
Ayal added a comment.
This revision is now accepted and ready to land.
> Also added a test with stride 3.
Thanks, one was indeed needed.
LGTM, with few minor additional optional suggestions.
================
Comment at: include/llvm/Analysis/VectorUtils.h:425
+ /// is no other means (such as masking) to support them. This can happen when
+ /// we optimize for size and don't allow creating a scalar epilogue.
void invalidateGroupsRequiringScalarEpilogue();
----------------
The "if there is no other means..." is now part of the "This can happen when". I.e.,
"This can happen when optimizing for size forbids a scalar epilogue, and the gap cannot be filtered by masking the load/store."
================
Comment at: include/llvm/CodeGen/BasicTTIImpl.h:850
// We only scale the cost of loads since interleaved store groups aren't
// allowed to have gaps.
if (Opcode == Instruction::Load && VecTySize > VecTyLTSize) {
----------------
This scaling works just as well for gap-masked loads, right?
(Another place to remember extending when introducing support for gap-masked stores, btw)
================
Comment at: include/llvm/CodeGen/BasicTTIImpl.h:921
- if (!IsMasked)
+ if (!UseMaskForCond)
return Cost;
----------------
Comment here that UseMaskForGaps alone does not add to Cost, because its mask is uniform. Unlike below where it adds the cost of And-ing the two masks.
================
Comment at: lib/Analysis/VectorUtils.cpp:959
<< "LV: Invalidate candidate interleaved group due to gaps that "
- "require a scalar epilogue.\n");
+ "require a scalar epilogue (not allowed under optsize) or masking "
+ "of interleave-groups (not enabled). \n");
----------------
"or masking ..." >> "and cannot be masked (not enabled)."
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1243
+ /// or as a peel-loop to handle gaps in interleave-groups.
+ /// Under optsize we don't allow any iterations to execute in the scalar
+ /// loop.
----------------
"Under optsize" and when the trip count is very small "we don't ..."
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2077
Instruction *NewLoad;
- if (IsMaskRequired) {
- auto *Undefs = UndefValue::get(Mask[Part]->getType());
- auto *RepMask = createReplicatedMask(Builder, InterleaveFactor, VF);
- Value *ShuffledMask = Builder.CreateShuffleVector(
- Mask[Part], Undefs, RepMask, "interleaved.mask");
- NewLoad = Builder.CreateMaskedLoad(NewPtrs[Part], Group->getAlignment(),
- ShuffledMask, UndefVec,
- "wide.masked.vec");
+ if (IsMaskRequired || MaskForGaps) {
+ assert(useMaskedInterleavedAccesses(*TTI) &&
----------------
Rename `IsMaskRequired` to `[Is]MaskForCondRequired`?
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2084
+ auto *RepMask = createReplicatedMask(Builder, InterleaveFactor, VF);
+ Value *ShuffledMask = Builder.CreateShuffleVector(
+ Mask[Part], Undefs, RepMask, "interleaved.mask");
----------------
Rename `ShuffledMask` to `MaskForCond`?
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4393
+ // enabled it, because otherwise it either wouldn't have been created or
+ // it should have been invalidated by the costModel.
+ assert(useMaskedInterleavedAccesses(TTI) &&
----------------
"costModel" >> "cost model", or "CostModel"
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4649
+ IsScalarEpilogueAllowed = !OptForSize;
+
----------------
It's indeed good to record in CostModel the constraint forbidding a scalar epilogue, instead of passing an overloaded and abused OptForSize parameter around. It should be recorded at the outset, say at the constructor of CostModel, to be consistent; and isScalarEpilogueAllowed() should be used inside the CostModel instead of OptForSize throughout. This NFC refactoring can be done separately, before/after this patch.
================
Comment at: test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll:89
+; When enable-masked-interleaved-access is disabled the interleave-group is
+; invalidated, so we end up scalarizing.
; (Before the fix that this test checks, we used to create an epilogue despite
----------------
Add checks for this scalarizing when enable-masked-interleaved-access is disabled, or comment that it is already checked above?
https://reviews.llvm.org/D53668
More information about the llvm-commits
mailing list