[PATCH] D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available.
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 2 04:24:57 PST 2020
Ayal accepted this revision.
Ayal added a comment.
This revision is now accepted and ready to land.
This looks good to me, adding a couple of minor last nits, thanks!
================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:609
+ /// Returns true if the target machine can represent \p V as a masked gather
+ /// or scatter operation.
----------------
"represent \p V" >> "represent a vectorized version of \p V" ?
(This is the original comment, but the original context was LV.)
================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2314
// of using gather/scatters (if available).
+ if (TTI && TTI->isLegalGatherOrScatter(MemAccess)) {
+ LLVM_DEBUG(dbgs() << " But leaving as a gather/scatter instead.\n");
----------------
May want to add a cl::opt flag to turn this filtering on/off.
================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2315
+ if (TTI && TTI->isLegalGatherOrScatter(MemAccess)) {
+ LLVM_DEBUG(dbgs() << " But leaving as a gather/scatter instead.\n");
+ return;
----------------
Suggest to start the line with "LAA:"
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4612
: !(isLegalMaskedStore(Ty, Ptr, Alignment) ||
isLegalMaskedScatter(Ty, Alignment));
}
----------------
This is another potential customer of [TTI.]isLegalGatherOrScatter(I). After which CM's isLegalMaskedGather() and isLegalMaskedScatter() become useless and should be removed.
This also suggests a TTI.isLegalMaskedLoadOrStore(Value *), btw.
But these are independent cleanups.
================
Comment at: llvm/test/Transforms/LoopVectorize/X86/optsize.ll:151
; AUTOVF-LABEL: for.body:
-define void @scev4stride1(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32 %k) #2 {
+define void @scev4stride1(i16* noalias nocapture %a, i16* noalias nocapture readonly %b, i32 %k) #2 {
for.body.preheader:
----------------
dmgreen wrote:
> Ayal wrote:
> > Would indeed be good to have an i16 version retaining current checks (unvectorized behavior), as skx supports gathers of i32 but not for i16; and also the original i32 version with checks for vectorized code.
> The gather version of the original 32bit code here seems to be quite large for -Os. There are large constant pool entries that the gather needs to access?
The actual size of vectorized loops under -Os does deserve further investigation, in general.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71919/new/
https://reviews.llvm.org/D71919
More information about the llvm-commits
mailing list