[PATCH] D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available.

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 2 04:24:57 PST 2020


Ayal accepted this revision.
Ayal added a comment.
This revision is now accepted and ready to land.

This looks good to me; adding a couple of last minor nits, thanks!



================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfo.h:609
 
+  /// Returns true if the target machine can represent \p V as a masked gather
+  /// or scatter operation.
----------------
"represent \p V" >> "represent a vectorized version of \p V" ?
(This is the original comment, but the original context was LV.)
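For concreteness, the suggested rewording would make the doxygen comment read roughly as follows (a sketch; the exact declaration in the patch may differ):

  /// Returns true if the target machine can represent a vectorized version
  /// of \p V as a masked gather or scatter operation.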


================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2314
   // of using gather/scatters (if available).
+  if (TTI && TTI->isLegalGatherOrScatter(MemAccess)) {
+    LLVM_DEBUG(dbgs() << "  But leaving as a gather/scatter instead.\n");
----------------
May want to add a cl::opt flag to turn this filtering on/off.
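Such a flag could look roughly like this (a sketch; the option name `enable-laa-gather-scatter-filtering` is hypothetical):

  // Hypothetical cl::opt guarding the new filtering; defaults to on.
  static cl::opt<bool> EnableGatherScatterFiltering(
      "enable-laa-gather-scatter-filtering", cl::init(true), cl::Hidden,
      cl::desc("Skip single-stride predicates for accesses that can "
               "instead be left as gather/scatter operations"));
  ...
    if (EnableGatherScatterFiltering && TTI &&
        TTI->isLegalGatherOrScatter(MemAccess)) {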


================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2315
+  if (TTI && TTI->isLegalGatherOrScatter(MemAccess)) {
+    LLVM_DEBUG(dbgs() << "  But leaving as a gather/scatter instead.\n");
+    return;
----------------
Suggest starting the debug line with an "LAA:" prefix, to match the other LoopAccessAnalysis debug output.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4612
                             : !(isLegalMaskedStore(Ty, Ptr, Alignment) ||
                                 isLegalMaskedScatter(Ty, Alignment));
   }
----------------
This is another potential customer of [TTI.]isLegalGatherOrScatter(I), after which CM's isLegalMaskedGather() and isLegalMaskedScatter() would become unused and should be removed.

This also suggests a TTI.isLegalMaskedLoadOrStore(Value *), btw.

But these are independent cleanups.
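For illustration, the quoted condition in the cost model could then be written along these lines (a sketch, assuming the cost model has access to TTI; not part of the patch):

  // Sketch: replace CM's isLegalMaskedGather()/isLegalMaskedScatter()
  // wrappers with a single TTI query covering both cases.
  return isa<LoadInst>(I)
             ? !(isLegalMaskedLoad(Ty, Ptr, Alignment) ||
                 TTI.isLegalGatherOrScatter(I))
             : !(isLegalMaskedStore(Ty, Ptr, Alignment) ||
                 TTI.isLegalGatherOrScatter(I));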


================
Comment at: llvm/test/Transforms/LoopVectorize/X86/optsize.ll:151
 ; AUTOVF-LABEL: for.body:
-define void @scev4stride1(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32 %k) #2 {
+define void @scev4stride1(i16* noalias nocapture %a, i16* noalias nocapture readonly %b, i32 %k) #2 {
 for.body.preheader:
----------------
dmgreen wrote:
> Ayal wrote:
> > Would indeed be good to have an i16 version retaining current checks (unvectorized behavior), as skx  supports gathers of i32 but not for i16; and also the original i32 version with checks for vectorized code.
> The gather version of the original 32bit code here seems to be quite large for -Os. There are large constant pool entries that the gather needs to access?
The actual size of vectorized loops under -Os does deserve further investigation, in general.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71919/new/

https://reviews.llvm.org/D71919
