[PATCH] D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size

Wed Aug 22 05:38:13 PDT 2018

Ayal added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4948

-  // If we don't know the precise trip count, don't try to vectorize.
-  if (TC < 2) {
-    ORE->emit(
-        createMissedAnalysis("UnknownLoopCountComplexCFG")
-        << "unable to calculate the loop count due to complex control flow");
-    LLVM_DEBUG(
-        dbgs() << "LV: Aborting. A tail loop is required with -Os/-Oz.\n");
+  if (TC == 1) {
+    LLVM_DEBUG(dbgs() << "LV: Aborting, single iteration (non) loop.\n");
----------------
hsaito wrote:
> Ayal wrote:
> > hsaito wrote:
> > > reames wrote:
> > > > There's a mix of seemingly unrelated changes here.  This is one example.  It would be good to land these separately.  
> > > This change is relevant in the sense that TC < 2 is split into two parts: TC==1 and TC==0. TC==0 case will then have a chance of hitting Legal->canFoldTailByMasking() later. As a result, TC==1 case can return early here, with a very crisp messaging. 
> > > 
> > > Having said that, if you'd like to see the same ORE->emit(...) LLVM_DEBUG() stuff here, I won't go against that. Messaging change can be a separate commit.
> > > 
> > > Ayal, we need ORE->emit() here, in addition to LLVM_DEBUG(), right, regardless of whether we change the actual message?
> > Yes, this change is unrelated and should land separately. The original ORE message is wrong. Not sure the TC==1 qualifies for any ORE message - "loops" with a known trip count of one are simply irrelevant for vectorization; though we could vectorize them with a mask...
> I think every non-vectorized loop that goes through vectorizer's analysis qualifies for ORE. After all, TC==1 knowledge may or may not be available to the programmer otherwise. 
ok

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2748
+  // If tail is to be folded, vector loop takes care of all iterations.
+  Value *CheckMinIters = Builder.getFalse();
+  if (!Cost->foldTailByMasking())
----------------
hsaito wrote:
> Personally, I don't like to see the IR like the following going out of the vectorizer, even though that's later cleaned up tirivially.
>     %1 = false       // unused and thus will be trivially cleaned up later.
>     %2 = icmp ...
> Changing this part of the patch to
>      Value *CheckMinIters = nullptr;
>      if ()
>          ....
>      else
>          CheckMinIters = Builder.getFalse();
> would make cleaner IR going out for common cases, at a small price to pay in ease-of-reading.
> 
> If you agree, great. If not, I won't make a big deal about it. At the end of the day, we should clean up this area of code such that we don't have to rely on CheckMinIters being "false" constant to cleanup the unnecessary min iter check. That improvement can be done as a separate NFC patch.
> 
> 
One could change this part of the patch to create an unconditional branch instead of a conditional one from BB to NewBB; or avoid creating NewBB / calling `emitMinimumIterationCountCheck()` altogether `if (foldTailByMasking())`. Both alternatives will change the dominance structure and thus require special attention when updating DT in `updateAnalysis()`. The latter would also need to record the EntryBlock for cases where `LoopBypassBlocks` remains empty.

It's simpler to keep the existing skeletal structure intact, and rely on subsequent trivial dce cleanup.

If desired, such alternatives should be proposed as a separate follow-up NFC patch.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2990
+  // If tail is to be folded, we know we don't need to run the remainder.
+  Value *CmpN = Builder.getTrue();
+  if (!Cost->foldTailByMasking())
----------------
hsaito wrote:
> See the comment on CheckMinIters.
ditto.

================
Comment at: test/Transforms/LoopVectorize/X86/optsize.ll:4
+; attributes.
+; RUN: opt < %s -loop-vectorize -S -mtriple=x86_64-apple-darwin -mcpu=skx | FileCheck %s
+
----------------
hsaito wrote:
> Is the test really dependent on the apple triple?
-mtriple=x86_64-unknown-linux works just as well

https://reviews.llvm.org/D50480