[PATCH] D75980: [LV] Generate RT checks up-front and remove them if required.

Fri Feb 19 08:32:27 PST 2021

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:460
                       LoopVectorizationCostModel *CM, BlockFrequencyInfo *BFI,
-                      ProfileSummaryInfo *PSI)
+                      ProfileSummaryInfo *PSI, GeneratedRTChecks &Check)
       : OrigLoop(OrigLoop), PSE(PSE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),
----------------
ebrevnov wrote:
> Would "RTChecks" be a better name?
Agreed, changed to `RTChecks`, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1888
+/// vectorize, the checks are moved backed. If deciding not to vectorize, the
+/// temporary block is completely removed.
+struct GeneratedRTChecks {
----------------
ebrevnov wrote:
> Some general notes. I would like us to:
> 1) Consolidate all the work related to runtime checks under this helper abstraction. Currently, it looks like some functionality still on it's old place. Find more comments on this as you go.
> 2) Make clear boundaries between this GeneratedRTChecks and its users. Please, convert it to class and define clear public API.
That's a great suggestion, thanks! I moved most of the code from `emitSCEVChecks` & `emitMemRuntimeChecks` here. There's some code (mostly assertions),  which remain in the original functions, because they need access to various member of ILV.

The  new function names potentially can be improved and they should have doc-comments, but I'd prefer to first settle on the final API before doing that.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1915
+    BasicBlock *LoopHeader = Preheader->getSingleSuccessor();
+    TmpBlock = SplitBlock(Preheader, Preheader->getTerminator(), DT, LI,
+                          nullptr, "tmp.rtchecks");
----------------
ebrevnov wrote:
> The goal is to create "un-linked" block(s). Right? In that case, would it be simpler to not depend on Preheader and use CreateBlock directly. I believe it would be easier to un-link the block then.
I updated the code put SCEV and memory checks in separate blocks. It still uses `SplitBlock`, because this keeps the blocks in the DT/LI, which may be used during SCEV expansion and it also keeps them linked to the right predecessors during expansion (SCEVExpander tries to find values to re-use in predecessors).

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1958
+
+    SmallPtrSet<Value *, 8> Removed;
+    // Completely remove the block.
----------------
ebrevnov wrote:
> I'm surprised to see the following piece of code dealing with deletion. I would expected SCEVExpanderCleaner does all necessary cleanup. It should be enough just to properly set "markResultUsed". If this is not the case can we rework in this direction?
The extra cleanup is needed for instructions create directly by `addRuntimeChecks`, like compares, and/ors which compare pointer bounds (which are generated through SCEV expansion). I updated the comment and made it clearer. It is also only needed for the block with the mem-checks.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9519
+  {
+    GeneratedRTChecks Checks(L->getLoopPreheader(), *PSE.getSE(), DT);
+    InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, 1, LVL,
----------------
ebrevnov wrote:
> Why do we need to create GeneratedRTChecks  outside of InnerLoopVectorizer? Can/Should it have bigger life time?
For some users (e.g. epilogue vectorization), `GeneratedRTChecks` needs to be shared  unfortunately.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9768

-  LVP.setBestPlan(VF.Width, IC);
+  if (Requirements.doesNotMeet(F, L, Hints)) {
+    LLVM_DEBUG(dbgs() << "LV: Not vectorizing: loop did not meet vectorization "
----------------
ebrevnov wrote:
> Why this was moved from its original place?
This is only needed for the follow-up, I moved it to the original place for this patch.

================
Comment at: llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll:59
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[TMP4:%.*]] = phi i8* [ [[TMP1]], %vector.memcheck ], [ [[TMP1]], %loop.preheader ], [ [[TMP1]], %middle.block ]
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 500, %middle.block ], [ 0, %loop.preheader ], [ 0, %vector.memcheck ]
----------------
ebrevnov wrote:
> Looks like we stopped vectorizing something, did we?
We still vectorize, but we now do not create an unnecessary phi node (which has no users)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75980/new/

https://reviews.llvm.org/D75980