[PATCH] D147114: [LV] Use BFI to adjust cost of predicated instructions

Thu Apr 20 01:26:34 PDT 2023

ebrevnov added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10586
     BlockFrequencyInfo *BFI = nullptr;
-    if (PSI && PSI->hasProfileSummary())
+    //if (PSI && PSI->hasProfileSummary())
       BFI = &AM.getResult<BlockFrequencyAnalysis>(F);
----------------
nikic wrote:
> ebrevnov wrote:
> > nikic wrote:
> > > ebrevnov wrote:
> > > > nikic wrote:
> > > > > ebrevnov wrote:
> > > > > > dmgreen wrote:
> > > > > > > Is BFI intended to be required now? Some of the other code makes it look like it is still optional.
> > > > > > > Is BFI intended to be required now? 
> > > > > > Yes. In fact, it should look like:
> > > > > > BlockFrequencyInfo *BFI = &AM.getResult<BlockFrequencyAnalysis>(F);
> > > > > > Will fix.
> > > > > > 
> > > > > > > Some of the other code makes it look like it is still optional.
> > > > > > Right. Before this change there was exactly one use of BFI guarded by PSI. I would suggest not to touch this place (at least now) because if compile time regression shows up we may want to conditionally request BFI for predicated loops only.
> > > > > This will definitely regress compile-time. You are only allowed to query BFI in PGO builds, which implies presence of PSI.
> > > > But that depends on BPI availability from previous passes and needs of  BPI by following passes, right? 
> > > Not sure what you mean here. BPI is a dependency of BFI, so it will be calculated if not already available?
> > > 
> > > 
> > I meant BFI (not BPI). Sorry for confusion.
> I still don't really get what you mean here. If you use getResult (rather than getCachedResult) BFI will be fetched regardless of whether it is available from previous passes.
> 
> The important part is that you shouldn't fetch BFI if PSI is not available. In test cases, you need to request PSI explicitly, in the pass pipeline it will be automatically available for PGO builds.
> I still don't really get what you mean here. If you use getResult (rather than getCachedResult) BFI will be fetched regardless of whether it is available from previous passes.
But if it is available from previous passes there will be no compile time impact.

> "This will definitely regress compile-time."
I'm trying to show that this statement is not necessarily true and depends on different factors including particular pass ordering. Namely, getResult<BlockFrequencyAnalysis> is free (regardless of PGO/PSI)  if BFI is available from previous passes. Moreover, if one of the passes following LV requests BFI and loop is not vectorized (because vectorization invalidates BFI) then there is no extra overhead either. Thus there is a pretty high chance that there will be no any impact in the practice. If it is it will be possible to significantly reduce the impact by requesting BFI lazily... but I would really like not to go this way with out strong need.

> 
> The important part is that you shouldn't fetch BFI if PSI is not available. In test cases, you need to request PSI explicitly, in the pass pipeline it will be automatically available for PGO builds.
Not sure, why you say "you shouldn't fetch BFI if PSI is not available". There are many cases when we do that. Why vectorizer can't do the same (is it due to potential compile time increase)? I think use of BFI is the best solution for the problem. Do you have any ideas how to implement the functionality without BFI?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147114/new/

https://reviews.llvm.org/D147114