[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager

Fedor Sergeev via llvm-dev llvm-dev at lists.llvm.org
Fri Mar 1 12:53:49 PST 2019


On 2/28/19 12:47 AM, Hiroshi Yamauchi via llvm-dev wrote:
> Hi all,
>
> To implement more profile-guided optimizations, we’d like to use 
> ProfileSummaryInfo (PSI) and BlockFrequencyInfo (BFI) from more passes 
> of various types, under the new pass manager.
>
> The following is what we came up with. Would appreciate feedback. Thanks.
>
> Issue
>
> It’s not obvious (to me) how to best do this, given that we cannot 
> request an outer-scope analysis result from an inner-scope pass 
> through analysis managers [1] and that we might unnecessarily run
> some analyses unless we conditionally build pass pipelines for PGO cases.
Indeed, this is an intentional restriction in the new pass manager, which
more or less reflects a fundamental property of the outer-inner IRUnit
relationship and of the transformations/analyses run on those units. The
main intent of having those inner IRUnits (e.g. Loops) is to run local
transformations and save compile time by staying local to a particular
small piece of IR. The loop pass manager allows you to run a whole pipeline
of different transformations still locally, amplifying the savings.
As soon as you run a function-level analysis from within the loop pipeline,
you essentially break this pipelining.
Say your loop transformation modifies the loop (and the function) and
potentially invalidates the analysis; you then have to rerun the analysis
again and again. Hence instead of saving compile time it ends up
increasing it.
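
To picture the restriction, here is a minimal toy model. It is not LLVM's API: ToyFunctionAnalysisManager, ToyOuterProxy and loopPassSawBFI are names made up for illustration, and the "analysis result" is just an int. The only point is that the view handed to an inner-scope pass can look up an outer-scope result but can never trigger its computation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only; none of these types are LLVM's.
struct ToyFunctionAnalysisManager {
  std::map<std::string, int> Cache; // analysis name -> cached result
  int Computations = 0;             // how many times real work was done

  // Function-scope passes may compute a result on demand.
  int getResult(const std::string &Name) {
    auto It = Cache.find(Name);
    if (It != Cache.end())
      return It->second;
    ++Computations;
    int Value = static_cast<int>(Name.size()); // stand-in for real work
    Cache.emplace(Name, Value);
    return Value;
  }

  // Lookup only: returns nullptr if nothing is cached, never computes.
  const int *getCachedResult(const std::string &Name) const {
    auto It = Cache.find(Name);
    return It == Cache.end() ? nullptr : &It->second;
  }
};

// What a loop-scope pass gets to see: a read-only view of the outer manager.
class ToyOuterProxy {
  const ToyFunctionAnalysisManager &Outer;

public:
  explicit ToyOuterProxy(const ToyFunctionAnalysisManager &M) : Outer(M) {}

  const int *getCachedResult(const std::string &Name) const {
    return Outer.getCachedResult(Name);
  }
};

// Hypothetical loop pass: it can use "BFI" only if someone cached it earlier.
bool loopPassSawBFI(const ToyOuterProxy &Proxy) {
  return Proxy.getCachedResult("BFI") != nullptr;
}
```

In this model the loop pass silently loses the profile data unless something at function scope asked for it first, which is the situation the RFC is trying to deal with.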

I hit this issue somewhat recently with the dependency of loop passes on
BranchProbabilityInfo (some loop passes, like IRCE, can use it for
profitability analysis).
The only solution that appears reasonable there is to teach all the loop
passes that need to be pipelined to preserve BPI (or any other
module/function-level analysis), similar to how they preserve
DominatorTree and other "LoopStandard" analyses.
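
A rough way to see the compile-time argument, again as a toy model rather than anything from LLVM (ToyBPICache and countBPIComputations are hypothetical names): assume BPI is made available before each loop is visited, and the loop pass either preserves it or invalidates it.

```cpp
#include <cassert>

// Toy model only; this is not LLVM's analysis cache.
struct ToyBPICache {
  bool Valid = false;
  int Computations = 0;

  // Recompute only when there is no valid cached result.
  void ensureComputed() {
    if (!Valid) {
      ++Computations;
      Valid = true;
    }
  }

  void invalidate() { Valid = false; }
};

// Visits NumLoops loops with one hypothetical loop pass and returns how many
// times the function-level analysis had to be computed.
int countBPIComputations(int NumLoops, bool PassPreservesBPI) {
  ToyBPICache BPI;
  for (int I = 0; I < NumLoops; ++I) {
    BPI.ensureComputed(); // the analysis is required before the pass runs
    // ... the loop transformation would run here ...
    if (!PassPreservesBPI)
      BPI.invalidate(); // the pass changed the IR and did not update BPI
  }
  return BPI.Computations;
}
```

With 8 loops this gives 8 computations when the pass does not preserve BPI and 1 when it does, assuming the pass can keep the analysis up to date cheaply.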

> It seems that for different types of passes to be able to get PSI and 
> BFI, we’d need to ensure PSI is cached for a non-module pass, and PSI, 
> BFI and the ModuleAnalysisManager proxy are cached for a loop pass in 
> the pass pipelines. This may mean potentially needing to insert 
> BFI/PSI in front of many passes [2]. It seems not obvious how to 
> conditionally insert BFI for PGO pipelines because there isn’t always 
> a good flag to detect PGO cases [3] or we tend to build pass pipelines 
> before examining the code (or without propagating enough info down) [4].
>
> Proposed approach
>
> - Cache PSI right after the profile summary in the IR is written in
> the pass pipeline [5]. This would avoid the need to insert
> RequireAnalysisPass for PSI before each non-module pass that needs
> it. PSI can technically be invalidated, but that is unlikely. If it is,
> we insert another RequireAnalysisPass [6].
>
> - Conditionally insert RequireAnalysisPass for BFI, if PGO, right 
> before each loop pass that needs it. This doesn't seem avoidable 
> because BFI can be invalidated whenever the CFG changes. We detect PGO 
> based on the command line flags and/or whether the module has the 
> profile summary info (we may need to pass the module to more functions.)
>
> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be 
> able to get to the ModuleAnalysisManager in one step and PSI through it.
>
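
The conditional insertion in the second bullet could be sketched roughly as below. This is a toy model with strings standing in for passes, not the real pipeline-building code: isToyPGOBuild, buildToyLoopPipeline and the "require-bfi" label are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only; pass names are plain labels, not LLVM pass names.

// PGO detection as described in the bullet: a command line flag and/or a
// profile summary found in the module.
bool isToyPGOBuild(bool ProfileFlagGiven, bool ModuleHasProfileSummary) {
  return ProfileFlagGiven || ModuleHasProfileSummary;
}

// Puts a "require-bfi" step in front of every loop pass that wants BFI, but
// only for PGO builds, so non-PGO builds never pay for computing BFI.
std::vector<std::string>
buildToyLoopPipeline(bool IsPGO,
                     const std::vector<std::string> &LoopPassesNeedingBFI) {
  std::vector<std::string> Pipeline;
  for (const std::string &Pass : LoopPassesNeedingBFI) {
    if (IsPGO)
      Pipeline.push_back("require-bfi");
    Pipeline.push_back(Pass);
  }
  return Pipeline;
}
```

Note that in the PGO case each loop pass ends up separated from the next by a function-level step, which is the pipelining break-up mentioned in footnote [2].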
> Alternative approaches
>
> Dropping BFI and using PSI only
> We could consider not using BFI and relying solely on PSI and
> function-level profiles (as opposed to block-level), but profile
> precision would suffer.
>
> Computing BFI in-place
> We could consider computing BFI “in-place” by directly running BFI 
> outside of the pass manager [7]. This would let us avoid the
> analysis manager constraints, but it would still involve running an
> outer-scope analysis from an inner-scope pass and potentially cause 
> problems in terms of pass pipelining and concurrency. Moreover, a 
> potential downside of running analyses in-place is that it won’t take 
> advantage of cached analysis results provided by the pass manager.
>
> Adding inner-scope versions of PSI and BFI
> We could consider adding a function-level and loop-level PSI and 
> loop-level BFI, which internally act like their outer-scope versions 
> but provide inner-scope results only. This way, we could always call 
> getResult for PSI and BFI. However, this would still involve running 
> an outer-scope analysis from an inner-scope pass.
>
> Caching the FAM and the MAM proxies
> We could consider caching the FunctionAnalysisManager and the
> ModuleAnalysisManager proxies once early on instead of adding a new
> proxy. But this seems unlikely to work well because the analysis cache
> key type includes the function or the module and some passes may add a
> new function for which the proxy wouldn’t be cached. We’d need to
> write and insert a pass in select locations to just fill the cache.
> Adding the new proxy would take care of these with a three-line change.
>
> Conditional BFI
> We could consider adding a conditional BFI analysis that is a wrapper
> around BFI and computes BFI only if profiles are available (either by
> checking whether the module has a profile summary or by depending on
> PSI). With this, we wouldn’t need to conditionally build pass pipelines,
> and it may work for the new pass manager. But a similar approach wouldn’t
> work for the old pass manager because we cannot conditionally depend on
> an analysis under it.
There is LazyBlockFrequencyInfo.
Not sure how well it fits this idea.
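
For illustration only, the combination of "lazy" and "conditional" could look like the toy wrapper below. It is not LazyBlockFrequencyInfo and not LLVM code: ToyLazyBFI is a made-up name, the result is just an int, and putting the profile check next to the lazy computation is an assumption about how the two ideas might fit together.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy model only; not LLVM's LazyBlockFrequencyInfo.
class ToyLazyBFI {
  std::function<int()> Compute; // stand-in for the real BFI computation
  bool HasProfile;              // e.g. the module carries a profile summary
  bool Computed = false;
  int Value = 0;

public:
  ToyLazyBFI(bool ProfileAvailable, std::function<int()> ComputeFn)
      : Compute(std::move(ComputeFn)), HasProfile(ProfileAvailable) {}

  // Returns nullptr when there is no profile; otherwise computes the result
  // on the first query and reuses it on later queries.
  const int *get() {
    if (!HasProfile)
      return nullptr;
    if (!Computed) {
      Value = Compute();
      Computed = true;
    }
    return &Value;
  }
};
```

A pass that always holds such a wrapper pays nothing in non-PGO builds and pays once in PGO builds, as long as nothing invalidates the result in between.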

regards,
   Fedor.

>
>
> [1] We cannot call AnalysisManager::getResult for an outer scope but
> only getCachedResult, probably because of potential pipelining or
> concurrency issues.
> [2] For example, potentially breaking up multiple pipelined loop
> passes and inserting RequireAnalysisPass<BlockFrequencyAnalysis> in
> front of each of them.
> [3] For example, -fprofile-instr-use and -fprofile-sample-use aren’t 
> present in ThinLTO post link builds.
> [4] For example, we could check whether the module has the profile 
> summary metadata annotated when building pass pipelines but we don’t 
> always pass the module down to the place where we build pass pipelines.
> [5] By inserting RequireAnalysisPass<ProfileSummaryAnalysis> after the
> PGOInstrumentationUse and the SampleProfileLoaderPass passes (and
> around the PGOIndirectCallPromotion pass for the ThinLTO post link
> pipeline.)
> [6] For example, with context-sensitive PGO.
> [7] Directly calling its constructor along with the results of the
> analyses it depends on, e.g. as in the jump threading pass.
>