[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager
Hiroshi Yamauchi via llvm-dev
llvm-dev at lists.llvm.org
Wed Feb 27 13:47:03 PST 2019
To implement more profile-guided optimizations, we’d like to use
ProfileSummaryInfo (PSI) and BlockFrequencyInfo (BFI) from more passes of
various types, under the new pass manager.
The following is what we came up with. Would appreciate feedback. Thanks.
It’s not obvious (to me) how to best do this, given that we cannot request
an outer-scope analysis result from an inner-scope pass through analysis
managers [1] and that we might unnecessarily run some analyses unless
we conditionally build pass pipelines for PGO cases.
It seems that for different types of passes to be able to get PSI and BFI,
we’d need to ensure PSI is cached for a non-module pass, and PSI, BFI and
the ModuleAnalysisManager proxy are cached for a loop pass in the pass
pipelines. This may mean needing to insert BFI/PSI in front of many
passes [2]. It is not obvious how to conditionally insert BFI for PGO
pipelines, because there isn’t always a good flag to detect PGO cases [3],
or we tend to build pass pipelines before examining the code (or
without propagating enough info down) [4].
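To make the constraint concrete, here is a minimal toy model in plain C++.
It is not LLVM's API: ToySummary, ToyModuleAM and functionPassSeesProfile
are hypothetical names invented for illustration. It only mimics the rule
that an inner-scope pass may look up an already cached outer-scope result
(the getCachedResult style of query) but may not trigger its computation.

```cpp
#include <optional>

// Toy stand-in for a module-level analysis result such as PSI.
struct ToySummary {
  bool HasProfile;
};

// Toy stand-in for a module analysis manager (NOT llvm::ModuleAnalysisManager).
class ToyModuleAM {
  std::optional<ToySummary> Cache;
  int Computations = 0;

public:
  // Outer-scope query: computes the result on first use, then caches it.
  const ToySummary &getResult(bool ModuleHasProfile) {
    if (!Cache) {
      Cache = ToySummary{ModuleHasProfile};
      ++Computations;
    }
    return *Cache;
  }

  // The only query an inner-scope pass is allowed to make: it never computes.
  const ToySummary *getCachedResult() const {
    return Cache ? &*Cache : nullptr;
  }

  int computations() const { return Computations; }
};

// Hypothetical function-level pass logic: it can only peek at the cache, so
// it sees no profile unless something earlier in the pipeline filled it.
bool functionPassSeesProfile(const ToyModuleAM &MAM) {
  const ToySummary *S = MAM.getCachedResult();
  return S != nullptr && S->HasProfile;
}
```

In this model, calling functionPassSeesProfile before anything has called
getResult yields false even for a profiled module, which is why the result
has to be cached ahead of the passes that want it.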
- Cache PSI right after the profile summary in the IR is written in the
pass pipeline [5]. This would avoid the need to insert RequireAnalysisPass
for PSI before each non-module pass that needs it. PSI can technically be
invalidated, but that is unlikely. If it is, we insert another
RequireAnalysisPass [6].
- Conditionally insert RequireAnalysisPass for BFI, if PGO, right before
each loop pass that needs it. This doesn't seem avoidable because BFI can
be invalidated whenever the CFG changes. We detect PGO based on the command
line flags and/or whether the module has the profile summary info (we may
need to pass the module to more functions.)
- Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be able
to get to the ModuleAnalysisManager in one step and PSI through it.
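The conditional insertion in the second bullet can be sketched with a toy
pipeline builder. This is plain C++ rather than PassBuilder code:
ToyLoopPass and buildLoopPipeline are hypothetical, the pass names are
just example strings, and "require<block-freq>" is only a label standing
in for RequireAnalysisPass for BFI.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a loop pass; not an LLVM type.
struct ToyLoopPass {
  std::string Name;
  bool NeedsBFI;
};

// Builds a flat list of pipeline step names. When IsPGO is set, a
// "require<block-freq>" step is placed right before every loop pass that
// wants BFI, so that the result is cached when that pass runs.
std::vector<std::string>
buildLoopPipeline(const std::vector<ToyLoopPass> &Passes, bool IsPGO) {
  std::vector<std::string> Steps;
  for (const ToyLoopPass &P : Passes) {
    if (IsPGO && P.NeedsBFI)
      Steps.push_back("require<block-freq>");
    Steps.push_back(P.Name);
  }
  return Steps;
}
```

With IsPGO unset the builder emits the passes unchanged, so non-PGO
builds pay nothing for BFI in this model. How IsPGO would be derived
(command line flags, the module's profile summary) is left open here.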
Dropping BFI and using PSI only
We could consider not using BFI and relying solely on PSI and
function-level profiles (as opposed to block-level ones), but profile
precision would suffer.
Computing BFI in-place
We could consider computing BFI “in-place” by directly running BFI outside
of the pass manager [7]. This would let us sidestep the analysis manager
constraints, but it would still involve running an outer-scope analysis from
an inner-scope pass and could cause problems in terms of pass
pipelining and concurrency. Moreover, a potential downside of running
analyses in-place is that it won’t take advantage of cached analysis
results provided by the pass manager.
Adding inner-scope versions of PSI and BFI
We could consider adding a function-level and loop-level PSI and loop-level
BFI, which internally act like their outer-scope versions but provide
inner-scope results only. This way, we could always call getResult for PSI
and BFI. However, this would still involve running an outer-scope analysis
from an inner-scope pass.
Caching the FAM and the MAM proxies
We could consider caching the FunctionAnalysisManager and the
ModuleAnalysisManager proxies once early on instead of adding a new proxy.
But this seems unlikely to work well, because the analysis cache key type
includes the function or the module, and some pass may add a new function
for which the proxy wouldn’t be cached. We’d need to write and insert a
pass in select locations just to fill the cache. Adding the new proxy would
take care of these with a three-line change.
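The cache-key issue can be illustrated with a toy cache keyed on the
analysis and the IR unit. ToyAnalysisCache and the string keys are
hypothetical and much simpler than the real analysis manager; the sketch
only shows why an entry filled early is of no help to a function created
later.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy cache keyed by (analysis name, IR unit name). Not LLVM's data
// structure; it only mirrors the idea that the key includes the function.
class ToyAnalysisCache {
  std::map<std::pair<std::string, std::string>, int> Results;

public:
  // Records a result for one analysis on one IR unit.
  void fill(const std::string &Analysis, const std::string &Unit, int Value) {
    Results[std::make_pair(Analysis, Unit)] = Value;
  }

  // Returns the cached result, or nullptr if this exact key was never filled.
  const int *getCachedResult(const std::string &Analysis,
                             const std::string &Unit) const {
    auto It = Results.find(std::make_pair(Analysis, Unit));
    return It == Results.end() ? nullptr : &It->second;
  }
};
```

If the proxy is cached for a function "f" and a later pass creates a new
function, say "f.cold", the lookup for the new function misses, which is
the situation described above.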
Adding a conditional BFI analysis
We could consider adding a conditional BFI analysis that is a wrapper
around BFI and computes BFI only if profiles are available (either by
checking that the module has a profile summary or by depending on PSI).
With this, we wouldn’t need to conditionally build pass pipelines, and it
may work for the new pass manager. But a similar approach wouldn’t work
for the old pass manager because we cannot conditionally depend on an
analysis under it.
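A minimal sketch of the wrapper idea, in plain C++ with hypothetical
types (ToyFunction, ToyBFI, runConditionalBFI). It assumes the presence
of a profile summary is known per function and treats the block counts
themselves as the "frequencies"; a real wrapper would delegate to BFI.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-ins for a function and a BFI result.
struct ToyFunction {
  bool ModuleHasProfileSummary;
  std::vector<unsigned> BlockCounts;
};

struct ToyBFI {
  std::vector<unsigned> Freqs;
};

// Wrapper analysis: yields a result only when profile data is present, so
// clients can request it unconditionally and check for an empty value.
std::optional<ToyBFI> runConditionalBFI(const ToyFunction &F) {
  if (!F.ModuleHasProfileSummary)
    return std::nullopt; // No profile: skip the BFI computation entirely.
  return ToyBFI{F.BlockCounts};
}
```

A client pass would then branch on whether the optional holds a value
instead of the pipeline builder branching on whether this is a PGO build.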
[1] We cannot call AnalysisManager::getResult for an outer scope but only
getCachedResult, probably because of potential pipelining or concurrency
issues.
[2] For example, potentially breaking up multiple pipelined loop passes and
inserting RequireAnalysisPass<BlockFrequencyAnalysis> in front of each of
them.
[3] For example, -fprofile-instr-use and -fprofile-sample-use aren’t
present in ThinLTO post link builds.
[4] For example, we could check whether the module has the profile summary
metadata annotated when building pass pipelines, but we don’t always pass
the module down to the place where we build pass pipelines.
[5] By inserting RequireAnalysisPass<ProfileSummaryAnalysis> after the
PGOInstrumentationUse and the SampleProfileLoaderPass passes (and around
the PGOIndirectCallPromotion pass for the ThinLTO post link pipeline).
[6] For example, the context-sensitive PGO.
[7] Directly calling its constructor along with the dependent analysis
results, e.g. the jump threading pass.