[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager

Hiroshi Yamauchi via llvm-dev llvm-dev at lists.llvm.org
Mon Mar 4 10:55:57 PST 2019


On Sat, Mar 2, 2019 at 12:58 AM Fedor Sergeev <fedor.sergeev at azul.com>
wrote:

>
>
> On 3/2/19 2:38 AM, Hiroshi Yamauchi wrote:
>
> Here's a sketch of the proposed approach for just one pass (but imagine
> more)
>
> https://reviews.llvm.org/D58845
>
> On Fri, Mar 1, 2019 at 12:54 PM Fedor Sergeev via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On 2/28/19 12:47 AM, Hiroshi Yamauchi via llvm-dev wrote:
>>
>> Hi all,
>>
>> To implement more profile-guided optimizations, we’d like to use
>> ProfileSummaryInfo (PSI) and BlockFrequencyInfo (BFI) from more passes of
>> various types, under the new pass manager.
>>
>> The following is what we came up with. Would appreciate feedback. Thanks.
>>
>> Issue
>>
>> It’s not obvious (to me) how best to do this, given that we cannot
>> request an outer-scope analysis result from an inner-scope pass through
>> the analysis managers [1], and that we might end up unnecessarily running
>> some analyses unless we conditionally build pass pipelines for PGO cases.
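>>
>> (For illustration, a minimal sketch of what a function pass can do today
>> under the new pass manager: it can only consume a module-level result such
>> as PSI if something earlier in the pipeline already computed it. The pass
>> name here is made up:)
>>
>>   #include "llvm/Analysis/ProfileSummaryInfo.h"
>>   #include "llvm/IR/PassManager.h"
>>   using namespace llvm;
>>
>>   struct SomeFunctionPass : PassInfoMixin<SomeFunctionPass> {
>>     PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
>>       auto &MAM =
>>           FAM.getResult<ModuleAnalysisManagerFunctionProxy>(F).getManager();
>>       // getCachedResult, not getResult: a function pass may not trigger
>>       // computation of a module-level analysis.
>>       ProfileSummaryInfo *PSI =
>>           MAM.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
>>       if (!PSI || !PSI->hasProfileSummary())
>>         return PreservedAnalyses::all(); // no profile info available
>>       // ... use PSI, e.g. PSI->isFunctionEntryHot(&F) ...
>>       return PreservedAnalyses::all();
>>     }
>>   };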
>>
>> Indeed, this is an intentional restriction in the new pass manager, which is
>> more or less a reflection of a fundamental property of the outer-inner IRUnit
>> relationship and of the transformations/analyses run on those units. The main
>> intent of having those inner IRUnits (e.g. Loops) is to run local
>> transformations and save compile time by staying local to a particular small
>> piece of IR. The loop pass manager lets you run a whole pipeline of different
>> transformations while still staying local, amplifying the savings.
>> As soon as you run a function-level analysis from within the loop pipeline,
>> you essentially break this pipelining.
>> Say, as your loop transformation runs, it modifies the loop (and the
>> function) and potentially invalidates the analysis,
>> so you have to rerun the analysis again and again. Hence instead of
>> saving compile time it ends up increasing it.
>>
>
> Exactly.
>
>
>> I have hit this issue somewhat recently with the dependency of loop passes
>> on BranchProbabilityInfo
>> (some loop passes, like IRCE, can use it for profitability analysis).
>>
>> The only solution that appears reasonable there is to teach all the
>> loop passes that need to be pipelined
>> to preserve BPI (or any other module/function-level analyses), similar to
>> how they preserve DominatorTree and the
>> other "LoopStandard" analyses.
>>
>
> Is this implemented - do the loop passes preserve BPI?
>
> Nope, not implemented right now.
> One of the problems is that even the loop canonicalization passes run at the
> start of the loop pass manager don't preserve it
> (and at least LoopSimplifyCFG does change control flow).
>
>
> In buildFunctionSimplificationPipeline (where LoopFullUnrollPass is added
> as in the sketch), LateLoopOptimizationsEPCallbacks
> and LoopOptimizerEndEPCallbacks seem to allow some arbitrary loop passes to
> be inserted into the pipelines (via flags)?
>
> I wonder how hard it'd be to teach all the relevant loop passes to
> preserve BFI (or BPI)...
>
> Well, each time you restructure the control flow around the loops you will
> have to update those extra analyses,
> pretty much the same way as the DT is updated through DomTreeUpdater.
> The trick is to design a proper update interface (and then implement it ;) ).
> And I have not spent enough time on this issue to get a good idea of what
> that interface would be.
>

Hm, sounds non-trivial :) Noting that BFI depends on BPI.


> regards,
>   Fedor.
>
>
>> It seems that, for the different types of passes to be able to get PSI and
>> BFI, we’d need to ensure that PSI is cached for a non-module pass, and that
>> PSI, BFI and the ModuleAnalysisManager proxy are cached for a loop pass, in
>> the pass pipelines. This may mean needing to insert BFI/PSI in front of
>> many passes [2]. It's also not obvious how to conditionally insert BFI for
>> PGO pipelines, because there isn’t always a good flag to detect PGO cases
>> [3], or because we tend to build pass pipelines before examining the code
>> (or without propagating enough info down) [4].
>>
>> Proposed approach
>>
>> - Cache PSI right after the profile summary in the IR is written in the
>> pass pipeline [5]. This would avoid the need to insert a RequireAnalysisPass
>> for PSI before each non-module pass that needs it. PSI can technically be
>> invalidated, but that is unlikely; if it is, we insert another
>> RequireAnalysisPass [6].
>>
>> - Conditionally insert a RequireAnalysisPass for BFI, if PGO, right before
>> each loop pass that needs it. This doesn't seem avoidable because BFI can
>> be invalidated whenever the CFG changes. We detect PGO based on the command
>> line flags and/or on whether the module has the profile summary info (we
>> may need to pass the module to more functions).
>>
>> - Add a new proxy, ModuleAnalysisManagerLoopProxy, for a loop pass to be
>> able to get to the ModuleAnalysisManager in one step and PSI through it.
>> (A rough sketch of all three pieces follows below.)
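>>
>> To make the above concrete, roughly what this could look like (pass names
>> and the exact insertion points in PassBuilder are simplified, the PGO check
>> is hand-waved, and ModuleAnalysisManagerLoopProxy is the proposed, not yet
>> existing, proxy):
>>
>>   // 1) Module pipeline: cache PSI right after the profile summary is
>>   //    written into the IR.
>>   MPM.addPass(PGOInstrumentationUse(ProfileFile));
>>   MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
>>
>>   // 2) Function pipeline: require BFI only for PGO builds, right before a
>>   //    loop pass that wants it.
>>   if (IsPGOBuild)
>>     FPM.addPass(RequireAnalysisPass<BlockFrequencyAnalysis, Function>());
>>   FPM.addPass(createFunctionToLoopPassAdaptor(LoopFullUnrollPass(OptLevel)));
>>
>>   // 3) Inside a loop pass: reach the outer analysis managers and pick up
>>   //    the cached results.
>>   PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
>>                         LoopStandardAnalysisResults &AR, LPMUpdater &U) {
>>     Function &F = *L.getHeader()->getParent();
>>     Module &M = *F.getParent();
>>     // Proposed proxy, one step to the ModuleAnalysisManager:
>>     auto &MAM =
>>         AM.getResult<ModuleAnalysisManagerLoopProxy>(L, AR).getManager();
>>     ProfileSummaryInfo *PSI = MAM.getCachedResult<ProfileSummaryAnalysis>(M);
>>     // BFI, if the RequireAnalysisPass above ran for this function:
>>     auto &FAM =
>>         AM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR).getManager();
>>     BlockFrequencyInfo *BFI = FAM.getCachedResult<BlockFrequencyAnalysis>(F);
>>     // ... use PSI/BFI if non-null ...
>>   }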
>>
>> Alternative approaches
>>
>> Dropping BFI and using PSI only
>> We could consider not using BFI and relying solely on PSI and
>> function-level profiles (as opposed to block-level ones), but profile
>> precision would suffer.
>>
>> Computing BFI in-place
>> We could consider computing BFI “in-place” by running BFI directly,
>> outside of the pass manager [7]. This would let us sidestep the analysis
>> manager constraints, but it would still involve running an outer-scope
>> analysis from an inner-scope pass and could potentially cause problems in
>> terms of pass pipelining and concurrency. Moreover, a downside of running
>> analyses in-place is that it doesn't take advantage of the cached
>> analysis results provided by the pass manager.
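>>
>> (For reference, the in-place pattern looks roughly like what the jump
>> threading pass does when it wants BFI/BPI without going through the
>> analysis manager; the dependent analyses are assumed to be available:)
>>
>>   // F: Function &, LI: LoopInfo &, TLI: TargetLibraryInfo * (may be null)
>>   BranchProbabilityInfo BPI(F, LI, TLI);
>>   BlockFrequencyInfo BFI(F, BPI, LI);
>>   // BFI is usable locally, but nothing else in the pipeline can reuse or
>>   // invalidate it - which is exactly the downside mentioned above.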
>>
>> Adding inner-scope versions of PSI and BFI
>> We could consider adding function-level and loop-level versions of PSI and
>> a loop-level version of BFI, which internally act like their outer-scope
>> counterparts but provide inner-scope results only. This way, we could always
>> call getResult for PSI and BFI. However, this would still involve running an
>> outer-scope analysis from an inner-scope pass.
>>
>> Caching the FAM and the MAM proxies
>> We could consider caching the FunctionAnalysisManager and the
>> ModuleAnalysisManager proxies once, early on, instead of adding a new proxy.
>> But this is not likely to work well, because the analysis cache key type
>> includes the function or the module, and some pass may add a new function
>> for which the proxy wouldn't be cached. We'd need to write and insert a
>> pass at select locations just to fill the cache. Adding the new proxy would
>> take care of this with a three-line change.
>>
>> Conditional BFI
>> We could consider adding a conditional BFI analysis that is a wrapper
>> around BFI and computes BFI only if profiles are available (either by
>> checking that the module has a profile summary or by depending on PSI).
>> With this, we wouldn't need to conditionally build pass pipelines, and it
>> may work for the new pass manager. But a similar wrapper wouldn't work for
>> the old pass manager, because we cannot conditionally depend on an analysis
>> under it.
>>
>> There is LazyBlockFrequencyInfo.
>> Not sure how well it fits this idea.
>>
>
> Good point. LazyBlockFrequencyInfo seems usable with the old pass manager
> (to save unnecessary BFI/BPI computation) and would work for function
> passes. I think the restriction still applies, though - a loop pass still
> cannot request (outer-scope) BFI, lazy or not, under either the new or the
> old pass manager. Another assumption is that it'd be cheap and safe to
> unconditionally depend on PSI or to check the module's profile summary.
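>
> (Sketch of how LazyBlockFrequencyInfo is used from a function pass under
> the old pass manager; the pass itself is hypothetical:)
>
>   void MyLegacyPass::getAnalysisUsage(AnalysisUsage &AU) const {
>     // Registers LazyBPI/LazyBFI so BFI is only computed when getBFI() is
>     // actually called.
>     LazyBlockFrequencyInfoPass::getLazyBFIAnalysisUsage(AU);
>   }
>
>   bool MyLegacyPass::runOnFunction(Function &F) {
>     BlockFrequencyInfo &BFI =
>         getAnalysis<LazyBlockFrequencyInfoPass>().getBFI();
>     // ... use BFI only on the paths that really need it ...
>     return false;
>   }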
>
>
>> regards,
>>   Fedor.
>>
>>
>>
>> [1] We cannot call AnalysisManager::getResult for an outer scope but only
>> getCachedResult. Probably because of potential pipelining or concurrency
>> issues.
>> [2] For example, potentially breaking up multiple pipelined loop passes
>> and inserting a RequireAnalysisPass<BlockFrequencyAnalysis> in front of
>> each of them.
>> [3] For example, -fprofile-instr-use and -fprofile-sample-use aren’t
>> present in ThinLTO post link builds.
>> [4] For example, we could check whether the module has the profile
>> summary metadata annotated when building pass pipelines but we don’t always
>> pass the module down to the place where we build pass pipelines.
>> [5] By inserting RequireAnalysisPass<ProfileSummaryAnalysis> after the
>> PGOInstrumentationUse and the SampleProfileLoaderPass passes (and around
>> the PGOIndirectCallPromotion pass for the ThinLTO post-link pipeline).
>> [6] For example, the context-sensitive PGO.
>> [7] Directly calling its constructor along with the dependent analyses'
>> results, e.g. as the jump threading pass does.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>