[llvm-dev] RFC: Getting ProfileSummaryInfo and BlockFrequencyInfo from various types of passes under the new pass manager
Fedor Sergeev via llvm-dev
llvm-dev at lists.llvm.org
Wed Mar 13 14:37:40 PDT 2019
Overall seems fine to me.
On 3/11/19 8:12 PM, Hiroshi Yamauchi wrote:
> Here's a revised approach based on the discussion:
>
> - Cache PSI right after the profile summary in the IR is written in
> the pass pipeline. This would avoid the need to insert
> RequireAnalysisPass for PSI before each non-module pass that needs it.
> PSI can be technically invalidated but unlikely as PSI is immutable.
> If it does, we can insert another RequireAnalysisPass.
ProfileSummaryInfo::invalidate always return false, so it does not need
any extra handling
(as soon as it finds its way into ModuleAnalysisManager).
>
> - If PGO, conditionally request BFI from the passes that need it. For
> (pipelined) loop passes, we need to insert a pass that computes BFI
> conditionally (if PGO) in front of them and make them preserve BFI
> through. This is to avoid pipeline interruptions and potential
> invalidation/recomputation of BFI between the loop passes. We detect
> PGO based on whether PSI has profile summary info. (For the old pass
> manager, implement a similar approach by using LazyBlockFrequencyInfo.)
There is already an optional analysis in LoopStandardAnalysisResults -
MemorySSA.
We can do the same for BFI/BPI.
And, yes - preserving those through loop passes is a cornerstone to this
approach.
>
> - Add a new proxy ModuleAnalysisManagerLoopProxy for a loop pass to be
> able to get to the ModuleAnalysisManager in one step and PSI through it.
This is just an optimization of compile-time, saves one indirection
through FunctionAnalysisManager.
I'm not even sure if it is worth the effort. And definitely not crucial
for the overall idea.
regards,
Fedor.
>
>
>
>
> On Mon, Mar 4, 2019 at 2:05 PM Fedor Sergeev <fedor.sergeev at azul.com
> <mailto:fedor.sergeev at azul.com>> wrote:
>
>
>
> On 3/4/19 10:49 PM, Hiroshi Yamauchi wrote:
>>
>>
>> On Mon, Mar 4, 2019 at 10:55 AM Hiroshi Yamauchi
>> <yamauchi at google.com <mailto:yamauchi at google.com>> wrote:
>>
>>
>>
>> On Sat, Mar 2, 2019 at 12:58 AM Fedor Sergeev
>> <fedor.sergeev at azul.com <mailto:fedor.sergeev at azul.com>> wrote:
>>
>>
>>
>> On 3/2/19 2:38 AM, Hiroshi Yamauchi wrote:
>>> Here's a sketch of the proposed approach for just one
>>> pass(but imagine more)
>>>
>>> https://reviews.llvm.org/D58845
>>>
>>> On Fri, Mar 1, 2019 at 12:54 PM Fedor Sergeev via
>>> llvm-dev <llvm-dev at lists.llvm.org
>>> <mailto:llvm-dev at lists.llvm.org>> wrote:
>>>
>>> On 2/28/19 12:47 AM, Hiroshi Yamauchi via llvm-dev
>>> wrote:
>>>> Hi all,
>>>>
>>>> To implement more profile-guided optimizations,
>>>> we’d like to use ProfileSummaryInfo (PSI) and
>>>> BlockFrequencyInfo (BFI) from more passes of
>>>> various types, under the new pass manager.
>>>>
>>>> The following is what we came up with. Would
>>>> appreciate feedback. Thanks.
>>>>
>>>> Issue
>>>>
>>>> It’s not obvious (to me) how to best do this, given
>>>> that we cannot request an outer-scope analysis
>>>> result from an inner-scope pass through analysis
>>>> managers [1] and that we might unnecessarily
>>>> running some analyses unless we conditionally build
>>>> pass pipelines for PGO cases.
>>> Indeed, this is an intentional restriction in new
>>> pass manager, which is more or less a reflection of
>>> a fundamental property of outer-inner IRUnit
>>> relationship
>>> and transformations/analyses run on those units. The
>>> main intent for having those inner IRUnits (e.g.
>>> Loops) is to run local transformations and save
>>> compile time
>>> on being local to a particular small piece of IR.
>>> Loop Pass manager allows you to run a whole pipeline
>>> of different transformations still locally,
>>> amplifying the save.
>>> As soon as you run function-level analysis from
>>> within the loop pipeline you essentially break this
>>> pipelining.
>>> Say, as you run your loop transformation it modifies
>>> the loop (and the function) and potentially
>>> invalidates the analysis,
>>> so you have to rerun your analysis again and again.
>>> Hence instead of saving on compile time it ends up
>>> increasing it.
>>>
>>>
>>> Exactly.
>>>
>>>
>>> I have hit this issue somewhat recently with
>>> dependency of loop passes on BranchProbabilityInfo.
>>> (some loop passes, like IRCE can use it for
>>> profitability analysis).
>>>
>>> The only solution that appears to be reasonable
>>> there is to teach all the loops passes that need to
>>> be pipelined
>>> to preserve BPI (or any other module/function-level
>>> analyses) similar to how they preserve DominatorTree and
>>> other "LoopStandard" analyses.
>>>
>>>
>>> Is this implemented - do the loop passes preserve BPI?
>> Nope, not implemented right now.
>> One of the problems is that even loop canonicalization
>> passes run at the start of loop pass manager dont preserve it
>> (and at least LoopSimplifyCFG does change control flow).
>>>
>>> In buildFunctionSimplificationPipeline
>>> (where LoopFullUnrollPass is added as in the sketch),
>>> LateLoopOptimizationsEPCallbacks
>>> and LoopOptimizerEndEPCallbacks seem to allow some
>>> arbitrary loop passes to be inserted into the pipelines
>>> (via flags)?
>>>
>>> I wonder how hard it'd be to teach all the relevant loop
>>> passes to preserve BFI(or BPI)..
>> Well, each time you restructure control flow around the
>> loops you will have to update those extra analyses,
>> pretty much the same way as DT is being updated through
>> DomTreeUpdater.
>> The trick is to design a proper update interface (and
>> then implement it ;) ).
>> And I have not spent enough time on this issue to get a
>> good idea of what that interface would be.
>>
>>
>> Hm, sounds non-trivial :) noting BFI depends on BPI.
>>
>>
>> To step back, it looks like:
>>
>> want to use profiles from more passes -> need to get BFI (from
>> loop passes) -> need all the loop passes to preserve BFI.
>>
>> I wonder if there's no way around this.
> Indeed. I believe this is a general consensus here.
>
> regards,
> Fedor.
>
>>
>>
>>
>> regards,
>> Fedor.
>>
>>>
>>>> It seems that for different types of passes to be
>>>> able to get PSI and BFI, we’d need to ensure PSI is
>>>> cached for a non-module pass, and PSI, BFI and the
>>>> ModuleAnalysisManager proxy are cached for a loop
>>>> pass in the pass pipelines. This may mean
>>>> potentially needing to insert BFI/PSI in front of
>>>> many passes [2]. It seems not obvious how to
>>>> conditionally insert BFI for PGO pipelines because
>>>> there isn’t always a good flag to detect PGO cases
>>>> [3] or we tend to build pass pipelines before
>>>> examining the code (or without propagating enough
>>>> info down) [4].
>>>>
>>>> Proposed approach
>>>>
>>>> - Cache PSI right after the profile summary in the
>>>> IR is written in the pass pipeline [5]. This would
>>>> avoid the need to insert RequiredAnalysisPass for
>>>> PSI before each non-module pass that needs it. PSI
>>>> can be technically invalidated but unlikely. If it
>>>> does, we insert another RequiredAnalysisPass[6].
>>>>
>>>> - Conditionally insert RequireAnalysisPass for BFI,
>>>> if PGO, right before each loop pass that needs it.
>>>> This doesn't seem avoidable because BFI can be
>>>> invalidated whenever the CFG changes. We detect PGO
>>>> based on the command line flags and/or whether the
>>>> module has the profile summary info (we may need to
>>>> pass the module to more functions.)
>>>>
>>>> - Add a new proxy ModuleAnalysisManagerLoopProxy
>>>> for a loop pass to be able to get to the
>>>> ModuleAnalysisManager in one step and PSI through it.
>>>>
>>>> Alternative approaches
>>>>
>>>> Dropping BFI and use PSI only
>>>> We could consider not using BFI and solely relying
>>>> on PSI and function-level profiles only (as opposed
>>>> to block-level), but profile precision would suffer.
>>>>
>>>> Computing BFI in-place
>>>> We could consider computing BFI “in-place” by
>>>> directly running BFI outside of the pass manager
>>>> [7]. This would let us avoid using the analysis
>>>> manager constraints but it would still involve
>>>> running an outer-scope analysis from an inner-scope
>>>> pass and potentially cause problems in terms of
>>>> pass pipelining and concurrency. Moreover, a
>>>> potential downside of running analyses in-place is
>>>> that it won’t take advantage of cached analysis
>>>> results provided by the pass manager.
>>>>
>>>> Adding inner-scope versions of PSI and BFI
>>>> We could consider adding a function-level and
>>>> loop-level PSI and loop-level BFI, which internally
>>>> act like their outer-scope versions but provide
>>>> inner-scope results only. This way, we could always
>>>> call getResult for PSI and BFI. However, this would
>>>> still involve running an outer-scope analysis from
>>>> an inner-scope pass.
>>>>
>>>> Caching the FAM and the MAM proxies
>>>> We could consider caching the
>>>> FunctionalAnalysisManager and the
>>>> ModuleAnalysisManager proxies once early on instead
>>>> of adding a new proxy. But it seems to not likely
>>>> work well because the analysis cache key type
>>>> includes the function or the module and some pass
>>>> may add a new function for which the proxy wouldn’t
>>>> be cached. We’d need to write and insert a pass in
>>>> select locations to just fill the cache. Adding the
>>>> new proxy would take care of these with a
>>>> three-line change.
>>>>
>>>> Conditional BFI
>>>> We could consider adding a conditional BFI analysis
>>>> that is a wrapper around BFI and computes BFI only
>>>> if profiles are available (either checking the
>>>> module has profile summary or depend on the PSI.)
>>>> With this, we wouldn’t need to conditionally build
>>>> pass pipelines and may work for the new pass
>>>> manager. But a similar wouldn’t work for the old
>>>> pass manager because we cannot conditionally depend
>>>> on an analysis under it.
>>> There is LazyBlockFrequencyInfo.
>>> Not sure how well it fits this idea.
>>>
>>>
>>> Good point. LazyBlockFrequencyInfo seems usable with the
>>> old pass manager (save unnecessary BFI/BPI) and would
>>> work for function passes. I think the restriction still
>>> applies - a loop pass cannot still request (outer-scope)
>>> BFI, lazy or not, new or old (pass manager). Another
>>> assumption is that it'd be cheap and safe to
>>> unconditionally depend on PSI or check the module's
>>> profile summary.
>>>
>>>
>>> regards,
>>> Fedor.
>>>
>>>>
>>>>
>>>> [1] We cannot call AnalysisManager::getResult for
>>>> an outer scope but only getCachedResult. Probably
>>>> because of potential pipelining or concurrency issues.
>>>> [2] For example, potentially breaking up multiple
>>>> pipelined loop passes and insert
>>>> RequireAnalysisPass<BlockFrequencyAnalysis> in
>>>> front of each of them.
>>>> [3] For example, -fprofile-instr-use and
>>>> -fprofile-sample-use aren’t present in ThinLTO post
>>>> link builds.
>>>> [4] For example, we could check whether the module
>>>> has the profile summary metadata annotated when
>>>> building pass pipelines but we don’t always pass
>>>> the module down to the place where we build pass
>>>> pipelines.
>>>> [5] By inserting
>>>> RequireAnalysisPass<ProfileSummaryInfo> after the
>>>> PGOInstrumentationUse and the
>>>> SampleProfileLoaderPass passes (and around the
>>>> PGOIndirectCallPromotion pass for the Thin LTO post
>>>> link pipeline.)
>>>> [6] For example, the context-sensitive PGO.
>>>> [7] Directly calling its constructor along with the
>>>> dependent analyses results, eg. the jump threading
>>>> pass.
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190314/ce875719/attachment-0001.html>
More information about the llvm-dev
mailing list