[llvm-dev] [LLVMdev] Path forward on profile guided inlining?

Thu Dec 10 17:03:21 PST 2015

On Thu, Dec 10, 2015 at 4:51 PM, Philip Reames
<listmail at philipreames.com> wrote:
>
>
> On 12/10/2015 04:29 PM, Xinliang David Li wrote:
>>
>> On Thu, Dec 10, 2015 at 4:00 PM, Philip Reames
>> <listmail at philipreames.com> wrote:
>>>
>>> Given I didn't get any response to my original query, I chose not to
>>> invest
>>> time in this at the time.  I am unlikely to get time for this in the near
>>> future.
>>>
>>> On 12/07/2015 03:13 PM, Easwaran Raman wrote:
>>>
>>> (Resending after  removing llvmdev at cs.uiuc.edu and using
>>> llvm-dev at lists.llvm.org)
>>>
>>> On Mon, Dec 7, 2015 at 3:08 PM, Easwaran Raman <eraman at google.com> wrote:
>>>>
>>>> Hi Philip,
>>>>
>>>>   Is there any update on this? I've been sending patches to get rid of
>>>> the
>>>> callee hotness based inline hints from the frontend and move the logic
>>>> to
>>>> the inliner. The next step is to use the callsite hotness instead. I
>>>> also
>>>> want to focus on the infrastructure to enable this and what I've been
>>>> experimenting with is similar to your two alternative approaches:
>>>>
>>>>> Alternate Approaches:
>>>>> 1) We could just recompute BFI explicitly in the inliner right before
>>>>> passing the result to ICA for the purposes of prototyping. If this was
>>>>> off
>>>>> by default, this might be a reasonable scheme for investigation.  This
>>>>> could
>>>>> likely never be enabled for real uses.
>>>>> 2) We could pre-compute BFI once per function within a given SCC and
>>>>> then
>>>>> try to keep it up to date during inlining.  If we cached the call
>>>>> frequencies for the initial call sites, we could adjust the visit order
>>>>> to
>>>>> minimize the number of times we need to recompute a given functions
>>>>> block
>>>>> frequencies.  (e.g. we can look at all the original call sites within a
>>>>> function before looking at newly inlined ones)
>>>>>
>>>> My proposal is very similar (perhaps identical) to your option 2 above.
>>>> I
>>>> don't understand the part where you talk about adjusting the visit order
>>>> to
>>>> minimize BFI computation.
>>>>
>>>> BFI computation:  BFI for a function is computed on demand and cached.
>>>
>>> The problem here is that the old pass manager does not have a place/way
>>> to
>>> cache the function level analysis.  You could do this within the inliner
>>> itself, but the function passes run by the SCC pass manager will
>>> invalidate
>>> everything.  That's the problem I was trying to solve.
>>
>> Easwaran's proposal does not rely on the analysis pass framework of
>> the legacy pass manager -- that is the key idea. BFI info is designed
>> in a way such that it can be computed on-demand.
>>
>> BFI for the caller will be incrementally updated during inlining, but
>> after inlining is done for a node, the BFI info will be explicitly
>> invalidated for that node due to the function passes run later by SCC
>> pass manager.  This is expected.
>
> Just to make sure, you're aware this will cost at least one re-computation
> per iteration in CGPassManager::runOnModule right? If that's expected and
> acceptable, then yes, the approach you seem to be describing should be
> workable.
>

For each leaf node, the number of BFI computation is <= 1. For other
nodes, it is <=2 (one as caller, one as callee) -- in other words, it
is linear to the number of functions.

> Particularly as an intermediate step to parallelize work with the new pass
> manager, this sounds very much worthwhile.
>>

Yes, that is exactly the intention -- it can buy pass manager rework
more time while unlock lots of pending performance tuning work we plan
to do.

thanks,

David

>>
>>>> Update: When 'bar' gets inlined into 'foo', the BFI for 'foo' is
>>>> updated.
>>>> Let OldBB in 'bar' gets cloned as NewBB in 'foo'.  NewBB's block
>>>> frequency
>>>> can be incrementally computed from OldBB's block frequency, entry block
>>>> frequency of 'bar' and the frequency of the block containing the 'foo'
>>>> ->
>>>> 'bar' callsite. Even when the new CGSCC level BFI analysis is in place,
>>>> this
>>>> incremental update is useful to minimize computation.
>>>>
>>>> Invalidation:  Once inlining is completed in an SCC (at the end of
>>>> runOnSCC), the entries for functions in that SCC are invalidated since
>>>> other
>>>> passes run by the CGSCC pass manager (including those run by the
>>>> function
>>>> pass manager run under CGSCC pass manager) might affect the computed BFI
>>>> for
>>>> the functions in the SCC.
>>>
>>> This sounds like you're essentially planning to extend the old pass
>>> manager
>>> with function level analysis support in the SCC pass manager.  I advise
>>> against this given the fact that Chandler is actively working on the new
>>> one
>>> and getting patches reviewed for the old one is "challenging" at best.
>>> If
>>> you take this approach, make sure Chandler - who is pretty much the only
>>> active reviewer in this area - is on-board with your approach before you
>>> start.
>>
>> There is no plan to extend old pass manager for this capability.  The
>> proposed solution is intended to work with the new pass manager too,
>> so there is no conflict here. The BFI book-keeping by inliner can be
>> removed with pass manager change is in place.
>
> Thanks for the clarification.
>
>>
>> David
>>
>>
>>>> When the new PM infrastructure and a CGSCC based BFI analysis is in
>>>> place,
>>>> the transition should be easy assuming it will provide getBFI(Function
>>>> *)
>>>> and invalidateBFI(Function *) interfaces. BFI for a function is computed
>>>> at
>>>> most twice in this approach. Thoughts?
>>>>
>>>>
>>>> Thanks,
>>>> Easwaran
>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>>
>>>
>