[llvm-dev] [LLVMdev] Path forward on profile guided inlining?
Xinliang David Li via llvm-dev
llvm-dev at lists.llvm.org
Tue Dec 8 09:20:22 PST 2015
This makes great sense to me. I am looking forward to the patch --
this is critical for PGO performance tuning.
On Mon, Dec 7, 2015 at 3:13 PM, Easwaran Raman <eraman at google.com> wrote:
> (Resending after removing llvmdev at cs.uiuc.edu and using
> llvm-dev at lists.llvm.org)
> On Mon, Dec 7, 2015 at 3:08 PM, Easwaran Raman <eraman at google.com> wrote:
>> Hi Philip,
>> Is there any update on this? I've been sending patches to get rid of the
>> callee hotness based inline hints from the frontend and move the logic to
>> the inliner. The next step is to use the callsite hotness instead. I also
>> want to focus on the infrastructure to enable this and what I've been
>> experimenting with is similar to your two alternative approaches:
>>> Alternate Approaches:
>>> 1) We could just recompute BFI explicitly in the inliner right before
>>> passing the result to ICA for the purposes of prototyping. If this was off
>>> by default, this might be a reasonable scheme for investigation. This could
>>> likely never be enabled for real uses.
>>> 2) We could pre-compute BFI once per function within a given SCC and then
>>> try to keep it up to date during inlining. If we cached the call
>>> frequencies for the initial call sites, we could adjust the visit order to
>>> minimize the number of times we need to recompute a given function's block
>>> frequencies. (e.g. we can look at all the original call sites within a
>>> function before looking at newly inlined ones)
>> My proposal is very similar (perhaps identical) to your option 2 above. I
>> don't understand the part where you talk about adjusting the visit order to
>> minimize BFI computation.
>> BFI computation: BFI for a function is computed on demand and cached.
>> Update: When 'bar' gets inlined into 'foo', the BFI for 'foo' is updated.
>> Suppose OldBB in 'bar' is cloned as NewBB in 'foo'. NewBB's block frequency
>> can be incrementally computed from OldBB's block frequency, entry block
>> frequency of 'bar' and the frequency of the block containing the 'foo' ->
>> 'bar' callsite. Even when the new CGSCC level BFI analysis is in place, this
>> incremental update is useful to minimize computation.
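For illustration, here is a minimal sketch of the incremental update described above. The message names the three inputs but not the exact formula; the scaling used here (OldBB's frequency times the callsite block's frequency, divided by the entry frequency of 'bar') is an assumption about the intended arithmetic. The function names and the dict-based BFI representation are hypothetical and are not LLVM APIs.

```python
from fractions import Fraction


def scaled_block_freq(old_bb_freq, callee_entry_freq, callsite_bb_freq):
    """Express a callee block's frequency in the caller's frequency scale.

    Assumed formula: old_bb_freq * callsite_bb_freq / callee_entry_freq.
    """
    if callee_entry_freq == 0:
        # A callee whose entry never executes contributes no frequency.
        return Fraction(0)
    return (Fraction(old_bb_freq) * Fraction(callsite_bb_freq)
            / Fraction(callee_entry_freq))


def update_caller_bfi(caller_bfi, callee_bfi, callee_entry, callsite_bb, cloned):
    """Add frequencies for cloned blocks to the caller's BFI table.

    caller_bfi / callee_bfi: dict mapping block name -> frequency.
    cloned: dict mapping each callee block (OldBB) to its clone (NewBB).
    """
    entry_freq = callee_bfi[callee_entry]
    callsite_freq = caller_bfi[callsite_bb]
    for old_bb, new_bb in cloned.items():
        caller_bfi[new_bb] = scaled_block_freq(
            callee_bfi[old_bb], entry_freq, callsite_freq)
    return caller_bfi
```

Under this assumption, only the cloned blocks are touched; the existing entries for 'foo' keep their values, which is why no full recomputation is needed after each inline.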
>> Invalidation: Once inlining is completed in an SCC (at the end of
>> runOnSCC), the entries for functions in that SCC are invalidated since other
>> passes run by the CGSCC pass manager (including those run by the function
>> pass manager run under CGSCC pass manager) might affect the computed BFI for
>> the functions in the SCC.
>> When the new PM infrastructure and a CGSCC based BFI analysis are in place,
>> the transition should be easy assuming it will provide getBFI(Function *)
>> and invalidateBFI(Function *) interfaces. BFI for a function is computed at
>> most twice in this approach. Thoughts?
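A rough sketch of the compute-on-demand, cache, and invalidate scheme may help show where the "at most twice" bound comes from. Everything here is hypothetical: BFICache, get_bfi, invalidate_bfi, and run_on_scc only mirror the getBFI / invalidateBFI / runOnSCC names from the message and are not existing LLVM interfaces, and the walk assumes SCCs are visited bottom-up, as the inliner does.

```python
class BFICache:
    """Caches per-function BFI and counts full computations."""

    def __init__(self, compute):
        self._compute = compute   # hypothetical stand-in for a full BFI run
        self._cache = {}
        self.computations = {}    # function name -> number of full computations

    def get_bfi(self, fn):
        # Compute on demand; later requests reuse the cached result.
        if fn not in self._cache:
            self._cache[fn] = self._compute(fn)
            self.computations[fn] = self.computations.get(fn, 0) + 1
        return self._cache[fn]

    def invalidate_bfi(self, fn):
        self._cache.pop(fn, None)


def run_on_scc(scc, callees, cache):
    """Model one inliner visit of an SCC (assumed bottom-up order)."""
    for caller in scc:
        cache.get_bfi(caller)
        for callee in callees.get(caller, []):
            # The inlining decision and the incremental update of the
            # caller's BFI would happen here.
            cache.get_bfi(callee)
    # Later passes in the CGSCC pipeline may change these functions,
    # so drop only the entries for functions in this SCC.
    for fn in scc:
        cache.invalidate_bfi(fn)
```

In this model a function's BFI is computed once while its own SCC is processed and at most once more when it is first needed as a callee from a later SCC. After that the entry stays cached, because only the caller's SCC is invalidated.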