[llvm-dev] [LLVMdev] Path forward on profile guided inlining?

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Tue Sep 1 12:40:21 PDT 2015


Ivan, thanks for looking into this. See my reply below.

> level) to be available during IPA/Inliner. For instance to estimate
> icache footprint using weighted size etc.
>
> I addressed this specifically in my original email.  The use of the  extra
> metadata based caching scheme is temporary.  The inlining
> heuristics and updating of entry counts would not be.  They'd just be
> driven by BFI directly.

yes -- any profile update patch that is independent of pass manager
change is welcome :)



>>> are also now approximate. Given typical profiling mechanisms are lossy,
> that's not a big deal, but is worth noting explicitly.)
>> yes. BB count is computed from entry_count and BB freq; call count is
> obtained from BB count. Two things need to be updated during inlining  1)
> entry count of the remaining out of line instance of the callee  2) BB
> frequency of inline instance of the callee.
>> see also
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-March/083943.html
>
> This is exactly correct.  I was assuming that the BFI function pass new
> how to update itself (i.e. lazy rebuild), but if you wanted the inliner
> to explicitly keep the analysis up to date, that would work too.

I am not sure what you mean by BFI function pass know how to update
itself. What we want here is that we need  to compute BFI for inline
instances via scaling (of the callee BFI) to avoid expensive
recomputation of BFI in caller after inlining happens.

The good news is that Cong has made changes in BFI/BPI so that they
can be computed on demand and decoupled from pass manager. This
greatly simplifies the profile update work in inliner.  Since the
inliner is bottom up, we can also release BFI info for the callees
once the inliner is done with them (i.e. inlined into callers).

>> Can you use callgraph to keep this information instead?
>
> That would be another option.  Not a bad one actually.  It would mean
> having to explicitly populate the information into the callgraph by
> calling per function analysis passes.  Not sure how easy that is to do.
>
> I'm also leery relying on the call graph since I know that's going to
> radically change with the new pass manager.  I really, really, really do
> not want to be blocked behind Chandlers progress.
>
>
> [Ivan] At this point, it seems better to use metadata on the call site
> (Philip's proposal).

See above. I think it is possible to do BFI update now without relying
on the new pass manager. I also expect this change to be compatible
with (or easily adaptable to) the new pass manager once it is in
place.


> below which it does not apply. However this is no better way.
>
> I assume you meant "there is no better way".  If so, agreed. Assuming
> that you're not recomputing the call count from another source at least.
>
>
> [Ivan] That is the Constant ratio assumption: for any procedure body and
> any call site contained in the body, the expected number of executions of
> the call site per execution of the body is constant (Scheiffer, An
> analysis of inline substitution for a structured programming language,
> CACM, 1977).
> The results in this paper show that the use of multilevel history
> (context-sensitivity) is not warranted, and several compilers use this
> constant ratio assumption. For now, it might be reasonable to use it in
> LLVM too.

You are quoting a pretty old paper. Context sensitivity is pretty
important for modern C++ programs with lots of abstractions. Same
small functions can be used in large number of contexts with totally
different profile: the same loop can have iterations from 1 to
millions; the same indirect callsite can have too many hot targets in
the merged context; the merged branch prob can be close to 50/50
making it useless for block layout; the merged switch profile can be
evenly distributed making it less useful.

The best/efficient way to solve this is to enable pre-inlining (see
Rong's RFC) -- but that is a different topic.  Smarter post-inline
update can also be helpful -- but probably not something we need to
worry about initially.

>
>

> minimize the number of times we need to recompute a given
> functions
>>> block
>>> frequencies.  (e.g. we can look at all the original call sites within a
> function before looking at newly inlined ones)
>
> [Ivan] How about a pass called immediately before the Inliner, which will
> use BFI to compute and add calsite frequencies as metadata to callsite
> instructions? During each inline step/transformation, the Inliner will
> update the impacted callsite frequencies as well as function entry counts.
>
>

See above, I think it is possible to do this without requiring  using
new meta data.

thanks,

David

>
> _______________________________________________
> LLVM Developers mailing list
> LLV... at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


More information about the llvm-dev mailing list