[LLVMdev] RFC: Indirect Call Promotion LLVM Pass
Xinliang David Li
davidxl at google.com
Fri Apr 17 15:46:52 PDT 2015
On Fri, Apr 17, 2015 at 2:13 PM, Ivan Baev <ibaev at codeaurora.org> wrote:
> Hi, we've implemented an indirect call promotion llvm pass. The design
> notes including examples are shown below. This pass complements the
> indirect call profile infrastructure
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
There are issues with the profiling infrastructure proposal which will
addressed separately.
This part looks sane in general to me. See my rely inline.
> We've implemented two heuristics from this paper [1].
>
> a. Hot call site heuristic: only consider for promotion a call site which
> contribution (profile count) is more than 0.1% of the total count of all
> indirect calls executed during the profile runs.
>
> b. Hot target heuristic: promote the most frequent target at a call site
> if it gets at least 40% of all receivers at the call site.
Is the heuristics a || b, or a && b ?
>
> Only the hottest target from a call site is possibly promoted, similarly
> to the approach taken in the paper.
>
> In addition to Aigner & Hölzle-based heuristics, we add an inline hint to
> the promoted direct call/invoke instruction if it is the single receiver
> at the call site according to the profile data or the number of times the
> target is executed at the call site is at least 4% of the total count of
> all indirect calls. Once the function entry profile counts become
> available we will use them to tune the above inline-related heuristic.
>
>
I don't think indirectly promoted callsites should be treated any
differently from original direct callsites -- after promotion, the
direct callsites have call count info and the same inline heuristics
should apply.
>
> if (ptr->foo == A::foo)
> to
> if (ptr->_vptr == A::_vtable)
You can do that if you know class A is final. In general, you need
type or vtable profiling to get it.
>
> This will sink one load from the original block into the less frequently
> executed if.false block. This opportunity was found by Balaram Makam.
>
>
> 4. New enhancement patch
> -------------------------
> Currently our implementation has the following shortcomings:
> a. Our heuristics do not depend on the global information on function
> counts. It could be that none of the indirect call sites are contributing
> highly to the overall calls. Because our current implementation is
> deciding what to inline based on the indirect call site sum only, it could
> be inlining functions that are in essence cold when all functions in the
> source base are considered. This situation will be improved when the
> function entry profile counts become available in llvm IR.
Our plan is to add program level summary data for PGO. Any global
decisions need to made based on that because only relative global
hotness matters.
>
> b. Current implementation only transforms the first hot target, the rest
> of the targets are never considered even if they are relatively hot.
This is probably a good thing. Going beyond 2 can have negative effect.
>
> We are evaluating a new solution which depends on the
> presence/availability of functions counts in clang. We form a sorted
> multiset of all functions counts. A given indirect target is considered
> for inlining if the target’s count at the call site falls within one of
> the ranges that form the top 0-10%, 10-20% or 20-30% of the sorted
> multiset. We’ve added checks which become stricter as the target count
> falls farther away from the top most called 10%, 20% or 30% of all
> functions respectively.
>
> Targets that are classified as making calls to one of the top most called
> 30% of the functions receive inline hints. Inline hints are communicated
> from clang down to LLVM in metadata. Then, on the LLVM side the
> transformation pass uses the metadata field for the hint to add an inline
> hint at the transformed call site.
Again, there is no need to invent indirect call (promoted) specific
inline heuristics.
thanks,
David
>
> -------------------------
> [1] G. Aigner and U. Hölzle. Eliminating virtual function calls in C++
> programs. ECOOP, 1996.
> [2] X. Li, R. Ashok, R. Hundt. Lightweight Feedback-Directed Cross-Module
> Optimization. CGO, 2010.
>
>
>
>
>
>
>
>
>
>
>
>
More information about the llvm-dev
mailing list