[LLVMdev] RFC: Indirect Call Promotion LLVM Pass

Xinliang David Li xinliangli at gmail.com
Thu Apr 23 09:24:55 PDT 2015


On Fri, Apr 17, 2015 at 4:24 PM, Ivan Baev <ibaev at codeaurora.org> wrote:

> > On Fri, Apr 17, 2015 at 2:13 PM, Ivan Baev <ibaev at codeaurora.org> wrote:
> >> Hi, we've implemented an indirect call promotion llvm pass. The design
> >> notes including examples are shown below. This pass complements the
> >> indirect call profile infrastructure
> >> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
> >
> > There are issues with the profiling infrastructure proposal which will
> > addressed separately.
>
> -- Please send these to Betul and me.
>
>
yes.



> >
> > This part looks sane in general to me. See my rely inline.
> >
> >
> >> We've implemented two heuristics from this paper [1].
> >>
> >> a. Hot call site heuristic: only consider for promotion a call site
> >> which
> >> contribution (profile count) is more than 0.1% of the total count of all
> >> indirect calls executed during the profile runs.
> >>
> >> b. Hot target heuristic: promote the most frequent target at a call site
> >> if it gets at least 40% of all receivers at the call site.
> >
> > Is the heuristics a || b, or a && b ?
>
> -- a && b
>
> >
> >>
> >> Only the hottest target from a call site is possibly promoted, similarly
> >> to the approach taken in the paper.
> >>
> >> In addition to Aigner & Hölzle-based heuristics, we add an inline hint
> >> to
> >> the promoted direct call/invoke instruction if it is the single receiver
> >> at the call site according to the profile data or the number of times
> >> the
> >> target is executed at the call site is at least 4% of the total count of
> >> all indirect calls.  Once the function entry profile counts become
> >> available we will use them to tune the above inline-related heuristic.
> >>
> >>
> >
> > I don't think indirectly promoted callsites should be treated any
> > differently from original direct callsites -- after promotion, the
> > direct callsites have call count info and the same inline heuristics
> > should apply.
>
> --- Agree in general, we should live this decision to inliner, especially
> when it becomes a profile-based inliner.
> At ICP pass we currenty have the profile counts for indirect call sites
> and their receivers (targets) and it is tempting to pass some of this
> information to the inliner.
>

The inliner will get the profile information in the very near future, so
any other way to pass the info to inliner is a hack and not recommended.
Note that Value profile transformation can easily compute the branch
probability for the newly generated branch with target count and site total
count.



>
>
> >>
> >>   if (ptr->foo == A::foo)
> >> to
> >>   if (ptr->_vptr == A::_vtable)
> >
> > You can do that if you know class A is final. In general, you need
> > type or vtable profiling to get it.
>
> -- It is a future enhancement. Could you please provide some more details,
> in particular is it valid for C++ programs?
>
>
It is valid. HP compiler does that. Comparing vtable profiling with call
target profiling: while it has its own downsides too. It tends to increase
the number of target values to track because of the complicated class
hierarchy. For instance, B has a member function 'foo' and is inherited by
D1, D2, D3 (not overriding), and D4 (overriding). The number of values to
track increases from 2 to 4. The transformation may also become more
complicated: if (vptr == D1_vtb || vptr == D2_vtb ....) invoke B::foo(...)
...


> >
> >>
> >> This will sink one load from the original block into the less frequently
> >> executed if.false block. This opportunity was found by Balaram Makam.
> >>
> >>
> >> 4. New enhancement patch
> >> -------------------------
> >> Currently our implementation has the following shortcomings:
> >> a. Our heuristics do not depend on the global information on function
> >> counts. It could be that none of the indirect call sites are
> >> contributing
> >> highly to the overall calls. Because our current implementation is
> >> deciding what to inline based on the indirect call site sum only, it
> >> could
> >> be inlining functions that are in essence cold when all functions in the
> >> source base are considered. This situation will be improved when the
> >> function entry profile counts become available in llvm IR.
> >
> > Our plan is to add program level summary data for PGO.  Any global
> > decisions need to made based on that because only relative global
> > hotness matters.
> >
> >>
> >> b. Current implementation only transforms the first hot target, the rest
> >> of the targets are never considered even if they are relatively hot.
> >
> > This is probably a good thing.  Going beyond 2 can have negative effect.
> >
>
> -- With 2 we're getting incremental improvements, and we plan to further
> tune it.
>

David



> Thanks for the feedback, David.
> Ivan
>
>
> >>
> >> We are evaluating a new solution which depends on the
> >> presence/availability of functions counts in clang. We form a sorted
> >> multiset of all functions counts. A given indirect target is considered
> >> for inlining if the target’s count at the call site falls within one
> >> of
> >> the ranges that form the top 0-10%, 10-20% or 20-30% of the sorted
> >> multiset.  We’ve added checks which become stricter as the target
> >> count
> >> falls farther away from the top most called 10%, 20% or 30% of all
> >> functions respectively.
> >>
> >> Targets that are classified as making calls to one of the top most
> >> called
> >> 30% of the functions receive inline hints.  Inline hints are
> >> communicated
> >> from clang down to LLVM in metadata. Then, on the LLVM side the
> >> transformation pass uses the metadata field for the hint to add an
> >> inline
> >> hint at the transformed call site.
> >
> > Again, there is no need to invent indirect call (promoted) specific
> > inline heuristics.
> >
> >
> > thanks,
> >
> > David
> >
> >
> >>
> >> -------------------------
> >> [1] G. Aigner and U. Hölzle. Eliminating virtual function calls in C++
> >> programs. ECOOP, 1996.
> >> [2] X. Li, R. Ashok, R. Hundt. Lightweight Feedback-Directed
> >> Cross-Module
> >> Optimization. CGO, 2010.
> >>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150423/b387c61b/attachment.html>


More information about the llvm-dev mailing list