[LLVMdev] RFC: Indirect Call Promotion LLVM Pass

Ivan Baev ibaev at codeaurora.org
Fri Apr 17 16:24:18 PDT 2015


> On Fri, Apr 17, 2015 at 2:13 PM, Ivan Baev <ibaev at codeaurora.org> wrote:
>> Hi, we've implemented an indirect call promotion llvm pass. The design
>> notes including examples are shown below. This pass complements the
>> indirect call profile infrastructure
>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
>
> There are issues with the profiling infrastructure proposal which will
> be addressed separately.

-- Please send these to Betul and me.

>
> This part looks sane in general to me. See my reply inline.
>
>
>> We've implemented two heuristics from this paper [1].
>>
>> a. Hot call site heuristic: only consider for promotion a call site
>> whose contribution (profile count) is more than 0.1% of the total count
>> of all indirect calls executed during the profile runs.
>>
>> b. Hot target heuristic: promote the most frequent target at a call site
>> if it gets at least 40% of all receivers at the call site.
>
> Are the heuristics a || b, or a && b?

-- a && b
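
A minimal sketch of how the combined (a && b) test could look, assuming a
per-call-site vector of target counts. The names CallSiteProfile and
selectPromotionTarget are hypothetical and are not taken from the patch;
integer overflow is ignored for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical profile record: execution counts of each target observed
// at one indirect call site.
struct CallSiteProfile {
  std::vector<uint64_t> TargetCounts;
};

// Returns the index of the target to promote, or -1 if the call site
// fails either heuristic. TotalIndirectCalls is the profile count summed
// over all indirect call sites in the program.
int selectPromotionTarget(const CallSiteProfile &Site,
                          uint64_t TotalIndirectCalls) {
  uint64_t SiteCount = 0;
  for (uint64_t C : Site.TargetCounts)
    SiteCount += C;
  if (SiteCount == 0)
    return -1;
  assert(SiteCount <= TotalIndirectCalls && "site count exceeds total");

  // (a) Hot call site: more than 0.1% of all indirect calls.
  if (SiteCount * 1000 <= TotalIndirectCalls)
    return -1;

  // (b) Hot target: the most frequent target receives at least 40% of
  // the calls made from this site.
  auto Hottest =
      std::max_element(Site.TargetCounts.begin(), Site.TargetCounts.end());
  if (*Hottest * 100 < SiteCount * 40)
    return -1;

  return static_cast<int>(Hottest - Site.TargetCounts.begin());
}
```

Only one index is returned, which matches promoting at most the hottest
target per call site.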

>
>>
>> Only the hottest target from a call site is possibly promoted, similarly
>> to the approach taken in the paper.
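
For illustration, the effect of promoting a single target can be written
at the source level as below. This is a sketch with plain function
pointers; hotTarget, coldTarget, callOriginal and callPromoted are
hypothetical names, and the real pass works on LLVM IR rather than on C++
source.

```cpp
#include <cassert>

using Handler = int (*)(int);

int hotTarget(int X) { return X + 1; }  // assumed to dominate the profile
int coldTarget(int X) { return X * 2; } // any other receiver

// Before promotion: every call goes through the pointer.
int callOriginal(Handler Fn, int X) { return Fn(X); }

// After promotion: the hottest target is compared against and called
// directly, so later passes can inline it. All other receivers still
// take the original indirect call.
int callPromoted(Handler Fn, int X) {
  assert(Fn && "null callee");
  if (Fn == hotTarget)
    return hotTarget(X); // direct call on the frequent path
  return Fn(X);          // fallback indirect call
}
```

Both versions return the same value for every callee; only the shape of
the call on the frequent path changes.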
>>
>> In addition to Aigner & Hölzle-based heuristics, we add an inline hint
>> to the promoted direct call/invoke instruction if it is the single
>> receiver at the call site according to the profile data or the number
>> of times the target is executed at the call site is at least 4% of the
>> total count of all indirect calls.  Once the function entry profile
>> counts become available we will use them to tune the above
>> inline-related heuristic.
>>
>>
>
> I don't think indirectly promoted callsites should be treated any
> differently from original direct callsites -- after promotion, the
> direct callsites have call count info and the same inline heuristics
> should apply.

-- Agree in general, we should leave this decision to the inliner,
especially when it becomes a profile-based inliner.
In the ICP pass we currently have the profile counts for indirect call
sites and their receivers (targets), and it is tempting to pass some of
this information to the inliner.


>>
>>   if (ptr->foo == A::foo)
>> to
>>   if (ptr->_vptr == A::_vtable)
>
> You can do that if you know class A is final. In general, you need
> type or vtable profiling to get it.

-- It is a future enhancement. Could you please provide some more details?
In particular, is it valid for C++ programs?
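
One way to see why target profiling alone is not enough for the vtable
comparison is a derived class that inherits A::foo without overriding it.
The sketch below uses typeid as a portable stand-in for comparing the
vptr against A's vtable (standard C++ does not expose the vptr); the
classes and function names are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

struct A {
  virtual ~A() = default;
  virtual int foo() const { return 1; }
};
struct B : A {}; // inherits A::foo but has its own vtable
struct C : A {
  int foo() const override { return 3; }
};

// Stand-in for `ptr->_vptr == A::_vtable`: true only when the dynamic
// type is exactly A.
bool vtableCheckHits(const A &Obj) { return typeid(Obj) == typeid(A); }

// Promotion guarded by the type check. The fallback keeps the result
// correct for every dynamic type.
int callPromotedByType(const A &Obj) {
  if (vtableCheckHits(Obj))
    return Obj.A::foo(); // qualified call: direct, non-virtual
  return Obj.foo();      // fallback virtual call
}

// Usage: a B object executes A::foo, yet the vtable-style check misses it.
void checkDerivedWithoutOverride() {
  B Derived;
  assert(!vtableCheckHits(Derived));
  assert(callPromotedByType(Derived) == 1);
}
```

The transformed code stays correct because of the fallback call, but an
object of type B misses the fast path even though its target is A::foo.
Unless A is known to be final, knowing the hot target therefore does not
say which vtable is hot.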

>
>>
>> This will sink one load from the original block into the less frequently
>> executed if.false block. This opportunity was found by Balaram Makam.
>>
>>
>> 4. New enhancement patch
>> -------------------------
>> Currently our implementation has the following shortcomings:
>> a. Our heuristics do not depend on the global information on function
>> counts. It could be that none of the indirect call sites are
>> contributing highly to the overall calls. Because our current
>> implementation is deciding what to inline based on the indirect call
>> site sum only, it could be inlining functions that are in essence cold
>> when all functions in the source base are considered. This situation
>> will be improved when the function entry profile counts become
>> available in llvm IR.
>
> Our plan is to add program level summary data for PGO.  Any global
> decisions need to be made based on that because only relative global
> hotness matters.
>
>>
>> b. Current implementation only transforms the first hot target; the
>> rest of the targets are never considered even if they are relatively
>> hot.
>
> This is probably a good thing.  Going beyond 2 can have a negative effect.
>

-- With 2 we're getting incremental improvements, and we plan to further
tune it.

Thanks for the feedback, David.
Ivan


>>
>> We are evaluating a new solution which depends on the
>> presence/availability of function counts in clang. We form a sorted
>> multiset of all function counts. A given indirect target is considered
>> for inlining if the target’s count at the call site falls within one of
>> the ranges that form the top 0-10%, 10-20% or 20-30% of the sorted
>> multiset.  We’ve added checks which become stricter as the target count
>> falls farther away from the top most called 10%, 20% or 30% of all
>> functions respectively.
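
The range test described above could be sketched as follows. This is one
possible reading, not the evaluated implementation: the target's count is
ranked against the sorted multiset of function counts, and the rank is
mapped to a decile. CountSet and hotnessBucket are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <set>

// All function counts, sorted from hottest to coldest.
using CountSet = std::multiset<uint64_t, std::greater<uint64_t>>;

// Returns 0, 1 or 2 when TargetCount ranks within the top 0-10%, 10-20%
// or 20-30% of the function counts, and -1 otherwise.
int hotnessBucket(const CountSet &FunctionCounts, uint64_t TargetCount) {
  if (FunctionCounts.empty())
    return -1;

  // Rank = number of functions with a strictly greater count. With
  // std::greater ordering, lower_bound finds the first element that is
  // less than or equal to TargetCount.
  auto Rank = static_cast<std::size_t>(std::distance(
      FunctionCounts.begin(), FunctionCounts.lower_bound(TargetCount)));

  std::size_t Decile = Rank * 10 / FunctionCounts.size();
  assert(Decile <= 10 && "rank cannot exceed the number of functions");
  return Decile < 3 ? static_cast<int>(Decile) : -1;
}
```

A caller could then apply progressively stricter checks for buckets 1 and
2 than for bucket 0, in line with the description above.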
>>
>> Targets that are classified as making calls to one of the top most
>> called 30% of the functions receive inline hints.  Inline hints are
>> communicated from clang down to LLVM in metadata. Then, on the LLVM
>> side the transformation pass uses the metadata field for the hint to
>> add an inline hint at the transformed call site.
>
> Again, there is no need to invent indirect call (promoted) specific
> inline heuristics.
>
>
> thanks,
>
> David
>
>
>>
>> -------------------------
>> [1] G. Aigner and U. Hölzle. Eliminating virtual function calls in C++
>> programs. ECOOP, 1996.
>> [2] X. Li, R. Ashok, R. Hundt. Lightweight Feedback-Directed
>> Cross-Module Optimization. CGO, 2010.
>>





