[LLVMdev] RFC: Indirect Call Promotion LLVM Pass
betulb at codeaurora.org
Fri Apr 17 16:26:20 PDT 2015
> On Fri, Apr 17, 2015 at 2:13 PM, Ivan Baev <ibaev at codeaurora.org> wrote:
>> Hi, we've implemented an indirect call promotion llvm pass. The design
>> notes including examples are shown below. This pass complements the
>> indirect call profile infrastructure
>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
>
> There are issues with the profiling infrastructure proposal which will
> be addressed separately.
>
> This part looks sane in general to me. See my reply inline.
>
>
>> We've implemented two heuristics from this paper [1].
>>
>> a. Hot call site heuristic: only consider for promotion a call site
>> whose contribution (profile count) is more than 0.1% of the total count
>> of all indirect calls executed during the profile runs.
>>
>> b. Hot target heuristic: promote the most frequent target at a call site
>> if it gets at least 40% of all receivers at the call site.
>
> Are the heuristics combined as a || b, or a && b?
>
>>
>> Only the hottest target from a call site is possibly promoted, similarly
>> to the approach taken in the paper.
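
The two heuristics above can be sketched in C++ roughly as follows. This is only an illustration, not the code from the pass: the names TargetCount and pickPromotionTarget are hypothetical, and it assumes both conditions must hold (a && b), which the thread leaves open.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: one target observed at an indirect call site.
struct TargetCount {
  std::string name;
  std::uint64_t count;
};

// Thresholds quoted in the proposal.
constexpr double kHotSiteFraction = 0.001;   // 0.1% of all indirect calls
constexpr double kHotTargetFraction = 0.40;  // 40% of the receivers at the site

// Returns the single target to promote, or nullptr if the site does not
// qualify. Assumption: heuristics (a) and (b) must both hold.
const TargetCount *pickPromotionTarget(const std::vector<TargetCount> &targets,
                                       std::uint64_t totalIndirectCalls) {
  if (targets.empty() || totalIndirectCalls == 0)
    return nullptr;

  std::uint64_t siteCount = 0;
  for (const TargetCount &t : targets)
    siteCount += t.count;

  // (a) Hot call site: the site must contribute more than 0.1% of the
  // total count of all indirect calls.
  if (static_cast<double>(siteCount) <=
      kHotSiteFraction * static_cast<double>(totalIndirectCalls))
    return nullptr;

  // Only the hottest target at the site is a candidate.
  auto hottest = std::max_element(
      targets.begin(), targets.end(),
      [](const TargetCount &a, const TargetCount &b) {
        return a.count < b.count;
      });
  assert(hottest != targets.end());

  // (b) Hot target: it must receive at least 40% of the calls at the site.
  if (static_cast<double>(hottest->count) <
      kHotTargetFraction * static_cast<double>(siteCount))
    return nullptr;

  return &*hottest;
}
```

With a site that calls A::foo 900 times and B::foo 100 times out of 100000 indirect calls overall, this sketch picks A::foo; the same site is rejected when the program executed 10000000 indirect calls in total.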
>>
>> In addition to the Aigner & Hölzle-based heuristics, we add an inline
>> hint to the promoted direct call/invoke instruction if it is the single
>> receiver at the call site according to the profile data, or if the number
>> of times the target is executed at the call site is at least 4% of the
>> total count of all indirect calls. Once the function entry profile counts
>> become available we will use them to tune the above inline-related
>> heuristic.
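
A minimal sketch of the inline-hint condition described above, assuming the per-site and global counts are already at hand. The function name shouldAddInlineHint and its parameters are made up for illustration and do not come from the patch.

```cpp
#include <cstdint>

// Threshold quoted in the proposal: 4% of all indirect calls.
constexpr double kInlineHintFraction = 0.04;

// Hypothetical helper: should the promoted direct call get an inline hint?
bool shouldAddInlineHint(std::uint64_t targetCountAtSite,
                         std::uint64_t numTargetsAtSite,
                         std::uint64_t totalIndirectCalls) {
  // Single receiver at the call site according to the profile data.
  if (numTargetsAtSite == 1)
    return true;
  if (totalIndirectCalls == 0)
    return false;
  // Otherwise the target must account for at least 4% of all indirect calls.
  return static_cast<double>(targetCountAtSite) >=
         kInlineHintFraction * static_cast<double>(totalIndirectCalls);
}
```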
>>
>>
>
> I don't think indirectly promoted callsites should be treated any
> differently from original direct callsites -- after promotion, the
> direct callsites have call count info and the same inline heuristics
> should apply.
>
>
>>
>> if (ptr->foo == A::foo)
>> to
>> if (ptr->_vptr == A::_vtable)
>
> You can do that if you know class A is final. In general, you need
> type or vtable profiling to get it.
>
>>
>> This will sink one load from the original block into the less frequently
>> executed if.false block. This opportunity was found by Balaram Makam.
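
For readers new to the transformation, here is a source-level analogy of what promotion does to a call site. It is only a sketch: the pass itself rewrites LLVM IR, and the names hotTarget, coldTarget, callOriginal and callPromoted are invented for this example. It shows the function-pointer form of the guard, not the vtable-pointer variant discussed above.

```cpp
// Signature shared by every possible target of the indirect call.
using Handler = int (*)(int);

int hotTarget(int x) { return x + 1; }   // most frequent target per profile
int coldTarget(int x) { return x * 2; }  // rarely seen target

// Before promotion: a plain indirect call.
int callOriginal(Handler fn, int x) { return fn(x); }

// After promotion: guard on the hot target, call it directly (so the
// inliner can see it), and keep the indirect call as the fallback.
int callPromoted(Handler fn, int x) {
  if (fn == &hotTarget)
    return hotTarget(x);  // direct call on the frequent path
  return fn(x);           // original indirect call on the rare path
}
```

Both versions return the same value for any target; only the shape of the call on the frequent path changes.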
>>
>>
>> 4. New enhancement patch
>> -------------------------
>> Currently our implementation has the following shortcomings:
>> a. Our heuristics do not depend on the global information on function
>> counts. It could be that none of the indirect call sites are contributing
>> highly to the overall calls. Because our current implementation is
>> deciding what to inline based on the indirect call site sum only, it
>> could be inlining functions that are in essence cold when all functions
>> in the source base are considered. This situation will be improved when
>> the function entry profile counts become available in llvm IR.
>
> Our plan is to add program level summary data for PGO. Any global
> decisions need to be made based on that because only relative global
> hotness matters.
>
>>
>> b. The current implementation only transforms the first hot target; the
>> rest of the targets are never considered even if they are relatively hot.
>
> This is probably a good thing. Going beyond 2 can have a negative effect.
>
>>
>> We are evaluating a new solution which depends on the
>> presence/availability of function counts in clang. We form a sorted
>> multiset of all function counts. A given indirect target is considered
>> for inlining if the target's count at the call site falls within one of
>> the ranges that form the top 0-10%, 10-20% or 20-30% of the sorted
>> multiset. We've added checks which become stricter as the target count
>> falls farther away from the top most called 10%, 20% or 30% of all
>> functions respectively.
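
One possible way to classify a target count against the sorted function counts is sketched below. The helper hotnessBand is hypothetical, and the exact boundary handling and the stricter per-band checks mentioned above are not specified in this thread, so they are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Returns 1, 2 or 3 when targetCount falls in the top 0-10%, 10-20% or
// 20-30% range of all function counts, and 0 when it is colder than that.
// Assumption: a range boundary is the count of the last function inside
// the top (band * 10)% of the counts sorted in descending order.
int hotnessBand(std::vector<std::uint64_t> functionCounts,
                std::uint64_t targetCount) {
  if (functionCounts.empty())
    return 0;
  std::sort(functionCounts.begin(), functionCounts.end(),
            std::greater<std::uint64_t>());
  const std::size_t n = functionCounts.size();
  for (int band = 1; band <= 3; ++band) {
    // Number of functions inside the top (band * 10)%, at least one.
    std::size_t cutoff = (n * static_cast<std::size_t>(band)) / 10;
    if (cutoff == 0)
      cutoff = 1;
    if (targetCount >= functionCounts[cutoff - 1])
      return band;
  }
  return 0;
}
```

With ten functions whose counts are 10, 20, ..., 100, a target count of 100 lands in band 1, 95 in band 2, 80 in band 3, and 79 in none of them.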
>>
>> Targets that are classified as making calls to one of the top most
>> called 30% of the functions receive inline hints. Inline hints are
>> communicated from clang down to LLVM in metadata. Then, on the LLVM side
>> the transformation pass uses the metadata field for the hint to add an
>> inline hint at the transformed call site.
>
> Again, there is no need to invent indirect call (promoted) specific
> inline heuristics.
These heuristics decide which indirect call targets to promote at an
indirect call site; they are not necessarily an additional inline heuristic
for the promoted targets. However, we currently do add an inline hint at the
transformed call sites to help trigger inlining at the new call site.
>
> thanks,
>
> David
>
>
>>
>> -------------------------
>> [1] G. Aigner and U. Hölzle. Eliminating virtual function calls in C++
>> programs. ECOOP, 1996.
>> [2] X. Li, R. Ashok, R. Hundt. Lightweight Feedback-Directed Cross-Module
>> Optimization. CGO, 2010.
>>