[LLVMdev] RFC: Indirect Call Promotion LLVM Pass

Ivan Baev ibaev at codeaurora.org
Fri Apr 17 14:13:59 PDT 2015


Hi, we've implemented an indirect call promotion llvm pass. The design
notes including examples are shown below. This pass complements the
indirect call profile infrastructure
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html

Your feedback and comments will be highly appreciated.

Thanks,
Ivan


============================================================================RFC:
Indirect Call Promotion LLVM Pass
Betul Buyukkurt and Ivan Baev

1. Introduction
Indirect call promotion (ICP) replaces an indirect call instruction to a
set of target addresses with a sequence of tests guarding direct calls to
selected targets, plus a fall through branch containing the original
indirect call. The ICP optimization is found to be the second most
profitable (after inlining) profile-based optimization in a recent study
[2].

We've implemented an ICP LLVM pass that iterates over all indirect call
sites in the module and selectively (under heuristics) performs the
promotion. Here is one example of the transformation.

--------------------ico.ll--------------------------------------------------
define void @foo(i32 %a) {
entry:
  %a1 = add i32 %a, 1
  ret void
}

define void @bar(i32 %a) {
entry:
  %a2 = add i32 %a, 2
  ret void
}

define void @main(void (i32)* %fun) {
entry:
  call void %fun(i32 10), !prof !1
  ret void
}

!1 = !{!"indirect_call_targets", i64 6000, !"foo", i64 5000, !"bar", i64 100}
----------------------------------------------------------------------------

> opt -ic-opt ico.ll -o ico-post.bc

--------------------ico-post.ll---------------------------------------------define
void @foo(i32 %a) {
entry:
  %a1 = add i32 %a, 1
  ret void
}

define void @bar(i32 %a) {
entry:
  %a2 = add i32 %a, 2
  ret void
}

define void @main(void (i32)* %fun) {
entry:
  %0 = bitcast void (i32)* %fun to i8*
  %1 = bitcast void (i32)* @foo to i8*
  %2 = icmp eq i8* %0, %1
  br i1 %2, label %if.true, label %if.false, !prof !0

if.merge:                                         ; preds = %if.false,
%if.true
  ret void

if.true:                                          ; preds = %entry
  call void @foo(i32 10) #0
  br label %if.merge

if.false:                                         ; preds = %entry
  call void %fun(i32 10), !prof !1
  br label %if.merge
}

attributes #0 = { inlinehint }
!0 = !{!"branch_weights", i32 5000, i32 1000}
!1 = !{!"indirect_call_targets", i64 1000, !"bar", i64 100}
----------------------------------------------------------------------------

The ICP pass handles indirect call and indirect invoke LLVM IR
instructions. It depends on the availability of indirect call metadata
provided by the indirect call profile infrastructure briefly described at
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
Here is the new indirect call (IC) metadata type used in the example above:

 !1 = !{!"indirect_call_targets", i64 6000, !"foo", i64 5000, !"bar", i64
100}

The 6000 represents the number of times the indirect call was executed
during the profiling runs, of which function foo was the receiver 5000
times, and function bar was the receiver 100 times.

The input for ICP pass:
LLVM IR including IC metadata, (future) function entry count metadata

The output:
Modified LLVM IR with modified indirect call sites. Selected indirect call
targets are promoted and inline hint attributes are added subject to
heuristics.


2. Aigner & Hölzle-based heuristics
------------------------------------
We've implemented two heuristics from this paper [1].

a. Hot call site heuristic: only consider for promotion a call site which
contribution (profile count) is more than 0.1% of the total count of all
indirect calls executed during the profile runs.

b. Hot target heuristic: promote the most frequent target at a call site
if it gets at least 40% of all receivers at the call site.

Only the hottest target from a call site is possibly promoted, similarly
to the approach taken in the paper.

In addition to Aigner & Hölzle-based heuristics, we add an inline hint to
the promoted direct call/invoke instruction if it is the single receiver
at the call site according to the profile data or the number of times the
target is executed at the call site is at least 4% of the total count of
all indirect calls.  Once the function entry profile counts become
available we will use them to tune the above inline-related heuristic.


3. Handling virtual function calls
-----------------------------------

Consider the following C++ example involving virtual functions and calls.
----------------------------------------------------------------------------class
Operation {
  public:
    virtual int test_add(int a, int b)=0;
    virtual int test_sub(int a, int b)=0;
};

class A : public Operation {
  public:
  int test_add(int a, int b) {
    return a + b;
  }
  int test_sub(int a, int b) {
    return a - b;
  }
};

A myA;

int __attribute__ ((noinline)) testmain(int (A::*fptr)(int, int)) {
  return (myA.*fptr)(1, 2);
}
----------------------------------------------------------------------------

The debugging output of the ICP pass for this example is as follows:

----------------------------------------------------------------------------****
INDIRECT CALL OPTIMIZATION ****
IC target hotness threshold = 40
Total IC execution count = 1.000000e+00

Attempting IC_opt on:   %call = call i32 %5(%class.A* %this.adjusted, i32
1, i32 2), !prof !8
with: !{!"indirect_call_targets", i64 1, !"_ZN1A8test_subEii", i64 1} in
function: _Z8testmainM1AFiiiE
CS hotness% = 1.000000e+02
Target hotness% = 100

== Basic Block Before ==
memptr.end:                                       ; preds =
%memptr.nonvirtual, %memptr.virtual
  %5 = phi i32 (%class.A*, i32, i32)* [ %memptr.virtualfn, %memptr.virtual
], [ %memptr.nonvirtualfn, %memptr.nonvirtual ]
  %call = call i32 %5(%class.A* %this.adjusted, i32 1, i32 2), !prof !8
ret i32 %call

...
!8 = !{!"indirect_call_targets", i64 1, !"_ZN1A8test_subEii", i64 1}

== Basic Blocks After == // code after the ICP pass
memptr.end:                                       ; preds =
%memptr.nonvirtual, %memptr.virtual
  %5 = phi i32 (%class.A*, i32, i32)* [ %memptr.virtualfn, %memptr.virtual
], [ %memptr.nonvirtualfn, %memptr.nonvirtual ]
  %6 = bitcast i32 (%class.A*, i32, i32)* %5 to i8*
  %7 = bitcast i32 (%class.A*, i32, i32)* @_ZN1A8test_subEii to i8* %8 =
icmp eq i8* %6, %7
  br i1 %8, label %if.true, label %if.false, !prof !8

if.true:                                          ; preds = %memptr.end
  %10 = call i32 @_ZN1A8test_subEii(%class.A* %this.adjusted, i32 1, i32
2) #3
  br label %if.merge

if.false:                                         ; preds = %memptr.end
  %call = call i32 %5(%class.A* %this.adjusted, i32 1, i32 2), !prof !9 br
label %if.merge

if.merge:                                         ; preds = %if.false,
%if.true
  %9 = phi i32 [ %10, %if.true ], [ %call, %if.false ]
  ret i32 %9

...
attributes #3 = { inlinehint }
!8 = !{!"branch_weights", i32 1, i32 0}
!9 = !{!"indirect_call_targets", i64 0}
----------------------------------------------------------------------------

At LLVM IR level the ICP pass sees virtual function calls as normal
indirect calls, and proceeds as in the first example. Currently ICP is
oblivious to vtables and virtial function support. On a more complicated
example found in eon benchmark, one future enhancement is to identify that
an indirect call is virtual and change the comparison (shown at a higher
IR level) from

  if (ptr->foo == A::foo)
to
  if (ptr->_vptr == A::_vtable)

This will sink one load from the original block into the less frequently
executed if.false block. This opportunity was found by Balaram Makam.


4. New enhancement patch
-------------------------
Currently our implementation has the following shortcomings:
a. Our heuristics do not depend on the global information on function
counts. It could be that none of the indirect call sites are contributing
highly to the overall calls. Because our current implementation is
deciding what to inline based on the indirect call site sum only, it could
be inlining functions that are in essence cold when all functions in the
source base are considered. This situation will be improved when the
function entry profile counts become available in llvm IR.

b. Current implementation only transforms the first hot target, the rest
of the targets are never considered even if they are relatively hot.

We are evaluating a new solution which depends on the
presence/availability of functions counts in clang. We form a sorted
multiset of all functions counts. A given indirect target is considered
for inlining if the target’s count at the call site falls within one of
the ranges that form the top 0-10%, 10-20% or 20-30% of the sorted
multiset.  We’ve added checks which become stricter as the target count
falls farther away from the top most called 10%, 20% or 30% of all
functions respectively.

Targets that are classified as making calls to one of the top most called
30% of the functions receive inline hints.  Inline hints are communicated
from clang down to LLVM in metadata. Then, on the LLVM side the
transformation pass uses the metadata field for the hint to add an inline
hint at the transformed call site.

-------------------------
[1] G. Aigner and U. Hölzle. Eliminating virtual function calls in C++
programs. ECOOP, 1996.
[2] X. Li, R. Ashok, R. Hundt. Lightweight Feedback-Directed Cross-Module
Optimization. CGO, 2010.
















More information about the llvm-dev mailing list