[PATCH] Indirect call target profiling related profile reader/writer changes

Mon Apr 13 13:51:52 PDT 2015

On 04/13/2015 11:20 AM, Xinliang David Li wrote:
> Indirect call promotion + inlining does have very large performance 
> benefit. IIRC, it improves eon, h264ref, povray, gap, ......etc. Many 
> requires cross module inlining too. For instance povray improves >20% 
> (if my memory does not betray me).  Our internal programs all show 
> large improvement with it.
I am not questioning the benefit of profile guided devirtualization and 
inlining.  Getting these optimizations implemented is a clear win if 
done right.

I am seeking clarification on the approach being taken to implement them 
and the design choices being proposed.  That's a different matter.
>
> David
>
>
> On Mon, Apr 13, 2015 at 11:02 AM, Philip Reames 
> <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote:
>
>
>     On 04/13/2015 10:57 AM, betulb at codeaurora.org
>     <mailto:betulb at codeaurora.org> wrote:
>
>                 On Apr 10, 2015, at 9:25 AM, betulb at codeaurora.org
>                 <mailto:betulb at codeaurora.org> wrote:
>
>                     On 04/09/2015 11:06 AM, Betul Buyukkurt wrote:
>
>                         In http://reviews.llvm.org/D8908#153838,
>                         @reames wrote:
>
>                             Have the IR level construct patches made
>                             it up for review?  If so,
>                             can
>
>                         So far I've posted two patches. These two
>                         patches should apply cleanly
>                         to the tip, working with the present profile
>                         infrastructure. The next
>                         set of patches will be the enabler ones: i.e.
>                         three more patches one
>                         for
>                         each of clang, llvm and compiler-rt. Clang
>                         patch will be up for review
>                         later today.
>
>                             you send me a link?  I managed to miss them.
>
>                         So far there is this patch and the instrinsic
>                         instruction definitions:
>                         http://reviews.llvm.org/D8877. All patches are
>                         necessary for getting
>                         the
>                         IC targets and having them displayed by the
>                         llvm-profdata.
>
>                     Ok, I'm really not convinced that the
>                     instrumentation code needs to be
>                     or should be an intrinsic.  This seems like
>                     something which should be
>                     emitted by the frontend and optimized like any
>                     other code.  To say this
>                     a different way, my instrumentation is going to be
>                     entirely different
>                     than your instrumentation.
>
>                     Having said that, I really don't care about this
>                     part of the proposed
>                     changes since they aren't going to impact me at
>                     all.  I'm am
>                     specifically not objecting to the changes, just
>                     commenting.  :)
>
>                             I'm assuming this will be some type of per
>                             call site metadata?
>
>                         We do assign metadata at the indirect call
>                         sites. Format looks like as
>                         follows:
>
>                         !33 = metadata !{metadata
>                         !"indirect_call_targets", i64
>                         <total_exec_count>, metadata !"target_fn1Ã¢Â€Â
>                         , i64
>                         <target_fn1_count>,
>                         metadata !"target_fn2Ã¢Â€Â , i64
>                         <target_fn2_count>, Ã¢Â€Â¦.}
>
>                         Currently, we're recording only the top most
>                         called five function
>                         names
>                         at each indirect call site. Following the
>                         string literal
>                         Ã¢Â€Âœindirect_call_targetsÃ¢Â€Â  are the
>                         fields  <total_exec_count>
>                         i.e. a 64
>                         bit value for the total number of times the
>                         indirect call is executed
>                         followed by the function names and execution
>                         counts of each target.
>
>                     This was the part I was trying to ask about.  I
>                     really want to see
>                     where
>                     you're going with this optimization wise.  My
>                     naive guess is that this
>                     is going to be slightly off for what you actually
>                     want.
>
>                     Assuming you're going for profile guided
>                     devirtualization (and thus
>                     inlining), being able to check the type of the
>                     receiver (as opposed to
>                     the result of the virtual lookup) might be
>                     advantageous.  (Or, to say
>                     it
>                     differently, that's what I'm used to seeing. Your
>                     approach might be
>                     completely reasonable, it's just not what I'm used
>                     to seeing.)  Have
>                     you
>                     thought about the tradeoffs here?
>
>                 Not sure if I understood the problem here, however,
>                 we're recording both
>                 the target address and the addresses/names of the
>                 instrumented functions
>                 during the execution of the instrumented binary.
>                 During profile reading
>                 these addresses are used to match the target addresses
>                 to corresponding
>                 functions. During optimization, target function name
>                 should be adequate
>                 to
>                 retrieve the function type information which then can
>                 be used to compare
>                 the types of the arguments as well as the return type
>                 w/ the call site
>                 to
>                 ensure the match.
>
>                 To clarify, our interest at this time is to upstream
>                 all the
>                 infrastructure related changes. The optimizations
>                 using the profile data
>                 will not be part of our initial set of patches. On the
>                 other hand,
>                 agreeing on the metadata format at the call sites
>                 would be advantageous
>                 in
>                 terms of supporting multiple profile methods.
>
>                 Thanks,
>                 -Betul
>
>             I have the same questions and concerns as Philip. If the
>             goal of this is
>             to support devirtualization, this is not how I would
>             expect it to be done.
>             If you really want something more general, can you provide
>             data to show
>             that it is worth the cost? Recording all that information
>             when profiling
>             will not be cheap, and it adds significant complexity as well.
>
>         There was an earlier email of mine that went to the llvm-dev
>         mailing list
>         discussing about the implementation dependencies and gains/costs.
>         http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084271.html
>
>     On that thread, you claim an 8% performance boost on spec. What
>     optimizations did you implement to get that?  Can you outline the
>     structure?
>
>
>             I also donâ€™t think we should consider taking all of your
>             infrastructure
>             changes without first seeing at least preliminary versions
>             of the
>             optimization patches that demonstrate the benefits.
>
>                         Thanks.
>                         -Betul
>
>
>                         http://reviews.llvm.org/D8908
>                         <http://reviews.llvm.org/D8908>
>
>                         EMAIL PREFERENCES
>                         http://reviews.llvm.org/settings/panel/emailpreferences/
>                         <http://reviews.llvm.org/settings/panel/emailpreferences/>
>
>
>
>
>
>     _______________________________________________
>     llvm-commits mailing list
>     llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150413/47df675b/attachment.html>