<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, May 6, 2015 at 5:38 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>

<br>

On 04/14/2015 11:56 AM, <a href="mailto:betulb@codeaurora.org" target="_blank">betulb@codeaurora.org</a> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On 04/10/2015 09:25 AM, <a href="mailto:betulb@codeaurora.org" target="_blank">betulb@codeaurora.org</a> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On 04/09/2015 11:06 AM, Betul Buyukkurt wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

In <a href="http://reviews.llvm.org/D8908#153838" target="_blank">http://reviews.llvm.org/D8908#153838</a>, @reames wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Have the IR level construct patches made it up for review?  If so,<br>

can<br>

</blockquote>

So far I've posted two patches. These two patches should apply cleanly<br>

to the tip, working with the present profile infrastructure. The next<br>

set of patches will be the enabler ones: i.e. three more patches one<br>

for<br>

each of clang, llvm and compiler-rt. Clang patch will be up for review<br>

later today.<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

you send me a link?  I managed to miss them.<br>

</blockquote>

So far there is this patch and the instrinsic instruction definitions:<br>

<a href="http://reviews.llvm.org/D8877" target="_blank">http://reviews.llvm.org/D8877</a>. All patches are necessary for getting<br>

the<br>

IC targets and having them displayed by the llvm-profdata.<br>

</blockquote>

Ok, I'm really not convinced that the instrumentation code needs to be<br>

or should be an intrinsic.  This seems like something which should be<br>

emitted by the frontend and optimized like any other code.  To say this<br>

a different way, my instrumentation is going to be entirely different<br>

than your instrumentation.<br>

<br>

Having said that, I really don't care about this part of the proposed<br>

changes since they aren't going to impact me at all.  I'm am<br>

specifically not objecting to the changes, just commenting.  :)<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I'm assuming this will be some type of per call site metadata?<br>

</blockquote>

We do assign metadata at the indirect call sites. Format looks like as<br>

follows:<br>

<br>

!33 = metadata !{metadata !"indirect_call_targets", i64<br>

<total_exec_count>, metadata !"target_fn1â€ , i64 <target_fn1_count>,<br>

metadata !"target_fn2â€ , i64 <target_fn2_count>, â€¦.}<br>

<br>

Currently, we're recording only the top most called five function<br>

names<br>

at each indirect call site. Following the string literal<br>

â€œindirect_call_targetsâ€  are the fields  <total_exec_count> i.e. a<br>

64<br>

bit value for the total number of times the indirect call is executed<br>

followed by the function names and execution counts of each target.<br>

</blockquote>

This was the part I was trying to ask about.  I really want to see<br>

where<br>

you're going with this optimization wise.  My naive guess is that this<br>

is going to be slightly off for what you actually want.<br>

<br>

Assuming you're going for profile guided devirtualization (and thus<br>

inlining), being able to check the type of the receiver (as opposed to<br>

the result of the virtual lookup) might be advantageous.  (Or, to say<br>

it<br>

differently, that's what I'm used to seeing.  Your approach might be<br>

completely reasonable, it's just not what I'm used to seeing.)  Have<br>

you<br>

thought about the tradeoffs here?<br>

</blockquote>

Not sure if I understood the problem here,<br>

</blockquote>

First, I am not trying to say there is a problem with your approach; I<br>

am only saying that it's not what I would have expected based on past<br>

experience.  You may be entirely correct in your approach, you just need<br>

to convince me of that.  :)<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

however, we're recording both<br>

the target address and the addresses/names of the instrumented functions<br>

during the execution of the instrumented binary. During profile reading<br>

these addresses are used to match the target addresses to corresponding<br>

functions.<br>

</blockquote>

Ok, let's start from the basics.  For profile guided devirtualization,<br>

you're constructing a cache from (something) to function pointer and<br>

using that cache lookup to enable inlining of the hot target.  You have<br>

two standard choices on what to use as your cache key: the result of the<br>

virtual lookup and the inputs to the virtual lookup.<br>

<br>

Option 1 - Inputs to virtual lookup<br>

if ((receiver, vtable index) == what I predicted)<br>

    tartget_I_predicted(); // inline me!!<br>

else {<br>

    target = full virtual dispatch();<br>

    target();<br>

}<br>

<br>

Option 2 - result of virtual lookup<br>

target = full virtual dispatch();<br>

if ('target' == what I predicted)<br>

    tartget_I_predicted(); // inline me!!<br>

else {<br>

    target();<br>

}<br>

<br>

You seem to be proposing option 2.  I'm saying that I'm used to seeing<br>

option 1 used.  Both approaches have their appeal, I'm just asking you<br>

to explain *why* you've chosen the one you apparently have.<br>

</blockquote>

Not all indirect calls occur from C++ like codes. We're profiling and<br>

optimizing out indirect calls from C codes as well. We're seeing up to 8%<br>

gains on individual benchmarks in spec. This was measured on our platform.<br>

</blockquote></div></div>

This answers my original question and is a good point.  I was definitely looking at this through a virtual dispatch centric view.<br>

<br>

I have no objection to supporting profiling/predication of arbitrary indirect call sites.  As someone else pointed out in the discussion, there's no reason we can't specialize for virtual call profiling and support the generic profiling as a fallback.  That seems like a good way forward.  The details still need to worked out, but the general approach makes sense.<br></blockquote><div><br></div><div>Right. Besides the runtime support has been enhanced to be more generic, which makes type/vtable profiling just as easy in the future.</div><div><br></div><div>David</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

p.s. My bandwidth to devote to this topic has pretty much disappeared.  I probably won't be responding further unless specifically requested to.<div class="HOEnZb"><div class="h5"><br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

During optimization, target function name should be adequate to<br>

retrieve the function type information which then can be used to compare<br>

the types of the arguments as well as the return type w/ the call site<br>

to<br>

ensure the match.<br>

</blockquote>

Er, I'm now really confused.  With a properly constructed check, there<br>

should be no runtime checking of types required.  Can you explain why<br>

you think this is needed?<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

To clarify, our interest at this time is to upstream all the<br>

infrastructure related changes. The optimizations using the profile data<br>

will not be part of our initial set of patches. On the other hand,<br>

agreeing on the metadata format at the call sites would be advantageous<br>

in<br>

terms of supporting multiple profile methods.<br>

</blockquote>

See my other response to this thread.<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Thanks,<br>

-Betul<br>

<br>

<br>

</blockquote></blockquote>

<br>

</blockquote>

<br>

<br>

</div></div><br>_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

<br></blockquote></div><br></div></div>