<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>

><br>

><br>

> On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>><br>

> wrote:<br>

>><br>

>> ><br>

>> > I think it is reasonable to simply replace the key we currently use with<br>

>> > MD5(key) for getting a size reduction.  In practice for my use cases, I<br>

>> > have<br>

>> > not observed any of the issues you mentioned under "Large size of<br>

>> > overhead<br>

>> > can limit the usability of PGO greatly", but I can understand that some<br>

>> > of<br>

>> > these issues could become problems in Google's use case. I would<br>

>> > personally<br>

>> > prefer to keep the existing behavior as the default (see below), and<br>

>> > have<br>

>> > MD5(key) as an option.<br>

>><br>

>> The problem is that this requires profile format changes.<br>

><br>

><br>

> Why? AFAIK the "function name" is just an arbitrary string. Using s or<br>

> MD5(s) shouldn't matter. Of course, the user will need to pass consistent<br>

> flags to clang.<br>

<br>

</span>The raw format for 64bit target can be made 'unchanged', but not for<br>

the 32bit raw format -- the nameptr field is only 32bit.<br>

<br>

The indexed format can not be made the same --  The ondisk profile<br>

record layout totally changes. The key field changes from a blob of<br>

chars into an 64bit integer.<br></blockquote><div><br></div><div>An MD5 sum cannot be represented as a blob of chars?</div><div><br></div><div>Or to say it another way, suppose that Itanium mangling required as a final step to replace the string with its md5 sum in hex. Therefore all symbol names are "small". My understanding is that this is effectively all your patch is doing.</div><div><br></div><div>If this is not the case, you should show your current patch so that we can discuss things concretely.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div><div class="h5">><br>

><br>

>><br>

>> It will be<br>

>> very messy to support multiple formats in instr-codegen and<br>

>> instr-runtime.  For compatibility concerns, the reader is taught to<br>

>> support previous format, but the changes there are isolated (also<br>

>> expected to be removed in the future).<br>

>><br>

>> ><br>

>> > My primary concern is that if the function name are not kept at all<br>

>> > stages,<br>

>> > then it becomes difficult to analyze the profile data in a standalone<br>

>> > way.<br>

>> > Many times, I have used `llvm-profdata show -all-functions foo.profdata`<br>

>> > on<br>

>> > the resulting profile data and then imported that data into Mathematica<br>

>> > for<br>

>> > analysis.<br>

>><br>

>> This is certainly a very valid use case.<br>

>><br>

>> >My understanding of your proposal is that `llvm-profdata show<br>

>> > -all-functions foo.profdata` will not show the actual function names but<br>

>> > instead MD5 hashes,<br>

>><br>

>> Yes.<br>

>><br>

>> To support your use case, there are two solutions:<br>

>><br>

>> 1) user can add -fcoverage-mapping option in the build<br>

>> 2) introduce a new option -fprofile-instr-names that force the<br>

>> emission of the name sections in the .o file. This is similar to 1),<br>

>> but no covmap section is needed.<br>

>><br>

>> llvm-profdata tool  will be taught to read the name section and attach<br>

>> function names to the profile records.<br>

><br>

><br>

> Needing to pass the executable to llvm-profdata would cause deployment<br>

> issues for my customers in practice.<br>

<br>

</div></div>Why? The deployment needs to pass the profile data anyway right?</blockquote><div><br></div><div>Yes, but not the executable.</div><div><br></div><div>The PGO training run is likely being run by a gameplay tester (non-programmer). In general the binary will not be lying around as a loose file anywhere, it will be part of a full package of the binary+assets (think like what will end up on a bluray disc). A game's binary *completely useless* without the assets, so except locally on a programmer's machine while they iterate/debug, there is no reason for a binary to ever exist as a standalone file.<br></div><div><br></div><div>I'm not saying that needing the binary is insurmountable in any particular scenario. Just that it will cause a strict increase in the number of issues to deploying PGO.</div><div><br></div><div>These are much bigger "compatibility concerns" for me than for newer toolchains to accept the old format. For a change in format I can easily tell my users to replace an exe with a newer one and that is all they need to do and it takes 10 seconds, guaranteed. A workflow change is potentially a massive disruption and guaranteed to take more than 10 seconds to fix (perhaps hours or days).</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">  This<br>

is no different from llvm-cov usage model.<br></blockquote><div><br></div><div>In practice, getting the performance of PGO is a higher priority for my users, so we should not assume that llvm-cov is being used.</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<span class=""><font color="#888888"><br>

David<br>

</font></span><div class=""><div class="h5"><br>

<br>

<br>

><br>

>><br>

>><br>

>> Note that with 1) or 2), the user can still benefit from the reduced<br>

>> profile size.<br>

><br>

><br>

> Let me reiterate that the size of the profile is not a problem I have<br>

> observed in practice (nor have I heard of this being a problem in practice<br>

> until this thread). Therefore I'm skeptical of any changes to our default<br>

> behavior or any new requirements that are not opt-in.<br>

><br>

> -- Sean Silva<br>

><br>

>><br>

>><br>

>> thanks,<br>

>><br>

>> David<br>

>><br>

>><br>

>><br>

>><br>

>> >which will make it more difficult for me to do this kind<br>

>> > of analysis (would require using nm on the original binary, hashing<br>

>> > everything, etc.).<br>

>> ><br>

>> > btw, feel free to attach the patch even if it in a rough state. It can<br>

>> > still<br>

>> > help to clarify the proposal and be a good talking point. Fine-grained<br>

>> > patch<br>

>> > review for caring about the rough parts will happen on llvm-commits; the<br>

>> > rough parts will not distract the discussion here on llvm-dev.<br>

>> ><br>

>> > -- Sean Silva<br>

>> ><br>

>> >><br>

>> >><br>

>> >> thanks,<br>

>> >><br>

>> >> David<br>

>> >> _______________________________________________<br>

>> >> LLVM Developers mailing list<br>

>> >> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

>> >> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

>> ><br>

>> ><br>

><br>

><br>

</div></div></blockquote></div><br></div></div>