<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 9, 2015 at 4:29 PM, Xinliang David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">On Wed, Sep 9, 2015 at 1:16 AM, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>

><br>

><br>

> On Tue, Sep 8, 2015 at 10:34 PM, Xinliang David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>><br>

> wrote:<br>

>><br>

>> On Tue, Sep 8, 2015 at 10:00 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>

>> > silvas added a comment.<br>

>> ><br>

>> > Fundamentally, this approach is not a long-term solution to reducing PGO<br>

>> > data size. The correct long-term solution for reducing PGO data size<br>

>> > requires passing the binary to llvm-profdata.<br>

>><br>

>> This basically goes back to my original proposed solution.  Whether<br>

>> this solution is longer term or not depends on whether it is<br>

>> sufficiently effective in reducing size -- the data we have now<br>

>> confirm that it is indeed effective.<br>

><br>

><br>

> If we have the binary, the size of the names should not matter because they<br>

> should not be present in the runtime memory image or in the raw profile<br>

> data.<br>

<br>

<br>

</span>If we can tolerate the inconvenience of passing binary to<br>

llvm-profdata,</blockquote><div><br></div><div>Quick aside: for compatibility reasons (and I think for general convenience reasons), this would need to be opt-in (we can consider breaking compatibility, but for a user-visible frontend feature this is likely a long discussion and it is easier to just provide compatibility). But otherwise, I think requiring the binary is clearly the right approach.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> I think there is a very simple and nice way of doing<br>

this.<br>

<br>

Currently LLVM emits symbols like __llvm_profile_name_foo with<br>

contents 'foo', this is basically a waste -- the string of 'foo' is<br>

actually a substring of the symbol name itself.<br>

<br>

In other words, we can choose to continue emit __llvm_profile_name_xxx<br>

symbols as before, but with zero size.  llvm-profdata/llvm-cov tool<br>

can read the symtab and reconstruct the md5hash values for them.<br>

<br>

With this change,<br>

<br>

1) no new option is needed<br>

2) binary size, raw/indexed profile size will be minimized<br>

consistently (even when coverage mapping is on).<br></blockquote><div><br></div><div>Could you explain how this affects the 4 different sizes I lay out below? It's not clear how saving a string by pointing into the symbol name affects 2. or 3. below.</div><div><br></div><div>I think we should make clear the different "sizes", which may matter differently for different use cases:</div><div>1. memory overhead in the program image. Ideal case: just the raw counter values (+ relevant text size increase)</div><div>2. size overhead in the raw profile. Ideal case: just the raw counter values.</div><div>3. size overhead in the indexed profile. No clear ideal case.</div><div>4. raw binary size of the program's ELF file. No clear ideal case.<br></div><div>(did I miss any?)</div><div><br></div><div>For me, only 1. is of primary importance. 2. and 3. are not relevant for programs of the size I have (~ size of Clang) and my users' setups (an entire host machine is available to handle the data, and only a single profile is written at a given time). 4. is irrelevant since large binaries must already be handled anyway for debug info.</div><div><br></div><div>For an embedded use case 1. and 2. are important but 3,4 are irrelevant since that happens on the host (assume host has much more resources than target).</div><div><br></div><div>It sounds like for your use case, 2,3,4 are the primary concerns. Is this correct?</div><div><br></div><div>-- Sean Silva</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<span class=""><br>

>Do you know how much improvement there would be over the current patch<br>

> if we could completely eliminate the names from the memory image and from<br>

> the raw profile?<br>

<br>

</span>There are 16 byte overhead per function for emitting the name. For<br>

clang, the overhead of the additional rodata is about 6% of the total<br>

text size.<br>

<br>

Note that if we don't use strings (name or md5string) but just 64bit<br>

md5hash as the key (as done by my original patch), the perf-function<br>

overhead for the keys  is zero -- because the Name reference field in<br>

llvm_profile_data_* structure can be reused.<br>

<span class=""><br>

> I.e., how large are the just the counters by themselves<br>

> compared to the current raw profile data? Quickly looking at some source<br>

> code (but not testing anything) suggests that there is one `struct<br>

> __llvm_profile_data` per counter. Depending on the pointer size, this struct<br>

> is 3-4x as large as the counter itself.<br>

<br>

</span>_llvm_profile_data_*** is  one per function. The size of the<br>

per-function data is 32 bytes (64bit arch), so __llvm_prf_data section<br>

is larger than __llvm_prf_name.  So the largest is the counter section<br>

when this patch is applied. Without this patch, the largest is<br>

__llvm_prf_name (way larger).<br>

<span class=""><br>

<br>

><br>

> One other thing that bothers me is that (while I don't use llvm-cov) this<br>

> size reduction in this patch is incompatible with llvm-cov.<br>

<br>

</span>yes -- but they can be made compatible if we are ok with more drastic change.<br>

<span class=""><br>

><br>

> My thinking is that if we want to provide a feature for reducing the<br>

> overhead in the runtime memory image or in the profraw, it should:<br>

> - be fully general (e.g. cover the llvm-cov use case)<br>

> - have nearly optimal size (3-4x overhead is not acceptable)<br>

> - be completely transparent to the user, except for the fact that the binary<br>

> must be passed to llvm-profdata (but not to clang; llvm-profdata should<br>

> extract all necessary information from the binary)<br>

><br>

> I think this is achievable and not hugely more complicated than the current<br>

> patch.<br>

><br>

<br>

</span>agreed (other than the 3-4x overhead statement above).<br>

<span class=""><br>

> When it comes to adding a feature to clang that we must maintain forever, it<br>

> makes more sense to have it be something like<br>

> `-fprofile-instr-reduced-image-size` which can have a clear contract and the<br>

> implementation of which we can continually improve (or make optimal). For<br>

> this, the transparency is important (e.g. output of llvm-profdata should be<br>

> the same). The current patch is highly tied to the precise implementation of<br>

> what we are doing to reduce the size, and does not leave margin for<br>

> improvement or for changing the underlying approach to cover new use cases<br>

> (see below).<br>

><br>

> What do you think?<br>

<br>

</span>See above, we can even do without the option and make every use case happy.<br>

<span class=""><br>

><br>

> Out of curiosity, could you post the patch for your original approach that<br>

> required the binary for llvm-profdata? Could it potentially work as an<br>

> initial approach to "`-fprofile-instr-reduced-image-size`" that we could<br>

> incrementally build on?<br>

><br>

<br>

</span>I have not added the llvm-profdata debug dump support yet.  If we<br>

reach an agreement that is a good thing to do (which I agree), I will<br>

add that submit that patch instead.<br>

<span class=""><br>

><br>

> The idea is that if a user says that there is too much overhead in the<br>

> runtime memory image or in the profraw, then we can tell them to use a<br>

> particular option to improve the situation, with the only caveat being that<br>

> they must pass the binary to llvm-profdata;<br>

<br>

</span>I don't have supporting data, but if majority of users do not need<br>

debug symbol info for llvm-profdata (that often), would it be better<br>

to require passing binary if per function dump is needed? Post<br>

proposing script can also be easily developed for that. It also makes<br>

the use model consistent with llvm-cov.<br>

<span class=""><br>

>besides that, there are no<br>

> strings attached, and no hidden issues for the user (e.g. in an embedded<br>

> project, the code may be mostly C and the function names may typically be<br>

> shorter than the hash, so this patch could make things worse). The current<br>

> patch assumes that function name size is the primary overhead which makes<br>

> PGO not usable; do we don't have a reason to believe that to be the case in<br>

> general? I am interested to hear from the other commenter on the thread who<br>

> mentioned issues with pgo size; still our sample set is very small.<br>

<br>

</span>For C programs, using name hash string instead of name itself may<br>

actually increase overhead -- it is probably a good reason that the<br>

zero overhead approach is preferred.  Here are some data from SPEC<br>

(average name size):<br>

<br>

GCC (C): 20<br>

SJENG (C): 12<br>

GZIP (C): 11<br>

BZIP2 (C): 15<br>

MCF (C) : 15<br>

LIBQUANTUM (C): 17<br>

HMMER (C) : 13<br>

GOBMK (C): 20<br>

<br>

Xalan(C++) : 61<br>

DealII (C++): 58<br>

Clang (C++): 82<br>

<br>

Google benchmarks<br>

<span class="">><br>

> It seems premature to think that function name size is the greatest overhead<br>

> in general since there are only O(#functions) of them, while there are<br>

> O(#counters) unnecessary instances of `struct __llvm_profile_data` (and that<br>

> struct is reasonably large compared to typical identifiers, even mangled)<br>

<br>

</span>llvm_profile_data is also O(#functions).<br>

<span class=""><br>

. A<br>

> quick exploration in Mathematica suggests that for Clang, the bulk of the<br>

> size overhead of function names is concentrated in the range 50-150<br>

> characters (see <a href="http://i.imgur.com/en2mpBF.png" rel="noreferrer" target="_blank">http://i.imgur.com/en2mpBF.png</a>; area under the curve of the<br>

> first plot is total size of identifiers, with horizontal axis being bins by<br>

> identifier size; second plot is left to right accumulated values of the<br>

> first plot). Since `struct __llvm_profile_data` is ~25 bytes, this means<br>

> that we need only assume that functions typically have 2-6 counters for the<br>

> overhead of the structs to be comparable to the mangled names.<br>

<br>

</span>llvm_prof_data is 32byte and it is per-function.<br>

<span class=""><br>

ave negligible effect<br>

> on the size as a whole.<br>

><br>

>><br>

>><br>

>> >The approach of using name hashes is also not an incremental step towards<br>

>> > the long-term solution. So thinking about it a bit more, I'm on the fence<br>

>> > about whether a quick solution like this one even makes sense to do.<br>

>><br>

>> I am on the fence too, but leaning towards having this one as the<br>

>> longer term solution -- it is cleaner and less disruptive. It does not<br>

>> require bumping up the version numbers. (On the other hand, my<br>

>> original patch does require version change)<br>

>><br>

>> > I am very apprehensive of adding complexity to make this approach work,<br>

>> > in particular threading information through many layers, as this is likely<br>

>> > to get in the way of future work towards the long-term solution.<br>

>><br>

>> I don't see the complexity actually. Other than some refactoring, the<br>

>> code here is very non-intrusive. I don't think it will get in the way<br>

>> of 'longer term' solution if it ever became a need to do.<br>

><br>

><br>

> My complexity-detector must be tuned differently than yours.<br>

><br>

> For example, -instr-prof-with-namehash introduces a weird coupling between<br>

> the frontend and the middle-end. It seems to just be used for setting<br>

> IncludeNamesInProfile, which seems to be completely orthogonal to hashing?<br>

<br>

</span>This option is only useful for testing purpose (with opt).<br>

<br>

Anyway, I am fine with going forward with the most aggressive size<br>

reduction approach that every one can benefit consistently.<br>

<span class=""><font color="#888888"><br>

David<br>

</font></span><div class=""><div class="h5"><br>

<br>

<br>

><br>

>><br>

>><br>

>> ><br>

>> > My takeaways from the RFC thread (maybe I am misunderstanding) is that<br>

>> > we are still using the same format just with one string vs. another.<br>

>><br>

>> yes that is true -- that is why version number is not bumped up.<br>

>><br>

>> > We shouldn't need to rev the version and there shouldn't need to be any<br>

>> > changes in compiler-rt or llvm.<br>

>><br>

>> Note that version is not changed. It is just the version field is now<br>

>> partitioned to allow more information to be passed around. This is a<br>

>> very useful feature to have.<br>

>><br>

>><br>

>> >Clang can easily diagnose (as an isolated piece of logic) if the user did<br>

>> > the pgo training run with md5 mode enabled but the md5 mode flag was not<br>

>> > passed or vice versa (it is easy to tell if input function names are md5<br>

>> > sums or not as a diagnostic heuristic).<br>

>><br>

>> If we want to enforce matching options in profile-gen and profile-use,<br>

>> the mismatched option and profile data should result in hard errors --<br>

>> current implementation allows it to be caught as a hard error (with<br>

>> encoding in the profile data).<br>

><br>

><br>

> I don't see in the patch where this error is issued (nor where that error is<br>

> tested).<br>

><br>

>><br>

>><br>

>> >It is also inconsistent for llvm-profdata to be able to be passed<br>

>> > `-function=foo` for an md5 profile; the user should have to pass (truncated)<br>

>> > md5 of the function name.<br>

>><br>

>> This is usability and convenience -- not inconsistency.<br>

><br>

><br>

> Fundamentally, trying to paper over the fact that what is in the file is md5<br>

> sums is wrong. There are many ways that it sneaks through with this<br>

> approach, and I don't see a fundamental reason to try to add a special case<br>

> for this one option. If this is needed for convenience, a separate patch can<br>

> add an option `-md5-function=` which automatically takes the md5 hash of the<br>

> provided argument to save the trouble.<br>

><br>

> -- Sean Silva<br>

><br>

>><br>

>><br>

>> ><br>

>> ><br>

>> > ================<br>

>> > Comment at:<br>

>> > test/Instrumentation/InstrProfiling/profiling_md5hash_key.ll:5-6<br>

>> > @@ +4,4 @@<br>

>> > +<br>

>> > +@__llvm_profile_name_foo = hidden constant [16 x i8]<br>

>> > c"5cf8c24cdb18bdac"<br>

>> > +; CHECK: @__llvm_profile_name_foo = hidden constant [16 x i8]<br>

>> > c"5cf8c24cdb18bdac", section "__DATA,__llvm_prf_names", align 1<br>

>> > +@__llvm_profile_name_bar = hidden constant [17 x i8]<br>

>> > c"e413754a191db537\00"<br>

>> > ----------------<br>

>> > This is backward from a standard md5 sum in hex. We should maintain the<br>

>> > invariant that the resulting symbol name is a substring of a standard md5<br>

>> > sum. Otherwise it is very difficult to explain how to get the hash for a<br>

>> > function.<br>

>> ><br>

>> > E.g. we should be able to say "the symbol name is effectively replaced<br>

>> > by the first half of its md5 sum in hex":<br>

>><br>

>><br>

>> Good catch. That can be fixed.<br>

>><br>

>> thanks,<br>

>><br>

>> David<br>

>><br>

>> ><br>

>> > ```<br>

>> > Sean:~ % echo -n 'foo' | md5<br>

>> > acbd18db4cc2f85cedef654fccc4a4d8<br>

>> > Sean:~ % echo -n 'foo' | md5 | head -c 16<br>

>> > acbd18db4cc2f85c<br>

>> > ```<br>

>> ><br>

>> > The symbol name string should be `acbd18db4cc2f85c`.<br>

>> ><br>

>> ><br>

>> > <a href="http://reviews.llvm.org/D12715" rel="noreferrer" target="_blank">http://reviews.llvm.org/D12715</a><br>

>> ><br>

>> ><br>

>> ><br>

><br>

><br>

</div></div></blockquote></div><br></div></div>