<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 9, 2015 at 1:12 PM, Xinliang David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">We are now very close to push the final stage of the PGO size<br>

reduction task. Here is the updated plan going forward:<br>

<br>

1) In this round, the format of the indexed profile data won't be unchanged.<br>

2) there will be *no* changes in user interfaces to all profiling<br>

related tools including llvm-profdata, llvm-cov -- the change will be<br>

transparent in terms of PGO usability.<br></blockquote><div><br></div><div>Thanks. This was my main concern.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

3) The implementation will be using  compression for the function name<br>

section (the compression ratio is about 5:1). As a result, the linker<br>

input object size, unstripped binary size, stripped binary size,<br>

process image size, and raw profile data size will all be greatly<br>

reduced;<br></blockquote><div><br></div><div>Is the linker able to do string merging on the compressed sections?</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

4) The change will also greatly improve usability of coverage-mapping<br>

due to the reduced data size in memory.<br>

<br>

Before the final patch, there are a few more small preparation patches<br>

: 1) abstract naming reading into a class (ProfSymtab) (currently the<br>

reader uses/assumes the raw/uncompressed object section. 2) add<br>

compression support in ProfSymtab.<br>

<br>

thanks,<br>

<br>

David<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

<br>

<br>

<br>

On Wed, Oct 14, 2015 at 12:30 AM, Xinliang David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>> wrote:<br>

> I plan to divide the patch into series of small patches. The<br>

> preparation patches will mostly be refactoring changes with NFC. This<br>

> will minimize the size of final patch(es) with functional changes to<br>

> help discussions.<br>

><br>

> thanks,<br>

><br>

> David<br>

><br>

> On Fri, Oct 9, 2015 at 4:06 PM, Xinliang David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>> wrote:<br>

>> On Fri, Oct 9, 2015 at 3:58 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>

>>><br>

>>><br>

>>> On Wed, Oct 7, 2015 at 11:12 PM, Xinliang David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>><br>

>>> wrote:<br>

>>>><br>

>>>> There is no further response to this, so I will assume general<br>

>>>> direction of solution-3 is acceptable ;)<br>

>><br>

>>> No response does not mean "LGTM".<br>

>>><br>

>><br>

>> What I meant is that the discussion can be moved on to the formal code<br>

>> review. I have not yet submitted the final patch for review yet.<br>

>> Before that is ready, continue using this thread to voice your<br>

>> concerns.<br>

>><br>

>> David<br>

>><br>

>><br>

>><br>

>><br>

>>> -- Sean Silva<br>

>>><br>

>>>><br>

>>>><br>

>>>> Solution-3 can be further improved. Instead of using static symbol<br>

>>>> table (with zero size __llvm_prf_nm symbols) to store function names<br>

>>>> for profile display and coverage mapping, the function names can be<br>

>>>> stored compressed in a non-allocatable section. The compression ratio<br>

>>>> for function name strings can be very high (~5x).  The covmapping data<br>

>>>> can also be made non-allocatable.<br>

>>>><br>

>>>> thanks,<br>

>>>><br>

>>>> David<br>

>>>><br>

>>>> On Tue, Sep 29, 2015 at 9:52 AM, Xinliang David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>><br>

>>>> wrote:<br>

>>>> > Sorry for the late update. I finally found time to implement a solution<br>

>>>> > (Solution-3) that has the best size savings (for both PGO and coverage<br>

>>>> > testing) with symbolic information available.  Here is a re-cap of what<br>

>>>> > we<br>

>>>> > have discussed so far:<br>

>>>> ><br>

>>>> > Solution-1:<br>

>>>> ><br>

>>>> > This is the original solution proposed. In this solution, A function<br>

>>>> > name's<br>

>>>> > MD5 hash is used as the primary key (combined with IR hash) for function<br>

>>>> > lookup. __llvm_prf_names section won't be emitted into the binary nor<br>

>>>> > dumped<br>

>>>> > into the profile data unless -fcoverage-mapping is also specified.<br>

>>>> ><br>

>>>> >     Pros:<br>

>>>> >         1. maximal reduction of instrumentation binary process image<br>

>>>> > size<br>

>>>> >         2. maximal reduction of object and unstripped binary size<br>

>>>> >         3. maximal reduction of raw profile data size<br>

>>>> >         4. maximal reduction of indexed profile data size<br>

>>>> ><br>

>>>> >    Cons:<br>

>>>> >         1. -fcoverage-mapping use case is not handled -- the size<br>

>>>> > problem<br>

>>>> > still exist<br>

>>>> >         2. profile dump with llvm-profdata no longer have function names<br>

>>>> > associated -- user needs to use postprocessing tool to get the<br>

>>>> > functionality<br>

>>>> >         3. function filtering with partial function name is not<br>

>>>> > supported<br>

>>>> >         4. Requires incompatible format change<br>

>>>> ><br>

>>>> ><br>

>>>> > Solution-2: (<a href="http://reviews.llvm.org/D12715" rel="noreferrer" target="_blank">http://reviews.llvm.org/D12715</a>)<br>

>>>> ><br>

>>>> > In this solution, the MD5 hash string is used to replace the raw name<br>

>>>> > string<br>

>>>> >       Pros:<br>

>>>> >           1. Very simple to implement<br>

>>>> >           2. have good reduction of all sizes for typical large C++<br>

>>>> > applications<br>

>>>> >           3. No profile data format change is required.<br>

>>>> ><br>

>>>> >       Cons:<br>

>>>> >           1. Still requires 16byte overhead per-function -- can actually<br>

>>>> > hurt C programs<br>

>>>> >           2. -fcoverage-mapping use case is still not handled<br>

>>>> >           3. The problem with llvm-profdata still exists (no symbolic<br>

>>>> > info,<br>

>>>> > partial filtering support)<br>

>>>> ><br>

>>>> ><br>

>>>> > Solution-3:<br>

>>>> ><br>

>>>> > This is the new solution I am proposing. It is basically an enhancement<br>

>>>> > of<br>

>>>> > Solution-1 with most of the weakness resolved.  The difference with<br>

>>>> > Solution-1 is<br>

>>>> >       1. Function name symbols are emitted into the symbol table as weak<br>

>>>> > externs. They don't occupy any space at runtime and can be easily<br>

>>>> > stripped.<br>

>>>> >       2. -fcoverage-mapping does not need special handling -- it<br>

>>>> > automatically benefit from the same size saving.<br>

>>>> >       3. llvm-cov is changed to read symbol info from the symtab instead<br>

>>>> > of<br>

>>>> > reading them from the name section data<br>

>>>> >       4. llvm-profdata is enhanced to take a binary as input and dump<br>

>>>> > profile with names attached. Function filtering is fully supported<br>

>>>> > (option<br>

>>>> > can also be introduced to force dumping names into binary and profile<br>

>>>> > data,<br>

>>>> > so that llvm-profdata use case is not changed at all).<br>

>>>> ><br>

>>>> > Pros:<br>

>>>> >        1. All the pros from Solution-1<br>

>>>> >        2. Size savings for coverage-mapping case<br>

>>>> > Cons:<br>

>>>> >        Format change is required for profile data and coverage mapping.<br>

>>>> ><br>

>>>> > The initial patch is here: <a href="http://reviews.llvm.org/D13251" rel="noreferrer" target="_blank">http://reviews.llvm.org/D13251</a><br>

>>>> ><br>

>>>> > With this patch, the size of a release clang binary with coverage<br>

>>>> > mapping is<br>

>>>> > reduced from 986M to 569M.<br>

>>>> ><br>

>>>> > If there are no major concerns,  I will carve out the patch into smaller<br>

>>>> > ones for review.<br>

>>>> ><br>

>>>> > thanks,<br>

>>>> ><br>

>>>> > David<br>

>>>> ><br>

>>>> ><br>

>>>> ><br>

>>>> ><br>

>>>> ><br>

>>>> ><br>

>>>> ><br>

>>>> > On Tue, Sep 8, 2015 at 3:47 PM, Xinliang David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>><br>

>>>> > wrote:<br>

>>>> >><br>

>>>> >> >> >><br>

>>>> >> >> >> yes -- it is fixed length (8byte) blob which may include null<br>

>>>> >> >> >> byte<br>

>>>> >> >> >> in<br>

>>>> >> >> >> the middle.<br>

>>>> >> >> ><br>

>>>> >> >> ><br>

>>>> >> >> > For reference, MD5 sum is 16 bytes (128-bit):<br>

>>>> >> >> > <a href="https://en.wikipedia.org/wiki/MD5" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/MD5</a><br>

>>>> >> >><br>

>>>> >> >> yes, LLVM's MD5 hash only takes the lower 64bit.<br>

>>>> >> >><br>

>>>> >> >><br>

>>>> >> >> ><br>

>>>> >> >> >><br>

>>>> >> >> >><br>

>>>> >> >> >> ><br>

>>>> >> >> >> > Or to say it another way, suppose that Itanium mangling<br>

>>>> >> >> >> > required<br>

>>>> >> >> >> > as a<br>

>>>> >> >> >> > final<br>

>>>> >> >> >> > step to replace the string with its md5 sum in hex. Therefore<br>

>>>> >> >> >> > all<br>

>>>> >> >> >> > symbol<br>

>>>> >> >> >> > names are "small". My understanding is that this is effectively<br>

>>>> >> >> >> > all<br>

>>>> >> >> >> > your<br>

>>>> >> >> >> > patch is doing.<br>

>>>> >> >> >><br>

>>>> >> >> >> The key type before the change is StringRef, while the after the<br>

>>>> >> >> >> key<br>

>>>> >> >> >> type is uint64_t. Are you suggesting treating uint64_t md5 sum<br>

>>>> >> >> >> key<br>

>>>> >> >> >> as<br>

>>>> >> >> >> a string of 8 bytes or storing md5 has in text form which will<br>

>>>> >> >> >> double<br>

>>>> >> >> >> the size?<br>

>>>> >> >> ><br>

>>>> >> >> ><br>

>>>> >> >> > How much does this change the benefit? If most of the benefit is<br>

>>>> >> >> > avoiding<br>

>>>> >> >> > extraordinarily long mangled names then it may be sufficient.<br>

>>>> >> >> ><br>

>>>> >> >> > With IR-level instrumentation like Rong is pursuing the size may<br>

>>>> >> >> > be<br>

>>>> >> >> > reduced<br>

>>>> >> >> > sufficiently that we do not need the optimization proposed in this<br>

>>>> >> >> > thread.<br>

>>>> >> >> > For example, Rong found >2x size reduction on Google's C++<br>

>>>> >> >> > benchmarks,<br>

>>>> >> >> > which<br>

>>>> >> >> > I assume are representative of the extremely large Google binaries<br>

>>>> >> >> > that<br>

>>>> >> >> > are<br>

>>>> >> >> > causing the problems addressed by your proposal in this thread.<br>

>>>> >> >> > The<br>

>>>> >> >> > measurements you mention for Clang in this thread provide similar<br>

>>>> >> >> > size<br>

>>>> >> >> > reductions, so Rong's approach may be sufficient (especially<br>

>>>> >> >> > because<br>

>>>> >> >> > functions with extremely large mangled names tend to be small<br>

>>>> >> >> > inline<br>

>>>> >> >> > functions in header-only template libraries).<br>

>>>> >> >><br>

>>>> >> >> Late instrumentation helps many cases. In some cases (as shown in<br>

>>>> >> >> SPEC), the reduction in size is not as large. Reducing PGO overhead<br>

>>>> >> >> will lower the bar for its adoption.<br>

>>>> >> >><br>

>>>> >> >> ><br>

>>>> >> >> > Of the points you mention in "Large size of overhead can limit the<br>

>>>> >> >> > usability<br>

>>>> >> >> > of PGO greatly", many of the issues are hard limits that prevent<br>

>>>> >> >> > the<br>

>>>> >> >> > use<br>

>>>> >> >> > of<br>

>>>> >> >> > PGO. Do you have a lower bound on how much the size of the PGO<br>

>>>> >> >> > data<br>

>>>> >> >> > must<br>

>>>> >> >> > be<br>

>>>> >> >> > reduced in order to overcome the hard limits?<br>

>>>> >> >><br>

>>>> >> >> This is a static view:  Think about the situation where application<br>

>>>> >> >> size is ever increasing; also think about situation where we want to<br>

>>>> >> >><br>

>>>> >> >> collect more types of profile data. Think about situation where user<br>

>>>> >> >> want to run pgo binaries on small devices with tiny memory/storage<br>

>>>> >> >> ..<br>

>>>> >> ><br>

>>>> >> ><br>

>>>> >> > If we want to reduce memory overhead at runtime and reduce the size<br>

>>>> >> > of<br>

>>>> >> > the<br>

>>>> >> > raw profile data extracted from the target, there are clear<br>

>>>> >> > solutions.<br>

>>>> >> > Consider that debug info does not need to be loaded into the memory<br>

>>>> >> > image of<br>

>>>> >> > the target; why should information identifying each counter need to<br>

>>>> >> > be?<br>

>>>> >> > A file containing raw profile counters is a subset of a core dump; in<br>

>>>> >> > most<br>

>>>> >> > environments, a core dump does not need to have debug info or symbol<br>

>>>> >> > names<br>

>>>> >> > in it, but can be still be read in full detail in conjunction with<br>

>>>> >> > the<br>

>>>> >> > original binary.<br>

>>>> >><br>

>>>> >> Yes -- there are many alternatives:<br>

>>>> >> 1) emit the name key mapping as a side data at compile time, or<br>

>>>> >> 2) emit them into nonloadable sections of the object file.<br>

>>>> >><br>

>>>> >> Compared with the above, LLVM's existing design does have its own<br>

>>>> >> advantage -- making it easier for tool to access 'debug' info for<br>

>>>> >> counters.<br>

>>>> >><br>

>>>> >> LLVM's coverage testing, on the other hand, take a hybrid approach: It<br>

>>>> >> emits the coverage map as rodata, but does not pass it to the profile<br>

>>>> >> dumper. I think it is better to emit covmap as a side data not<br>

>>>> >> attached to target binary.<br>

>>>> >><br>

>>>> >><br>

>>>> >> ><br>

>>>> >> > Thus, as we require that the binary be passed to llvm-profdata, there<br>

>>>> >> > is<br>

>>>> >> > no<br>

>>>> >> > fundamental reason that the memory image of the program, or the raw<br>

>>>> >> > data<br>

>>>> >> > extracted from the program, must have any size overhead besides the<br>

>>>> >> > raw<br>

>>>> >> > values of the counters themselves and any text size increase for<br>

>>>> >> > incrementing them. If we are willing to impose this requirement on<br>

>>>> >> > users,<br>

>>>> >> > then as far as reducing memory overhead at runtime and reducing the<br>

>>>> >> > size<br>

>>>> >> > of<br>

>>>> >> > the raw profile data extracted from the target, using hashed function<br>

>>>> >> > names<br>

>>>> >> > is clearly the wrong direction.<br>

>>>> >> ><br>

>>>> >> > *Without* imposing the requirement of passing the binary to<br>

>>>> >> > llvm-profdata, I<br>

>>>> >> > do like the ability to use hashed function names like you are<br>

>>>> >> > proposing.<br>

>>>> >> > It<br>

>>>> >> > is a simple solution for reducing size overhead of function name<br>

>>>> >> > strings<br>

>>>> >> > with little complexity, as it is just swapping one string for<br>

>>>> >> > another.<br>

>>>> >><br>

>>>> >> Agree. The good news is that the overhead of hashed function names is<br>

>>>> >> small enough that makes this approach attractive.<br>

>>>> >><br>

>>>> >> thanks,<br>

>>>> >><br>

>>>> >> David<br>

>>>> >> ><br>

>>>> >> >><br>

>>>> >> >><br>

>>>> >> >> ><br>

>>>> >> >> > Obviously LLVM must be able to support the extremely large<br>

>>>> >> >> > binaries<br>

>>>> >> >> > in<br>

>>>> >> >> > your<br>

>>>> >> >> > configuration (otherwise what use is LLVM as a compiler ;) My<br>

>>>> >> >> > questions<br>

>>>> >> >> > are<br>

>>>> >> >> > primarily aimed at establishing which tradeoffs are acceptable for<br>

>>>> >> >> > supporting this (both for LLVM and for you guys).<br>

>>>> >> >><br>

>>>> >> >> As I said, with the modified proposal (after getting your feedback),<br>

>>>> >> >> no PGO users in LLVM land is going to lose anything/functionality.<br>

>>>> >> >> The<br>

>>>> >> >> end result will be net win for general users of LLVM (even though<br>

>>>> >> >> your<br>

>>>> >> >> customers don't care about it), not just 'us' as you have mentioned<br>

>>>> >> >> many times.<br>

>>>> >> >><br>

>>>> >> >> ><br>

>>>> >> >> > Btw, for us, the issue of PGO data size is not completely<br>

>>>> >> >> > immaterial<br>

>>>> >> >> > but<br>

>>>> >> >> > is<br>

>>>> >> >> > very different from your use case. For us, the primary issue is<br>

>>>> >> >> > the<br>

>>>> >> >> > additional memory use at run time, since PS4 games usually use<br>

>>>> >> >> > "all"<br>

>>>> >> >> > available memory. We had a problem with UBSan where the large<br>

>>>> >> >> > amount<br>

>>>> >> >> > of<br>

>>>> >> >> > memory required for storing the UBSan diagnostic data at runtime<br>

>>>> >> >> > required<br>

>>>> >> >> > the game programmers to manually change their memory map to make<br>

>>>> >> >> > room.<br>

>>>> >> >> > +Filipe, do you remember how much memory UBSan was using that<br>

>>>> >> >> > caused<br>

>>>> >> >> > a<br>

>>>> >> >> > problem?<br>

>>>> >> >> ><br>

>>>> >> >><br>

>>>> >> >> My proposal does help reducing rodata size significantly.<br>

>>>> >> ><br>

>>>> >> ><br>

>>>> >> > Yes, that is why I think that this is a useful thing to do. I just<br>

>>>> >> > want<br>

>>>> >> > to<br>

>>>> >> > be careful about existing use cases and the relevant workflow issues.<br>

>>>> >> ><br>

>>>> >> > -- Sean Silva<br>

>>>> >> ><br>

>>>> >> >><br>

>>>> >> >><br>

>>>> >> >> David<br>

>>>> >> >><br>

>>>> >> >><br>

>>>> >> >> > -- Sean Silva<br>

>>>> >> >> ><br>

>>>> >> >> >><br>

>>>> >> >> >><br>

>>>> >> >> >> In the raw format, md5 sum key can be an embedded field in the<br>

>>>> >> >> >> prf_data variable instead of as different var referenced by<br>

>>>> >> >> >> prf_data.<br>

>>>> >> >> >><br>

>>>> >> >> >> ><br>

>>>> >> >> >> > If this is not the case, you should show your current patch so<br>

>>>> >> >> >> > that<br>

>>>> >> >> >> > we<br>

>>>> >> >> >> > can<br>

>>>> >> >> >> > discuss things concretely.<br>

>>>> >> >> >><br>

>>>> >> >> >> It is not. See above about the difference.<br>

>>>> >> >> >><br>

>>>> >> >> >> ><br>

>>>> >> >> >> >><br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> It will be<br>

>>>> >> >> >> >> >> very messy to support multiple formats in instr-codegen and<br>

>>>> >> >> >> >> >> instr-runtime.  For compatibility concerns, the reader is<br>

>>>> >> >> >> >> >> taught<br>

>>>> >> >> >> >> >> to<br>

>>>> >> >> >> >> >> support previous format, but the changes there are isolated<br>

>>>> >> >> >> >> >> (also<br>

>>>> >> >> >> >> >> expected to be removed in the future).<br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> ><br>

>>>> >> >> >> >> >> > My primary concern is that if the function name are not<br>

>>>> >> >> >> >> >> > kept<br>

>>>> >> >> >> >> >> > at<br>

>>>> >> >> >> >> >> > all<br>

>>>> >> >> >> >> >> > stages,<br>

>>>> >> >> >> >> >> > then it becomes difficult to analyze the profile data in<br>

>>>> >> >> >> >> >> > a<br>

>>>> >> >> >> >> >> > standalone<br>

>>>> >> >> >> >> >> > way.<br>

>>>> >> >> >> >> >> > Many times, I have used `llvm-profdata show<br>

>>>> >> >> >> >> >> > -all-functions<br>

>>>> >> >> >> >> >> > foo.profdata`<br>

>>>> >> >> >> >> >> > on<br>

>>>> >> >> >> >> >> > the resulting profile data and then imported that data<br>

>>>> >> >> >> >> >> > into<br>

>>>> >> >> >> >> >> > Mathematica<br>

>>>> >> >> >> >> >> > for<br>

>>>> >> >> >> >> >> > analysis.<br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> This is certainly a very valid use case.<br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> >My understanding of your proposal is that `llvm-profdata<br>

>>>> >> >> >> >> >> > show<br>

>>>> >> >> >> >> >> > -all-functions foo.profdata` will not show the actual<br>

>>>> >> >> >> >> >> > function<br>

>>>> >> >> >> >> >> > names<br>

>>>> >> >> >> >> >> > but<br>

>>>> >> >> >> >> >> > instead MD5 hashes,<br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> Yes.<br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> To support your use case, there are two solutions:<br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> 1) user can add -fcoverage-mapping option in the build<br>

>>>> >> >> >> >> >> 2) introduce a new option -fprofile-instr-names that force<br>

>>>> >> >> >> >> >> the<br>

>>>> >> >> >> >> >> emission of the name sections in the .o file. This is<br>

>>>> >> >> >> >> >> similar<br>

>>>> >> >> >> >> >> to<br>

>>>> >> >> >> >> >> 1),<br>

>>>> >> >> >> >> >> but no covmap section is needed.<br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> llvm-profdata tool  will be taught to read the name section<br>

>>>> >> >> >> >> >> and<br>

>>>> >> >> >> >> >> attach<br>

>>>> >> >> >> >> >> function names to the profile records.<br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> > Needing to pass the executable to llvm-profdata would cause<br>

>>>> >> >> >> >> > deployment<br>

>>>> >> >> >> >> > issues for my customers in practice.<br>

>>>> >> >> >> >><br>

>>>> >> >> >> >> Why? The deployment needs to pass the profile data anyway<br>

>>>> >> >> >> >> right?<br>

>>>> >> >> >> ><br>

>>>> >> >> >> ><br>

>>>> >> >> >> > Yes, but not the executable.<br>

>>>> >> >> >> ><br>

>>>> >> >> >> > The PGO training run is likely being run by a gameplay tester<br>

>>>> >> >> >> > (non-programmer). In general the binary will not be lying<br>

>>>> >> >> >> > around<br>

>>>> >> >> >> > as a<br>

>>>> >> >> >> > loose<br>

>>>> >> >> >> > file anywhere, it will be part of a full package of the<br>

>>>> >> >> >> > binary+assets<br>

>>>> >> >> >> > (think<br>

>>>> >> >> >> > like what will end up on a bluray disc). A game's binary<br>

>>>> >> >> >> > *completely<br>

>>>> >> >> >> > useless* without the assets, so except locally on a<br>

>>>> >> >> >> > programmer's<br>

>>>> >> >> >> > machine<br>

>>>> >> >> >> > while they iterate/debug, there is no reason for a binary to<br>

>>>> >> >> >> > ever<br>

>>>> >> >> >> > exist<br>

>>>> >> >> >> > as a<br>

>>>> >> >> >> > standalone file.<br>

>>>> >> >> >> ><br>

>>>> >> >> >> > I'm not saying that needing the binary is insurmountable in any<br>

>>>> >> >> >> > particular<br>

>>>> >> >> >> > scenario. Just that it will cause a strict increase in the<br>

>>>> >> >> >> > number<br>

>>>> >> >> >> > of<br>

>>>> >> >> >> > issues<br>

>>>> >> >> >> > to deploying PGO.<br>

>>>> >> >> >><br>

>>>> >> >> >>  Your concern is acknowledged.<br>

>>>> >> >> >><br>

>>>> >> >> >> ><br>

>>>> >> >> >> > These are much bigger "compatibility concerns" for me than for<br>

>>>> >> >> >> > newer<br>

>>>> >> >> >> > toolchains to accept the old format. For a change in format I<br>

>>>> >> >> >> > can<br>

>>>> >> >> >> > easily<br>

>>>> >> >> >> > tell my users to replace an exe with a newer one and that is<br>

>>>> >> >> >> > all<br>

>>>> >> >> >> > they<br>

>>>> >> >> >> > need<br>

>>>> >> >> >> > to do and it takes 10 seconds, guaranteed. A workflow change is<br>

>>>> >> >> >> > potentially<br>

>>>> >> >> >> > a massive disruption and guaranteed to take more than 10<br>

>>>> >> >> >> > seconds<br>

>>>> >> >> >> > to<br>

>>>> >> >> >> > fix<br>

>>>> >> >> >> > (perhaps hours or days).<br>

>>>> >> >> >><br>

>>>> >> >> >> ok.<br>

>>>> >> >> >><br>

>>>> >> >> >> ><br>

>>>> >> >> >> ><br>

>>>> >> >> >> >><br>

>>>> >> >> >> >>   This<br>

>>>> >> >> >> >> is no different from llvm-cov usage model.<br>

>>>> >> >> >> ><br>

>>>> >> >> >> ><br>

>>>> >> >> >> > In practice, getting the performance of PGO is a higher<br>

>>>> >> >> >> > priority<br>

>>>> >> >> >> > for<br>

>>>> >> >> >> > my<br>

>>>> >> >> >> > users, so we should not assume that llvm-cov is being used.<br>

>>>> >> >> >><br>

>>>> >> >> >> Glad to hear that :)<br>

>>>> >> >> >><br>

>>>> >> >> >> thanks,<br>

>>>> >> >> >><br>

>>>> >> >> >> David<br>

>>>> >> >> >><br>

>>>> >> >> >> ><br>

>>>> >> >> >> > -- Sean Silva<br>

>>>> >> >> >> ><br>

>>>> >> >> >> >><br>

>>>> >> >> >> >><br>

>>>> >> >> >> >> David<br>

>>>> >> >> >> >><br>

>>>> >> >> >> >><br>

>>>> >> >> >> >><br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> Note that with 1) or 2), the user can still benefit from<br>

>>>> >> >> >> >> >> the<br>

>>>> >> >> >> >> >> reduced<br>

>>>> >> >> >> >> >> profile size.<br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> > Let me reiterate that the size of the profile is not a<br>

>>>> >> >> >> >> > problem<br>

>>>> >> >> >> >> > I<br>

>>>> >> >> >> >> > have<br>

>>>> >> >> >> >> > observed in practice (nor have I heard of this being a<br>

>>>> >> >> >> >> > problem<br>

>>>> >> >> >> >> > in<br>

>>>> >> >> >> >> > practice<br>

>>>> >> >> >> >> > until this thread). Therefore I'm skeptical of any changes<br>

>>>> >> >> >> >> > to<br>

>>>> >> >> >> >> > our<br>

>>>> >> >> >> >> > default<br>

>>>> >> >> >> >> > behavior or any new requirements that are not opt-in.<br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> > -- Sean Silva<br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> thanks,<br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> David<br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >><br>

>>>> >> >> >> >> >> >which will make it more difficult for me to do this kind<br>

>>>> >> >> >> >> >> > of analysis (would require using nm on the original<br>

>>>> >> >> >> >> >> > binary,<br>

>>>> >> >> >> >> >> > hashing<br>

>>>> >> >> >> >> >> > everything, etc.).<br>

>>>> >> >> >> >> >> ><br>

>>>> >> >> >> >> >> > btw, feel free to attach the patch even if it in a rough<br>

>>>> >> >> >> >> >> > state.<br>

>>>> >> >> >> >> >> > It<br>

>>>> >> >> >> >> >> > can<br>

>>>> >> >> >> >> >> > still<br>

>>>> >> >> >> >> >> > help to clarify the proposal and be a good talking point.<br>

>>>> >> >> >> >> >> > Fine-grained<br>

>>>> >> >> >> >> >> > patch<br>

>>>> >> >> >> >> >> > review for caring about the rough parts will happen on<br>

>>>> >> >> >> >> >> > llvm-commits;<br>

>>>> >> >> >> >> >> > the<br>

>>>> >> >> >> >> >> > rough parts will not distract the discussion here on<br>

>>>> >> >> >> >> >> > llvm-dev.<br>

>>>> >> >> >> >> >> ><br>

>>>> >> >> >> >> >> > -- Sean Silva<br>

>>>> >> >> >> >> >> ><br>

>>>> >> >> >> >> >> >><br>

>>>> >> >> >> >> >> >><br>

>>>> >> >> >> >> >> >> thanks,<br>

>>>> >> >> >> >> >> >><br>

>>>> >> >> >> >> >> >> David<br>

>>>> >> >> >> >> >> >> _______________________________________________<br>

>>>> >> >> >> >> >> >> LLVM Developers mailing list<br>

>>>> >> >> >> >> >> >> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

>>>> >> >> >> >> >> >> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

>>>> >> >> >> >> >> ><br>

>>>> >> >> >> >> >> ><br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> >> ><br>

>>>> >> >> >> ><br>

>>>> >> >> >> ><br>

>>>> >> >> ><br>

>>>> >> >> ><br>

>>>> >> ><br>

>>>> >> ><br>

>>>> ><br>

>>>> ><br>

>>><br>

>>><br>

</div></div></blockquote></div><br></div></div>