[llvm-dev] RFC: Reducing Instr PGO size overhead

Bob Wilson via llvm-dev llvm-dev at lists.llvm.org
Mon Dec 14 10:08:13 PST 2015


Great — thanks for confirming that.

> On Dec 14, 2015, at 9:31 AM, Xinliang David Li <davidxl at google.com> wrote:
> 
> Sorry about the typo -- I meant the indexed format version won't be changed.
> 
> thanks,
> 
> David
> 
> On Mon, Dec 14, 2015 at 8:55 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>> 
>>> On Dec 9, 2015, at 1:12 PM, Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> We are now very close to pushing the final stage of the PGO size
>>> reduction task. Here is the updated plan going forward:
>>> 
>>> 1) In this round, the format of the indexed profile data won't be unchanged.
>> 
>> That’s a double negative. Was that intentional? I.e., are you changing the format?
>> 
>>> 2) There will be *no* changes to the user interfaces of any
>>> profiling-related tools, including llvm-profdata and llvm-cov -- the
>>> change will be transparent in terms of PGO usability.
>>> 3) The implementation will use compression for the function name
>>> section (the compression ratio is about 5:1). As a result, the linker
>>> input object size, unstripped binary size, stripped binary size,
>>> process image size, and raw profile data size will all be greatly
>>> reduced.
>>> 4) The change will also greatly improve the usability of coverage
>>> mapping due to the reduced data size in memory.
>>> 
>>> Before the final patch, there are a few more small preparation patches:
>>> 1) abstract name reading into a class (ProfSymtab) -- currently the
>>> reader assumes the raw/uncompressed object section; 2) add
>>> compression support to ProfSymtab.
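>>> 
>>> To make the compression point concrete, here is a rough sketch of the kind
>>> of compression step ProfSymtab could use. It goes through raw zlib purely
>>> for illustration; the helper name, separator choice, and compression level
>>> are all made up here, not taken from the actual patch:
>>> 
>>>   #include <zlib.h>
>>>   #include <string>
>>>   #include <vector>
>>> 
>>>   // Illustrative only: join the mangled names with '\1' and deflate the
>>>   // blob. Mangled C++ names share long common prefixes, which is why
>>>   // ratios around 5:1 are plausible for large binaries.
>>>   static bool compressNames(const std::vector<std::string> &Names,
>>>                             std::vector<unsigned char> &Out) {
>>>     std::string Blob;
>>>     for (const std::string &N : Names) {
>>>       Blob += N;
>>>       Blob += '\1'; // separator; the real on-disk format may differ
>>>     }
>>>     uLongf DestLen = compressBound(Blob.size());
>>>     Out.resize(DestLen);
>>>     if (compress2(Out.data(), &DestLen,
>>>                   reinterpret_cast<const Bytef *>(Blob.data()), Blob.size(),
>>>                   Z_BEST_COMPRESSION) != Z_OK)
>>>       return false;
>>>     Out.resize(DestLen); // DestLen now holds the compressed size
>>>     return true;
>>>   }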
>>> 
>>> thanks,
>>> 
>>> David
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Oct 14, 2015 at 12:30 AM, Xinliang David Li <davidxl at google.com> wrote:
>>>> I plan to divide the patch into a series of small patches. The
>>>> preparation patches will mostly be refactoring changes with NFC. This
>>>> will minimize the size of the final patch(es) with functional changes,
>>>> to help the discussion.
>>>> 
>>>> thanks,
>>>> 
>>>> David
>>>> 
>>>> On Fri, Oct 9, 2015 at 4:06 PM, Xinliang David Li <davidxl at google.com> wrote:
>>>>> On Fri, Oct 9, 2015 at 3:58 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> On Wed, Oct 7, 2015 at 11:12 PM, Xinliang David Li <davidxl at google.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> There is no further response to this, so I will assume the general
>>>>>>> direction of Solution-3 is acceptable ;)
>>>>> 
>>>>>> No response does not mean "LGTM".
>>>>>> 
>>>>> 
>>>>> What I meant is that the discussion can move on to the formal code
>>>>> review. I have not yet submitted the final patch for review.
>>>>> Until that is ready, please continue using this thread to voice your
>>>>> concerns.
>>>>> 
>>>>> David
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> -- Sean Silva
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Solution-3 can be further improved. Instead of using the static symbol
>>>>>>> table (with zero-size __llvm_prf_nm symbols) to store function names
>>>>>>> for profile display and coverage mapping, the function names can be
>>>>>>> stored compressed in a non-allocatable section. The compression ratio
>>>>>>> for function name strings can be very high (~5x). The covmapping data
>>>>>>> can also be made non-allocatable.
>>>>>>> 
>>>>>>> thanks,
>>>>>>> 
>>>>>>> David
>>>>>>> 
>>>>>>> On Tue, Sep 29, 2015 at 9:52 AM, Xinliang David Li <davidxl at google.com>
>>>>>>> wrote:
>>>>>>>> Sorry for the late update. I finally found time to implement a solution
>>>>>>>> (Solution-3) that has the best size savings (for both PGO and coverage
>>>>>>>> testing) while keeping symbolic information available. Here is a recap
>>>>>>>> of what we have discussed so far:
>>>>>>>> 
>>>>>>>> Solution-1:
>>>>>>>> 
>>>>>>>> This is the original solution proposed. In this solution, a function
>>>>>>>> name's MD5 hash is used as the primary key (combined with the IR hash)
>>>>>>>> for function lookup. The __llvm_prf_names section won't be emitted into
>>>>>>>> the binary or dumped into the profile data unless -fcoverage-mapping is
>>>>>>>> also specified.
>>>>>>>> 
>>>>>>>>   Pros:
>>>>>>>>       1. maximal reduction of instrumented binary process image size
>>>>>>>>       2. maximal reduction of object and unstripped binary size
>>>>>>>>       3. maximal reduction of raw profile data size
>>>>>>>>       4. maximal reduction of indexed profile data size
>>>>>>>> 
>>>>>>>>  Cons:
>>>>>>>>       1. the -fcoverage-mapping use case is not handled -- the size
>>>>>>>> problem still exists
>>>>>>>>       2. profile dumps from llvm-profdata no longer have function names
>>>>>>>> attached -- the user needs a postprocessing tool to get that
>>>>>>>> functionality
>>>>>>>>       3. function filtering with a partial function name is not
>>>>>>>> supported
>>>>>>>>       4. requires an incompatible format change
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Solution-2: (http://reviews.llvm.org/D12715)
>>>>>>>> 
>>>>>>>> In this solution, the MD5 hash string is used to replace the raw name
>>>>>>>> string.
>>>>>>>>     Pros:
>>>>>>>>         1. Very simple to implement
>>>>>>>>         2. Good reduction of all sizes for typical large C++
>>>>>>>> applications
>>>>>>>>         3. No profile data format change is required.
>>>>>>>> 
>>>>>>>>     Cons:
>>>>>>>>         1. Still requires a 16-byte overhead per function -- can
>>>>>>>> actually hurt C programs
>>>>>>>>         2. The -fcoverage-mapping use case is still not handled
>>>>>>>>         3. The problem with llvm-profdata still exists (no symbolic
>>>>>>>> info, partial filtering support)
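>>>>>>>> 
>>>>>>>> To make the difference between Solution-1 and Solution-2 concrete, here
>>>>>>>> is a rough sketch (illustrative only -- the helper names and the
>>>>>>>> byte-order convention are assumptions, not the actual code). Solution-1
>>>>>>>> keeps the low 64 bits of the MD5 digest as a binary uint64_t key;
>>>>>>>> Solution-2 spells the same value as a 16-character hex string and uses
>>>>>>>> it in place of the mangled name, which is where the 16-byte
>>>>>>>> per-function overhead comes from:
>>>>>>>> 
>>>>>>>>   #include "llvm/ADT/StringRef.h"
>>>>>>>>   #include "llvm/Support/MD5.h"
>>>>>>>>   #include <cinttypes>
>>>>>>>>   #include <cstdio>
>>>>>>>>   #include <string>
>>>>>>>> 
>>>>>>>>   // Solution-1 style key: the low 64 bits of the MD5 digest as a uint64_t.
>>>>>>>>   static uint64_t nameKey(llvm::StringRef MangledName) {
>>>>>>>>     llvm::MD5 Hash;
>>>>>>>>     Hash.update(MangledName);
>>>>>>>>     llvm::MD5::MD5Result Digest;
>>>>>>>>     Hash.final(Digest);
>>>>>>>>     uint64_t Key = 0;
>>>>>>>>     for (int I = 7; I >= 0; --I) // one plausible byte-order convention
>>>>>>>>       Key = (Key << 8) | Digest[I];
>>>>>>>>     return Key;
>>>>>>>>   }
>>>>>>>> 
>>>>>>>>   // Solution-2 style "name": the same value spelled as 16 hex characters.
>>>>>>>>   static std::string nameKeyString(llvm::StringRef MangledName) {
>>>>>>>>     char Buf[17];
>>>>>>>>     std::snprintf(Buf, sizeof(Buf), "%016" PRIx64, nameKey(MangledName));
>>>>>>>>     return Buf; // always 16 bytes, regardless of how long the name was
>>>>>>>>   }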
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Solution-3:
>>>>>>>> 
>>>>>>>> This is the new solution I am proposing. It is basically an enhancement
>>>>>>>> of Solution-1 with most of the weaknesses resolved. The differences
>>>>>>>> from Solution-1 are:
>>>>>>>>     1. Function name symbols are emitted into the symbol table as weak
>>>>>>>> externs. They don't occupy any space at runtime and can be easily
>>>>>>>> stripped (a rough sketch follows after the Cons list below).
>>>>>>>>     2. -fcoverage-mapping does not need special handling -- it
>>>>>>>> automatically benefits from the same size savings.
>>>>>>>>     3. llvm-cov is changed to read symbol info from the symtab instead
>>>>>>>> of reading it from the name section data.
>>>>>>>>     4. llvm-profdata is enhanced to take a binary as input and dump the
>>>>>>>> profile with names attached. Function filtering is fully supported (an
>>>>>>>> option can also be introduced to force dumping names into the binary
>>>>>>>> and profile data, so that the llvm-profdata use case is not changed at
>>>>>>>> all).
>>>>>>>> 
>>>>>>>> Pros:
>>>>>>>>      1. All the pros from Solution-1
>>>>>>>>      2. Size savings for the coverage-mapping case
>>>>>>>> Cons:
>>>>>>>>      A format change is required for the profile data and coverage mapping.
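>>>>>>>> 
>>>>>>>> A rough sketch of the zero-size weak-symbol idea from point 1 above
>>>>>>>> (purely illustrative -- the symbol prefix and the exact linkage choice
>>>>>>>> are assumptions here, not necessarily what D13251 does):
>>>>>>>> 
>>>>>>>>   #include "llvm/ADT/StringRef.h"
>>>>>>>>   #include "llvm/IR/Constants.h"
>>>>>>>>   #include "llvm/IR/DerivedTypes.h"
>>>>>>>>   #include "llvm/IR/GlobalVariable.h"
>>>>>>>>   #include "llvm/IR/Module.h"
>>>>>>>> 
>>>>>>>>   // Emit one zero-size global per function: it contributes a symbol
>>>>>>>>   // table entry carrying the mangled name, but no bytes in any loadable
>>>>>>>>   // segment, so it can be stripped without affecting the running program.
>>>>>>>>   static llvm::GlobalVariable *emitNameSymbol(llvm::Module &M,
>>>>>>>>                                               llvm::StringRef MangledName) {
>>>>>>>>     auto *Ty = llvm::ArrayType::get(llvm::Type::getInt8Ty(M.getContext()), 0);
>>>>>>>>     return new llvm::GlobalVariable(
>>>>>>>>         M, Ty, /*isConstant=*/true, llvm::GlobalValue::WeakAnyLinkage,
>>>>>>>>         llvm::Constant::getNullValue(Ty), "__profn_" + MangledName.str());
>>>>>>>>   }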
>>>>>>>> 
>>>>>>>> The initial patch is here: http://reviews.llvm.org/D13251
>>>>>>>> 
>>>>>>>> With this patch, the size of a release clang binary with coverage
>>>>>>>> mapping is
>>>>>>>> reduced from 986M to 569M.
>>>>>>>> 
>>>>>>>> If there are no major concerns, I will carve the patch up into smaller
>>>>>>>> ones for review.
>>>>>>>> 
>>>>>>>> thanks,
>>>>>>>> 
>>>>>>>> David
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Sep 8, 2015 at 3:47 PM, Xinliang David Li <davidxl at google.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> yes -- it is a fixed-length (8-byte) blob which may include null
>>>>>>>>>>>>> bytes in the middle.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> For reference, MD5 sum is 16 bytes (128-bit):
>>>>>>>>>>>> https://en.wikipedia.org/wiki/MD5
>>>>>>>>>>> 
>>>>>>>>>>> Yes, LLVM's MD5 hash only takes the lower 64 bits.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Or to say it another way, suppose that Itanium mangling required,
>>>>>>>>>>>>>> as a final step, replacing the string with its MD5 sum in hex.
>>>>>>>>>>>>>> Therefore all symbol names are "small". My understanding is that
>>>>>>>>>>>>>> this is effectively all your patch is doing.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The key type before the change is StringRef, while after the
>>>>>>>>>>>>> change the key type is uint64_t. Are you suggesting treating the
>>>>>>>>>>>>> uint64_t MD5 sum key as a string of 8 bytes, or storing the MD5
>>>>>>>>>>>>> hash in text form, which would double the size?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> How much does this change the benefit? If most of the benefit is
>>>>>>>>>>>> avoiding
>>>>>>>>>>>> extraordinarily long mangled names then it may be sufficient.
>>>>>>>>>>>> 
>>>>>>>>>>>> With IR-level instrumentation like Rong is pursuing, the size may be
>>>>>>>>>>>> reduced
>>>>>>>>>>>> sufficiently that we do not need the optimization proposed in this
>>>>>>>>>>>> thread.
>>>>>>>>>>>> For example, Rong found >2x size reduction on Google's C++
>>>>>>>>>>>> benchmarks,
>>>>>>>>>>>> which
>>>>>>>>>>>> I assume are representative of the extremely large Google binaries
>>>>>>>>>>>> that
>>>>>>>>>>>> are
>>>>>>>>>>>> causing the problems addressed by your proposal in this thread.
>>>>>>>>>>>> The
>>>>>>>>>>>> measurements you mention for Clang in this thread provide similar
>>>>>>>>>>>> size
>>>>>>>>>>>> reductions, so Rong's approach may be sufficient (especially
>>>>>>>>>>>> because
>>>>>>>>>>>> functions with extremely large mangled names tend to be small
>>>>>>>>>>>> inline
>>>>>>>>>>>> functions in header-only template libraries).
>>>>>>>>>>> 
>>>>>>>>>>> Late instrumentation helps many cases. In some cases (as shown in
>>>>>>>>>>> SPEC), the reduction in size is not as large. Reducing PGO overhead
>>>>>>>>>>> will lower the bar for its adoption.
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Of the points you mention in "Large size of overhead can limit the
>>>>>>>>>>>> usability
>>>>>>>>>>>> of PGO greatly", many of the issues are hard limits that prevent
>>>>>>>>>>>> the
>>>>>>>>>>>> use
>>>>>>>>>>>> of
>>>>>>>>>>>> PGO. Do you have a lower bound on how much the size of the PGO
>>>>>>>>>>>> data
>>>>>>>>>>>> must
>>>>>>>>>>>> be
>>>>>>>>>>>> reduced in order to overcome the hard limits?
>>>>>>>>>>> 
>>>>>>>>>>> This is a static view: think about the situation where application
>>>>>>>>>>> size is ever increasing; also think about the situation where we want
>>>>>>>>>>> to collect more types of profile data, or where users want to run PGO
>>>>>>>>>>> binaries on small devices with tiny memory/storage...
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> If we want to reduce memory overhead at runtime and reduce the size
>>>>>>>>>> of
>>>>>>>>>> the
>>>>>>>>>> raw profile data extracted from the target, there are clear
>>>>>>>>>> solutions.
>>>>>>>>>> Consider that debug info does not need to be loaded into the memory
>>>>>>>>>> image of
>>>>>>>>>> the target; why should information identifying each counter need to
>>>>>>>>>> be?
>>>>>>>>>> A file containing raw profile counters is a subset of a core dump; in
>>>>>>>>>> most environments, a core dump does not need to have debug info or
>>>>>>>>>> symbol names in it, but can still be read in full detail in conjunction
>>>>>>>>>> with the original binary.
>>>>>>>>> 
>>>>>>>>> Yes -- there are many alternatives:
>>>>>>>>> 1) emit the name-to-key mapping as side data at compile time, or
>>>>>>>>> 2) emit it into non-loadable sections of the object file.
>>>>>>>>> 
>>>>>>>>> Compared with the above, LLVM's existing design does have its own
>>>>>>>>> advantage -- making it easier for tools to access 'debug' info for
>>>>>>>>> counters.
>>>>>>>>> 
>>>>>>>>> LLVM's coverage testing, on the other hand, takes a hybrid approach: it
>>>>>>>>> emits the coverage map as rodata, but does not pass it to the profile
>>>>>>>>> dumper. I think it would be better to emit the covmap as side data not
>>>>>>>>> attached to the target binary.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thus, as we require that the binary be passed to llvm-profdata, there
>>>>>>>>>> is
>>>>>>>>>> no
>>>>>>>>>> fundamental reason that the memory image of the program, or the raw
>>>>>>>>>> data
>>>>>>>>>> extracted from the program, must have any size overhead besides the
>>>>>>>>>> raw
>>>>>>>>>> values of the counters themselves and any text size increase for
>>>>>>>>>> incrementing them. If we are willing to impose this requirement on
>>>>>>>>>> users,
>>>>>>>>>> then as far as reducing memory overhead at runtime and reducing the
>>>>>>>>>> size
>>>>>>>>>> of
>>>>>>>>>> the raw profile data extracted from the target, using hashed function
>>>>>>>>>> names
>>>>>>>>>> is clearly the wrong direction.
>>>>>>>>>> 
>>>>>>>>>> *Without* imposing the requirement of passing the binary to
>>>>>>>>>> llvm-profdata, I
>>>>>>>>>> do like the ability to use hashed function names like you are
>>>>>>>>>> proposing.
>>>>>>>>>> It
>>>>>>>>>> is a simple solution for reducing the size overhead of function name
>>>>>>>>>> strings
>>>>>>>>>> with little complexity, as it is just swapping one string for
>>>>>>>>>> another.
>>>>>>>>> 
>>>>>>>>> Agree. The good news is that the overhead of hashed function names is
>>>>>>>>> small enough to make this approach attractive.
>>>>>>>>> 
>>>>>>>>> thanks,
>>>>>>>>> 
>>>>>>>>> David
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Obviously LLVM must be able to support the extremely large
>>>>>>>>>>>> binaries
>>>>>>>>>>>> in
>>>>>>>>>>>> your
>>>>>>>>>>>> configuration (otherwise what use is LLVM as a compiler ;) My
>>>>>>>>>>>> questions
>>>>>>>>>>>> are
>>>>>>>>>>>> primarily aimed at establishing which tradeoffs are acceptable for
>>>>>>>>>>>> supporting this (both for LLVM and for you guys).
>>>>>>>>>>> 
>>>>>>>>>>> As I said, with the modified proposal (after getting your feedback),
>>>>>>>>>>> no PGO users in LLVM land are going to lose any functionality. The
>>>>>>>>>>> end result will be a net win for general users of LLVM (even though
>>>>>>>>>>> your customers don't care about it), not just 'us' as you have
>>>>>>>>>>> mentioned many times.
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Btw, for us, the issue of PGO data size is not completely
>>>>>>>>>>>> immaterial
>>>>>>>>>>>> but
>>>>>>>>>>>> is
>>>>>>>>>>>> very different from your use case. For us, the primary issue is
>>>>>>>>>>>> the
>>>>>>>>>>>> additional memory use at run time, since PS4 games usually use
>>>>>>>>>>>> "all"
>>>>>>>>>>>> available memory. We had a problem with UBSan where the large
>>>>>>>>>>>> amount
>>>>>>>>>>>> of
>>>>>>>>>>>> memory required for storing the UBSan diagnostic data at runtime
>>>>>>>>>>>> required
>>>>>>>>>>>> the game programmers to manually change their memory map to make
>>>>>>>>>>>> room.
>>>>>>>>>>>> +Filipe, do you remember how much memory UBSan was using that
>>>>>>>>>>>> caused
>>>>>>>>>>>> a
>>>>>>>>>>>> problem?
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> My proposal does help reduce the rodata size significantly.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Yes, that is why I think that this is a useful thing to do. I just
>>>>>>>>>> want
>>>>>>>>>> to
>>>>>>>>>> be careful about existing use cases and the relevant workflow issues.
>>>>>>>>>> 
>>>>>>>>>> -- Sean Silva
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> David
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In the raw format, the MD5 sum key can be an embedded field in the
>>>>>>>>>>>>> prf_data variable instead of a separate variable referenced by
>>>>>>>>>>>>> prf_data.
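>>>>>>>>>>>>> 
>>>>>>>>>>>>> Roughly, the layout change is something like the following (the
>>>>>>>>>>>>> field names are made up for illustration; this is not the actual
>>>>>>>>>>>>> __llvm_profile_data definition):
>>>>>>>>>>>>> 
>>>>>>>>>>>>>   #include <cstdint>
>>>>>>>>>>>>> 
>>>>>>>>>>>>>   // Before: the per-function record points at a separately emitted
>>>>>>>>>>>>>   // name variable.
>>>>>>>>>>>>>   struct ProfDataBefore {
>>>>>>>>>>>>>     const char *NamePtr;   // references a string in __llvm_prf_names
>>>>>>>>>>>>>     uint32_t NameSize;
>>>>>>>>>>>>>     uint64_t FuncHash;     // IR/CFG hash
>>>>>>>>>>>>>     uint64_t *CounterPtr;
>>>>>>>>>>>>>     uint32_t NumCounters;
>>>>>>>>>>>>>   };
>>>>>>>>>>>>> 
>>>>>>>>>>>>>   // After: the 64-bit MD5-based key is embedded directly in the
>>>>>>>>>>>>>   // record, so no separate name variable needs to be referenced.
>>>>>>>>>>>>>   struct ProfDataAfter {
>>>>>>>>>>>>>     uint64_t NameKey;      // low 64 bits of MD5(mangled name)
>>>>>>>>>>>>>     uint64_t FuncHash;
>>>>>>>>>>>>>     uint64_t *CounterPtr;
>>>>>>>>>>>>>     uint32_t NumCounters;
>>>>>>>>>>>>>   };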
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If this is not the case, you should show your current patch so
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>> can
>>>>>>>>>>>>>> discuss things concretely.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It is not. See above about the difference.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> It will be
>>>>>>>>>>>>>>>>> very messy to support multiple formats in instr-codegen and
>>>>>>>>>>>>>>>>> instr-runtime.  For compatibility concerns, the reader is
>>>>>>>>>>>>>>>>> taught
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> support previous format, but the changes there are isolated
>>>>>>>>>>>>>>>>> (also
>>>>>>>>>>>>>>>>> expected to be removed in the future).
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> My primary concern is that if the function names are not kept
>>>>>>>>>>>>>>>>>> at all stages, then it becomes difficult to analyze the profile
>>>>>>>>>>>>>>>>>> data in a standalone way. Many times, I have used
>>>>>>>>>>>>>>>>>> `llvm-profdata show -all-functions foo.profdata` on the
>>>>>>>>>>>>>>>>>> resulting profile data and then imported that data into
>>>>>>>>>>>>>>>>>> Mathematica for analysis.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> This is certainly a very valid use case.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> My understanding of your proposal is that `llvm-profdata
>>>>>>>>>>>>>>>>>> show
>>>>>>>>>>>>>>>>>> -all-functions foo.profdata` will not show the actual
>>>>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> instead MD5 hashes,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> To support your use case, there are two solutions:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 1) the user can add the -fcoverage-mapping option to the build
>>>>>>>>>>>>>>>>> 2) introduce a new option, -fprofile-instr-names, that forces
>>>>>>>>>>>>>>>>> the emission of the name section in the .o file. This is
>>>>>>>>>>>>>>>>> similar to 1), but no covmap section is needed.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The llvm-profdata tool will be taught to read the name section
>>>>>>>>>>>>>>>>> and attach function names to the profile records.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Needing to pass the executable to llvm-profdata would cause
>>>>>>>>>>>>>>>> deployment
>>>>>>>>>>>>>>>> issues for my customers in practice.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Why? The deployment needs to pass the profile data anyway
>>>>>>>>>>>>>>> right?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, but not the executable.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The PGO training run is likely being run by a gameplay tester
>>>>>>>>>>>>>> (non-programmer). In general the binary will not be lying around
>>>>>>>>>>>>>> as a loose file anywhere; it will be part of a full package of the
>>>>>>>>>>>>>> binary+assets (think of what will end up on a bluray disc). A
>>>>>>>>>>>>>> game's binary is *completely useless* without the assets, so
>>>>>>>>>>>>>> except locally on a programmer's machine while they iterate/debug,
>>>>>>>>>>>>>> there is no reason for a binary to ever exist as a standalone
>>>>>>>>>>>>>> file.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'm not saying that needing the binary is insurmountable in any
>>>>>>>>>>>>>> particular scenario, just that it will cause a strict increase in
>>>>>>>>>>>>>> the number of issues in deploying PGO.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Your concern is acknowledged.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> These are much bigger "compatibility concerns" for me than having
>>>>>>>>>>>>>> newer toolchains accept the old format. For a change in format, I
>>>>>>>>>>>>>> can easily tell my users to replace an exe with a newer one; that
>>>>>>>>>>>>>> is all they need to do, and it takes 10 seconds, guaranteed. A
>>>>>>>>>>>>>> workflow change is potentially a massive disruption and guaranteed
>>>>>>>>>>>>>> to take more than 10 seconds to fix (perhaps hours or days).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ok.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This is no different from the llvm-cov usage model.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In practice, getting the performance of PGO is a higher
>>>>>>>>>>>>>> priority
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> my
>>>>>>>>>>>>>> users, so we should not assume that llvm-cov is being used.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Glad to hear that :)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> David
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Note that with 1) or 2), the user can still benefit from
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> reduced
>>>>>>>>>>>>>>>>> profile size.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Let me reiterate that the size of the profile is not a
>>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>> observed in practice (nor have I heard of this being a
>>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>> practice
>>>>>>>>>>>>>>>> until this thread). Therefore I'm skeptical of any changes
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> our
>>>>>>>>>>>>>>>> default
>>>>>>>>>>>>>>>> behavior or any new requirements that are not opt-in.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> which will make it more difficult for me to do this kind
>>>>>>>>>>>>>>>>>> of analysis (would require using nm on the original
>>>>>>>>>>>>>>>>>> binary,
>>>>>>>>>>>>>>>>>> hashing
>>>>>>>>>>>>>>>>>> everything, etc.).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> btw, feel free to attach the patch even if it is in a rough
>>>>>>>>>>>>>>>>>> state. It can still help to clarify the proposal and be a good
>>>>>>>>>>>>>>>>>> talking point. Fine-grained patch review caring about the
>>>>>>>>>>>>>>>>>> rough parts will happen on llvm-commits; the rough parts will
>>>>>>>>>>>>>>>>>> not distract the discussion here on llvm-dev.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 


