[llvm-dev] RFC: Reducing Instr PGO size overhead
Bob Wilson via llvm-dev
llvm-dev at lists.llvm.org
Mon Dec 14 10:08:13 PST 2015
Great — thanks for confirming that.
> On Dec 14, 2015, at 9:31 AM, Xinliang David Li <davidxl at google.com> wrote:
>
> Sorry about the typo -- I meant indexed format version won't be changed.
>
> thanks,
>
> David
>
> On Mon, Dec 14, 2015 at 8:55 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>>
>>> On Dec 9, 2015, at 1:12 PM, Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> We are now very close to push the final stage of the PGO size
>>> reduction task. Here is the updated plan going forward:
>>>
>>> 1) In this round, the format of the indexed profile data won't be unchanged.
>>
>> That’s a double negative. Was that intentional? I.e., are you changing the format?
>>
>>> 2) there will be *no* changes to the user interfaces of any
>>> profiling-related tools, including llvm-profdata and llvm-cov -- the
>>> change will be transparent in terms of PGO usability.
>>> 3) The implementation will be using compression for the function name
>>> section (the compression ratio is about 5:1). As a result, the linker
>>> input object size, unstripped binary size, stripped binary size,
>>> process image size, and raw profile data size will all be greatly
>>> reduced;
>>> 4) The change will also greatly improve the usability of coverage-mapping
>>> due to the reduced data size in memory.
>>>
>>> Before the final patch, there are a few more small preparation patches:
>>> 1) abstract name reading into a class (ProfSymtab) (currently the
>>> reader uses/assumes the raw/uncompressed object section); 2) add
>>> compression support in ProfSymtab.
>>>
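>>> To make the compression step concrete, here is a minimal sketch of the
>>> kind of transformation involved, written against plain zlib; the actual
>>> ProfSymtab interface and the hook-up inside LLVM are not shown, and the
>>> separator byte below is an assumption made for this example:
>>>
>>>   // Sketch: concatenate the mangled function names and compress the blob.
>>>   // Mangled C++ names are highly repetitive, which is why ratios around
>>>   // 5:1 are plausible for large C++ binaries.
>>>   #include <zlib.h>
>>>   #include <cassert>
>>>   #include <string>
>>>   #include <vector>
>>>
>>>   std::vector<unsigned char>
>>>   compressNames(const std::vector<std::string> &Names) {
>>>     std::string Blob;
>>>     for (const std::string &N : Names) {
>>>       Blob += N;
>>>       Blob += '\1'; // assumed separator that cannot appear in a mangled name
>>>     }
>>>     uLongf DestLen = compressBound(Blob.size());
>>>     std::vector<unsigned char> Out(DestLen);
>>>     int RC = compress2(Out.data(), &DestLen,
>>>                        reinterpret_cast<const Bytef *>(Blob.data()),
>>>                        Blob.size(), Z_BEST_COMPRESSION);
>>>     assert(RC == Z_OK && "zlib compression failed");
>>>     (void)RC;
>>>     Out.resize(DestLen); // DestLen now holds the compressed size
>>>     return Out;
>>>   }
>>>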
>>> thanks,
>>>
>>> David
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Oct 14, 2015 at 12:30 AM, Xinliang David Li <davidxl at google.com> wrote:
>>>> I plan to divide the patch into a series of small patches. The
>>>> preparation patches will mostly be refactoring changes with NFC. This
>>>> will minimize the size of the final patch(es) with functional changes,
>>>> to help discussions.
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>> On Fri, Oct 9, 2015 at 4:06 PM, Xinliang David Li <davidxl at google.com> wrote:
>>>>> On Fri, Oct 9, 2015 at 3:58 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 11:12 PM, Xinliang David Li <davidxl at google.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> There is no further response to this, so I will assume the general
>>>>>>> direction of solution-3 is acceptable ;)
>>>>>
>>>>>> No response does not mean "LGTM".
>>>>>>
>>>>>
>>>>> What I meant is that the discussion can move on to formal code
>>>>> review. I have not submitted the final patch for review yet.
>>>>> Before that is ready, please continue using this thread to voice your
>>>>> concerns.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -- Sean Silva
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Solution-3 can be further improved. Instead of using the static symbol
>>>>>>> table (with zero-size __llvm_prf_nm symbols) to store function names
>>>>>>> for profile display and coverage mapping, the function names can be
>>>>>>> stored compressed in a non-allocatable section. The compression ratio
>>>>>>> for function name strings can be very high (~5x). The covmapping data
>>>>>>> can also be made non-allocatable.
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On Tue, Sep 29, 2015 at 9:52 AM, Xinliang David Li <davidxl at google.com>
>>>>>>> wrote:
>>>>>>>> Sorry for the late update. I finally found time to implement a solution
>>>>>>>> (Solution-3) that has the best size savings (for both PGO and coverage
>>>>>>>> testing) with symbolic information available. Here is a recap of what
>>>>>>>> we have discussed so far:
>>>>>>>>
>>>>>>>> Solution-1:
>>>>>>>>
>>>>>>>> This is the original solution proposed. In this solution, a function
>>>>>>>> name's MD5 hash is used as the primary key (combined with the IR hash)
>>>>>>>> for function lookup. The __llvm_prf_names section won't be emitted into
>>>>>>>> the binary nor dumped into the profile data unless -fcoverage-mapping
>>>>>>>> is also specified.
>>>>>>>>
>>>>>>>> Pros:
>>>>>>>>        1. maximal reduction of instrumentation binary process image size
>>>>>>>>        2. maximal reduction of object and unstripped binary size
>>>>>>>>        3. maximal reduction of raw profile data size
>>>>>>>>        4. maximal reduction of indexed profile data size
>>>>>>>>
>>>>>>>> Cons:
>>>>>>>>        1. the -fcoverage-mapping use case is not handled -- the size
>>>>>>>>           problem still exists
>>>>>>>>        2. a profile dump with llvm-profdata no longer has function names
>>>>>>>>           associated -- the user needs a postprocessing tool to get that
>>>>>>>>           functionality
>>>>>>>>        3. function filtering with a partial function name is not supported
>>>>>>>>        4. requires an incompatible format change
>>>>>>>>
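>>>>>>>> To make the key scheme concrete, here is a small self-contained sketch
>>>>>>>> of deriving a 64-bit key from a mangled name. It uses OpenSSL's MD5
>>>>>>>> purely for illustration (LLVM has its own MD5 implementation in
>>>>>>>> llvm/Support/MD5.h), and taking the first 8 digest bytes in this byte
>>>>>>>> order is an assumption made for the example:
>>>>>>>>
>>>>>>>>   // Sketch: 64-bit lookup key = low 64 bits of MD5(mangled name).
>>>>>>>>   #include <openssl/md5.h>
>>>>>>>>   #include <cstdint>
>>>>>>>>   #include <cstring>
>>>>>>>>   #include <string>
>>>>>>>>
>>>>>>>>   uint64_t functionNameKey(const std::string &MangledName) {
>>>>>>>>     unsigned char Digest[MD5_DIGEST_LENGTH]; // 16-byte digest
>>>>>>>>     MD5(reinterpret_cast<const unsigned char *>(MangledName.data()),
>>>>>>>>         MangledName.size(), Digest);
>>>>>>>>     uint64_t Key = 0;
>>>>>>>>     std::memcpy(&Key, Digest, sizeof(Key)); // keep only 8 of 16 bytes
>>>>>>>>     return Key; // combined with the IR hash for the actual lookup
>>>>>>>>   }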
>>>>>>>>
>>>>>>>> Solution-2: (http://reviews.llvm.org/D12715)
>>>>>>>>
>>>>>>>> In this solution, the MD5 hash string is used to replace the raw name
>>>>>>>> string.
>>>>>>>> Pros:
>>>>>>>>       1. Very simple to implement
>>>>>>>>       2. Good reduction of all sizes for typical large C++ applications
>>>>>>>>       3. No profile data format change is required
>>>>>>>>
>>>>>>>> Cons:
>>>>>>>>      1. Still requires a 16-byte overhead per function -- can actually
>>>>>>>>         hurt C programs
>>>>>>>>      2. The -fcoverage-mapping use case is still not handled
>>>>>>>>      3. The problem with llvm-profdata still exists (no symbolic info,
>>>>>>>>         no filtering by partial name)
>>>>>>>>
>>>>>>>>
>>>>>>>> Solution-3:
>>>>>>>>
>>>>>>>> This is the new solution I am proposing. It is basically an enhancement
>>>>>>>> of Solution-1 with most of its weaknesses resolved. The differences from
>>>>>>>> Solution-1 are:
>>>>>>>>    1. Function name symbols are emitted into the symbol table as weak
>>>>>>>>       externs. They don't occupy any space at runtime and can be easily
>>>>>>>>       stripped.
>>>>>>>>    2. -fcoverage-mapping does not need special handling -- it
>>>>>>>>       automatically benefits from the same size saving.
>>>>>>>>    3. llvm-cov is changed to read symbol info from the symtab instead
>>>>>>>>       of reading it from the name section data.
>>>>>>>>    4. llvm-profdata is enhanced to take a binary as input and dump the
>>>>>>>>       profile with names attached. Function filtering is fully supported
>>>>>>>>       (an option can also be introduced to force dumping names into the
>>>>>>>>       binary and profile data, so that the llvm-profdata use case is not
>>>>>>>>       changed at all).
>>>>>>>>
>>>>>>>> Pros:
>>>>>>>>    1. All the pros from Solution-1
>>>>>>>>    2. Size savings for the coverage-mapping case
>>>>>>>> Cons:
>>>>>>>>    A format change is required for profile data and coverage mapping.
>>>>>>>>
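>>>>>>>> As a rough illustration of item 4 above -- attaching names back to
>>>>>>>> hashed records when the binary is available -- assume the tool has
>>>>>>>> already pulled the candidate symbol names out of the binary (e.g. via a
>>>>>>>> symtab reader, not shown); functionNameKey() is the illustrative
>>>>>>>> MD5-based helper sketched earlier:
>>>>>>>>
>>>>>>>>   #include <cstdint>
>>>>>>>>   #include <map>
>>>>>>>>   #include <string>
>>>>>>>>   #include <vector>
>>>>>>>>
>>>>>>>>   uint64_t functionNameKey(const std::string &MangledName); // see above
>>>>>>>>
>>>>>>>>   // Build a reverse map from 64-bit name keys to the symbol names, so a
>>>>>>>>   // profile record identified only by its key can be labeled on output.
>>>>>>>>   std::map<uint64_t, std::string>
>>>>>>>>   buildNameMap(const std::vector<std::string> &SymbolNames) {
>>>>>>>>     std::map<uint64_t, std::string> KeyToName;
>>>>>>>>     for (const std::string &Name : SymbolNames)
>>>>>>>>       KeyToName[functionNameKey(Name)] = Name;
>>>>>>>>     return KeyToName;
>>>>>>>>   }
>>>>>>>>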
>>>>>>>> The initial patch is here: http://reviews.llvm.org/D13251
>>>>>>>>
>>>>>>>> With this patch, the size of a release clang binary with coverage
>>>>>>>> mapping is reduced from 986M to 569M.
>>>>>>>>
>>>>>>>> If there are no major concerns, I will carve out the patch into smaller
>>>>>>>> ones for review.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 8, 2015 at 3:47 PM, Xinliang David Li <davidxl at google.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> yes -- it is a fixed-length (8-byte) blob which may include a null
>>>>>>>>>>>>> byte in the middle.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> For reference, MD5 sum is 16 bytes (128-bit):
>>>>>>>>>>>> https://en.wikipedia.org/wiki/MD5
>>>>>>>>>>>
>>>>>>>>>>> yes, LLVM's MD5 hash only takes the lower 64 bits.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Or to say it another way, suppose that Itanium mangling required,
>>>>>>>>>>>>>> as a final step, replacing the string with its md5 sum in hex.
>>>>>>>>>>>>>> Therefore all symbol names are "small". My understanding is that
>>>>>>>>>>>>>> this is effectively all your patch is doing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The key type before the change is StringRef, while after the change
>>>>>>>>>>>>> the key type is uint64_t. Are you suggesting treating the uint64_t
>>>>>>>>>>>>> md5 sum key as a string of 8 bytes, or storing the md5 hash in text
>>>>>>>>>>>>> form, which will double the size?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> How much does this change the benefit? If most of the benefit is
>>>>>>>>>>>> avoiding extraordinarily long mangled names then it may be
>>>>>>>>>>>> sufficient.
>>>>>>>>>>>>
>>>>>>>>>>>> With IR-level instrumentation like Rong is pursuing the size may be
>>>>>>>>>>>> reduced sufficiently that we do not need the optimization proposed
>>>>>>>>>>>> in this thread. For example, Rong found >2x size reduction on
>>>>>>>>>>>> Google's C++ benchmarks, which I assume are representative of the
>>>>>>>>>>>> extremely large Google binaries that are causing the problems
>>>>>>>>>>>> addressed by your proposal in this thread. The measurements you
>>>>>>>>>>>> mention for Clang in this thread provide similar size reductions, so
>>>>>>>>>>>> Rong's approach may be sufficient (especially because functions with
>>>>>>>>>>>> extremely large mangled names tend to be small inline functions in
>>>>>>>>>>>> header-only template libraries).
>>>>>>>>>>>
>>>>>>>>>>> Late instrumentation helps many cases. In some cases (as shown in
>>>>>>>>>>> SPEC), the reduction in size is not as large. Reducing PGO overhead
>>>>>>>>>>> will lower the bar for its adoption.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Of the points you mention in "Large size of overhead can limit the
>>>>>>>>>>>> usability of PGO greatly", many of the issues are hard limits that
>>>>>>>>>>>> prevent the use of PGO. Do you have a lower bound on how much the
>>>>>>>>>>>> size of the PGO data must be reduced in order to overcome the hard
>>>>>>>>>>>> limits?
>>>>>>>>>>>
>>>>>>>>>>> This is a static view: think about the situation where application
>>>>>>>>>>> size is ever increasing; also think about the situation where we want
>>>>>>>>>>> to collect more types of profile data, and the situation where users
>>>>>>>>>>> want to run PGO binaries on small devices with tiny memory/storage.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If we want to reduce memory overhead at runtime and reduce the size of
>>>>>>>>>> the raw profile data extracted from the target, there are clear
>>>>>>>>>> solutions. Consider that debug info does not need to be loaded into
>>>>>>>>>> the memory image of the target; why should information identifying
>>>>>>>>>> each counter need to be? A file containing raw profile counters is a
>>>>>>>>>> subset of a core dump; in most environments, a core dump does not need
>>>>>>>>>> to have debug info or symbol names in it, but can still be read in
>>>>>>>>>> full detail in conjunction with the original binary.
>>>>>>>>>
>>>>>>>>> Yes -- there are many alternatives:
>>>>>>>>> 1) emit the name-key mapping as side data at compile time, or
>>>>>>>>> 2) emit it into non-loadable sections of the object file.
>>>>>>>>>
>>>>>>>>> Compared with the above, LLVM's existing design does have its own
>>>>>>>>> advantage -- making it easier for tools to access 'debug' info for
>>>>>>>>> the counters.
>>>>>>>>>
>>>>>>>>> LLVM's coverage testing, on the other hand, takes a hybrid approach: it
>>>>>>>>> emits the coverage map as rodata, but does not pass it to the profile
>>>>>>>>> dumper. I think it is better to emit the covmap as side data not
>>>>>>>>> attached to the target binary.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thus, as we require that the binary be passed to llvm-profdata, there
>>>>>>>>>> is no fundamental reason that the memory image of the program, or the
>>>>>>>>>> raw data extracted from the program, must have any size overhead
>>>>>>>>>> besides the raw values of the counters themselves and any text size
>>>>>>>>>> increase for incrementing them. If we are willing to impose this
>>>>>>>>>> requirement on users, then as far as reducing memory overhead at
>>>>>>>>>> runtime and reducing the size of the raw profile data extracted from
>>>>>>>>>> the target, using hashed function names is clearly the wrong
>>>>>>>>>> direction.
>>>>>>>>>>
>>>>>>>>>> *Without* imposing the requirement of passing the binary to
>>>>>>>>>> llvm-profdata, I do like the ability to use hashed function names like
>>>>>>>>>> you are proposing. It is a simple solution for reducing size overhead
>>>>>>>>>> of function name strings with little complexity, as it is just
>>>>>>>>>> swapping one string for another.
>>>>>>>>>
>>>>>>>>> Agree. The good news is that the overhead of hashed function names is
>>>>>>>>> small enough to make this approach attractive.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Obviously LLVM must be able to support the extremely large binaries
>>>>>>>>>>>> in your configuration (otherwise what use is LLVM as a compiler ;)
>>>>>>>>>>>> My questions are primarily aimed at establishing which tradeoffs are
>>>>>>>>>>>> acceptable for supporting this (both for LLVM and for you guys).
>>>>>>>>>>>
>>>>>>>>>>> As I said, with the modified proposal (after getting your feedback),
>>>>>>>>>>> no PGO users in LLVM land are going to lose any functionality. The
>>>>>>>>>>> end result will be a net win for general users of LLVM (even though
>>>>>>>>>>> your customers don't care about it), not just 'us' as you have
>>>>>>>>>>> mentioned many times.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Btw, for us, the issue of PGO data size is not completely immaterial
>>>>>>>>>>>> but is very different from your use case. For us, the primary issue
>>>>>>>>>>>> is the additional memory use at run time, since PS4 games usually
>>>>>>>>>>>> use "all" available memory. We had a problem with UBSan where the
>>>>>>>>>>>> large amount of memory required for storing the UBSan diagnostic
>>>>>>>>>>>> data at runtime required the game programmers to manually change
>>>>>>>>>>>> their memory map to make room. +Filipe, do you remember how much
>>>>>>>>>>>> memory UBSan was using that caused a problem?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> My proposal does help reduce rodata size significantly.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes, that is why I think that this is a useful thing to do. I just
>>>>>>>>>> want
>>>>>>>>>> to
>>>>>>>>>> be careful about existing use cases and the relevant workflow issues.
>>>>>>>>>>
>>>>>>>>>> -- Sean Silva
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the raw format, the md5 sum key can be an embedded field in the
>>>>>>>>>>>>> prf_data variable instead of a different variable referenced by
>>>>>>>>>>>>> prf_data.
>>>>>>>>>>>>>
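>>>>>>>>>>>>> Conceptually (the field names and layout below are illustrative,
>>>>>>>>>>>>> not the actual compiler-rt definition), the difference is between a
>>>>>>>>>>>>> per-function record that points at a name string and one that
>>>>>>>>>>>>> embeds the 8-byte key directly:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   #include <cstdint>
>>>>>>>>>>>>>
>>>>>>>>>>>>>   // Before: the record references a name string that must exist in
>>>>>>>>>>>>>   // the binary and be copied into the raw profile.
>>>>>>>>>>>>>   struct ProfDataWithNamePtr {
>>>>>>>>>>>>>     const char *NamePtr;   // points into the name section
>>>>>>>>>>>>>     uint32_t NameSize;
>>>>>>>>>>>>>     uint64_t FuncHash;     // IR/structure hash
>>>>>>>>>>>>>     uint64_t *CounterPtr;
>>>>>>>>>>>>>     uint32_t NumCounters;
>>>>>>>>>>>>>   };
>>>>>>>>>>>>>
>>>>>>>>>>>>>   // After: the 64-bit md5 key is embedded in the record itself, so
>>>>>>>>>>>>>   // no separate name variable needs to be referenced at all.
>>>>>>>>>>>>>   struct ProfDataWithNameKey {
>>>>>>>>>>>>>     uint64_t NameKey;      // low 64 bits of MD5(mangled name)
>>>>>>>>>>>>>     uint64_t FuncHash;
>>>>>>>>>>>>>     uint64_t *CounterPtr;
>>>>>>>>>>>>>     uint32_t NumCounters;
>>>>>>>>>>>>>   };
>>>>>>>>>>>>>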
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If this is not the case, you should show your current patch so
>>>>>>>>>>>>>> that we can discuss things concretely.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is not. See above about the difference.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It will be
>>>>>>>>>>>>>>>>> very messy to support multiple formats in instr-codegen and
>>>>>>>>>>>>>>>>> instr-runtime. For compatibility concerns, the reader is taught
>>>>>>>>>>>>>>>>> to support the previous format, but the changes there are
>>>>>>>>>>>>>>>>> isolated (and also expected to be removed in the future).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My primary concern is that if the function names are not kept
>>>>>>>>>>>>>>>>>> at all stages, then it becomes difficult to analyze the
>>>>>>>>>>>>>>>>>> profile data in a standalone way. Many times, I have used
>>>>>>>>>>>>>>>>>> `llvm-profdata show -all-functions foo.profdata` on the
>>>>>>>>>>>>>>>>>> resulting profile data and then imported that data into
>>>>>>>>>>>>>>>>>> Mathematica for analysis.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is certainly a very valid use case.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My understanding of your proposal is that `llvm-profdata show
>>>>>>>>>>>>>>>>>> -all-functions foo.profdata` will not show the actual function
>>>>>>>>>>>>>>>>>> names but instead MD5 hashes,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To support your use case, there are two solutions:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) the user can add the -fcoverage-mapping option to the build
>>>>>>>>>>>>>>>>> 2) introduce a new option -fprofile-instr-names that forces the
>>>>>>>>>>>>>>>>> emission of the name sections in the .o file. This is similar
>>>>>>>>>>>>>>>>> to 1), but no covmap section is needed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The llvm-profdata tool will be taught to read the name section
>>>>>>>>>>>>>>>>> and attach function names to the profile records.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Needing to pass the executable to llvm-profdata would cause
>>>>>>>>>>>>>>>> deployment issues for my customers in practice.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why? The deployment needs to pass the profile data anyway, right?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, but not the executable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The PGO training run is likely being run by a gameplay tester
>>>>>>>>>>>>>> (non-programmer). In general the binary will not be lying around
>>>>>>>>>>>>>> as a loose file anywhere; it will be part of a full package of the
>>>>>>>>>>>>>> binary+assets (think like what will end up on a bluray disc). A
>>>>>>>>>>>>>> game's binary is *completely useless* without the assets, so
>>>>>>>>>>>>>> except locally on a programmer's machine while they iterate/debug,
>>>>>>>>>>>>>> there is no reason for a binary to ever exist as a standalone
>>>>>>>>>>>>>> file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm not saying that needing the binary is insurmountable in any
>>>>>>>>>>>>>> particular scenario, just that it will cause a strict increase in
>>>>>>>>>>>>>> the number of issues in deploying PGO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your concern is acknowledged.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> These are much bigger "compatibility concerns" for me than for
>>>>>>>>>>>>>> newer toolchains to accept the old format. For a change in format
>>>>>>>>>>>>>> I can easily tell my users to replace an exe with a newer one, and
>>>>>>>>>>>>>> that is all they need to do and it takes 10 seconds, guaranteed. A
>>>>>>>>>>>>>> workflow change is potentially a massive disruption and guaranteed
>>>>>>>>>>>>>> to take more than 10 seconds to fix (perhaps hours or days).
>>>>>>>>>>>>>
>>>>>>>>>>>>> ok.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is no different from the llvm-cov usage model.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In practice, getting the performance of PGO is a higher
>>>>>>>>>>>>>> priority
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> my
>>>>>>>>>>>>>> users, so we should not assume that llvm-cov is being used.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Glad to hear that :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Note that with 1) or 2), the user can still benefit from
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> reduced
>>>>>>>>>>>>>>>>> profile size.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let me reiterate that the size of the profile is not a problem I
>>>>>>>>>>>>>>>> have observed in practice (nor have I heard of this being a
>>>>>>>>>>>>>>>> problem in practice until this thread). Therefore I'm skeptical
>>>>>>>>>>>>>>>> of any changes to our default behavior or any new requirements
>>>>>>>>>>>>>>>> that are not opt-in.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> which will make it more difficult for me to do this kind of
>>>>>>>>>>>>>>>>>> analysis (would require using nm on the original binary,
>>>>>>>>>>>>>>>>>> hashing everything, etc.).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> btw, feel free to attach the patch even if it is in a rough
>>>>>>>>>>>>>>>>>> state. It can still help to clarify the proposal and be a good
>>>>>>>>>>>>>>>>>> talking point. Fine-grained patch review covering the rough
>>>>>>>>>>>>>>>>>> parts will happen on llvm-commits; the rough parts will not
>>>>>>>>>>>>>>>>>> distract from the discussion here on llvm-dev.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>