[llvm-dev] RFC: Reducing Instr PGO size overhead

Tue Sep 8 15:05:18 PDT 2015

On Tue, Sep 8, 2015 at 3:04 PM, Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Tue, Sep 8, 2015 at 3:03 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Fri, Sep 4, 2015 at 11:44 PM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>> On Fri, Sep 4, 2015 at 11:03 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>> >
>>> >
>>> > On Fri, Sep 4, 2015 at 10:11 PM, Xinliang David Li <davidxl at google.com
>>> >
>>> > wrote:
>>> >>
>>> >> On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>> >> >
>>> >> >
>>> >> > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <
>>> davidxl at google.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at gmail.com>
>>> >> >> wrote:
>>> >> >> >
>>> >> >> >
>>> >> >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li
>>> >> >> > <davidxl at google.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> >
>>> >> >> >> > I think it is reasonable to simply replace the key we
>>> currently
>>> >> >> >> > use
>>> >> >> >> > with
>>> >> >> >> > MD5(key) for getting a size reduction.  In practice for my use
>>> >> >> >> > cases,
>>> >> >> >> > I
>>> >> >> >> > have
>>> >> >> >> > not observed any of the issues you mentioned under "Large
>>> size of
>>> >> >> >> > overhead
>>> >> >> >> > can limit the usability of PGO greatly", but I can understand
>>> that
>>> >> >> >> > some
>>> >> >> >> > of
>>> >> >> >> > these issues could become problems in Google's use case. I
>>> would
>>> >> >> >> > personally
>>> >> >> >> > prefer to keep the existing behavior as the default (see
>>> below),
>>> >> >> >> > and
>>> >> >> >> > have
>>> >> >> >> > MD5(key) as an option.
>>> >> >> >>
>>> >> >> >> The problem is that this requires profile format changes.
>>> >> >> >
>>> >> >> >
>>> >> >> > Why? AFAIK the "function name" is just an arbitrary string.
>>> Using s
>>> >> >> > or
>>> >> >> > MD5(s) shouldn't matter. Of course, the user will need to pass
>>> >> >> > consistent
>>> >> >> > flags to clang.
>>> >> >>
>>> >> >> The raw format for 64bit target can be made 'unchanged', but not
>>> for
>>> >> >> the 32bit raw format -- the nameptr field is only 32bit.
>>> >> >>
>>> >> >> The indexed format can not be made the same --  The ondisk profile
>>> >> >> record layout totally changes. The key field changes from a blob of
>>> >> >> chars into an 64bit integer.
>>> >> >
>>> >> >
>>> >> > An MD5 sum cannot be represented as a blob of chars?
>>> >>
>>> >> yes -- it is fixed length (8byte) blob which may include null byte in
>>> >> the middle.
>>> >
>>> >
>>> > For reference, MD5 sum is 16 bytes (128-bit):
>>> > https://en.wikipedia.org/wiki/MD5
>>>
>>> yes, LLVM's MD5 hash only takes the lower 64bit.
>>>
>>>
>>> >
>>> >>
>>> >>
>>> >> >
>>> >> > Or to say it another way, suppose that Itanium mangling required as
>>> a
>>> >> > final
>>> >> > step to replace the string with its md5 sum in hex. Therefore all
>>> symbol
>>> >> > names are "small". My understanding is that this is effectively all
>>> your
>>> >> > patch is doing.
>>> >>
>>> >> The key type before the change is StringRef, while the after the key
>>> >> type is uint64_t. Are you suggesting treating uint64_t md5 sum key as
>>> >> a string of 8 bytes or storing md5 has in text form which will double
>>> >> the size?
>>> >
>>> >
>>> > How much does this change the benefit? If most of the benefit is
>>> avoiding
>>> > extraordinarily long mangled names then it may be sufficient.
>>> >
>>> > With IR-level instrumentation like Rong is pursuing the size may be
>>> reduced
>>> > sufficiently that we do not need the optimization proposed in this
>>> thread.
>>> > For example, Rong found >2x size reduction on Google's C++ benchmarks,
>>> which
>>> > I assume are representative of the extremely large Google binaries
>>> that are
>>> > causing the problems addressed by your proposal in this thread. The
>>> > measurements you mention for Clang in this thread provide similar size
>>> > reductions, so Rong's approach may be sufficient (especially because
>>> > functions with extremely large mangled names tend to be small inline
>>> > functions in header-only template libraries).
>>>
>>> Late instrumentation helps many cases. In some cases (as shown in
>>> SPEC), the reduction in size is not as large. Reducing PGO overhead
>>> will lower the bar for its adoption.
>>>
>>> >
>>> > Of the points you mention in "Large size of overhead can limit the
>>> usability
>>> > of PGO greatly", many of the issues are hard limits that prevent the
>>> use of
>>> > PGO. Do you have a lower bound on how much the size of the PGO data
>>> must be
>>> > reduced in order to overcome the hard limits?
>>>
>>> This is a static view:  Think about the situation where application
>>> size is ever increasing; also think about situation where we want to
>>
>> collect more types of profile data. Think about situation where user
>>> want to run pgo binaries on small devices with tiny memory/storage ..
>>
>>
>> If we want to reduce memory overhead at runtime and reduce the size of
>> the raw profile data extracted from the target, there are clear solutions.
>> Consider that debug info does not need to be loaded into the memory image
>> of the target; why should information identifying each counter need to be?
>> A file containing raw profile counters is a subset of a core dump; in
>> most environments, a core dump does not need to have debug info or symbol
>> names in it, but can be still be read in full detail in conjunction with
>> the original binary.
>>
>> Thus, as we require
>>
>
> this should read "if we ant to require".
>

 "if we want to require"

-- Sean Silva

> -- Sean Silva
>
>
>> that the binary be passed to llvm-profdata, there is no fundamental
>> reason that the memory image of the program, or the raw data extracted from
>> the program, must have any size overhead besides the raw values of the
>> counters themselves and any text size increase for incrementing them. If we
>> are willing to impose this requirement on users, then as far as reducing
>> memory overhead at runtime and reducing the size of the raw profile data
>> extracted from the target, using hashed function names is clearly the wrong
>> direction.
>>
>> *Without* imposing the requirement of passing the binary to
>> llvm-profdata, I do like the ability to use hashed function names like you
>> are proposing. It is a simple solution for reducing size overhead of
>> function name strings with little complexity, as it is just swapping one
>> string for another.
>>
>>
>>>
>>> >
>>> > Obviously LLVM must be able to support the extremely large binaries in
>>> your
>>> > configuration (otherwise what use is LLVM as a compiler ;) My
>>> questions are
>>> > primarily aimed at establishing which tradeoffs are acceptable for
>>> > supporting this (both for LLVM and for you guys).
>>>
>>> As I said, with the modified proposal (after getting your feedback),
>>> no PGO users in LLVM land is going to lose anything/functionality. The
>>> end result will be net win for general users of LLVM (even though your
>>> customers don't care about it), not just 'us' as you have mentioned
>>> many times.
>>>
>>> >
>>> > Btw, for us, the issue of PGO data size is not completely immaterial
>>> but is
>>> > very different from your use case. For us, the primary issue is the
>>> > additional memory use at run time, since PS4 games usually use "all"
>>> > available memory. We had a problem with UBSan where the large amount of
>>> > memory required for storing the UBSan diagnostic data at runtime
>>> required
>>> > the game programmers to manually change their memory map to make room.
>>> > +Filipe, do you remember how much memory UBSan was using that caused a
>>> > problem?
>>> >
>>>
>>> My proposal does help reducing rodata size significantly.
>>>
>>
>> Yes, that is why I think that this is a useful thing to do. I just want
>> to be careful about existing use cases and the relevant workflow issues.
>>
>> -- Sean Silva
>>
>>
>>>
>>> David
>>>
>>>
>>> > -- Sean Silva
>>> >
>>> >>
>>> >>
>>> >> In the raw format, md5 sum key can be an embedded field in the
>>> >> prf_data variable instead of as different var referenced by prf_data.
>>> >>
>>> >> >
>>> >> > If this is not the case, you should show your current patch so that
>>> we
>>> >> > can
>>> >> > discuss things concretely.
>>> >>
>>> >> It is not. See above about the difference.
>>> >>
>>> >> >
>>> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> >>
>>> >> >> >> It will be
>>> >> >> >> very messy to support multiple formats in instr-codegen and
>>> >> >> >> instr-runtime.  For compatibility concerns, the reader is
>>> taught to
>>> >> >> >> support previous format, but the changes there are isolated
>>> (also
>>> >> >> >> expected to be removed in the future).
>>> >> >> >>
>>> >> >> >> >
>>> >> >> >> > My primary concern is that if the function name are not kept
>>> at
>>> >> >> >> > all
>>> >> >> >> > stages,
>>> >> >> >> > then it becomes difficult to analyze the profile data in a
>>> >> >> >> > standalone
>>> >> >> >> > way.
>>> >> >> >> > Many times, I have used `llvm-profdata show -all-functions
>>> >> >> >> > foo.profdata`
>>> >> >> >> > on
>>> >> >> >> > the resulting profile data and then imported that data into
>>> >> >> >> > Mathematica
>>> >> >> >> > for
>>> >> >> >> > analysis.
>>> >> >> >>
>>> >> >> >> This is certainly a very valid use case.
>>> >> >> >>
>>> >> >> >> >My understanding of your proposal is that `llvm-profdata show
>>> >> >> >> > -all-functions foo.profdata` will not show the actual function
>>> >> >> >> > names
>>> >> >> >> > but
>>> >> >> >> > instead MD5 hashes,
>>> >> >> >>
>>> >> >> >> Yes.
>>> >> >> >>
>>> >> >> >> To support your use case, there are two solutions:
>>> >> >> >>
>>> >> >> >> 1) user can add -fcoverage-mapping option in the build
>>> >> >> >> 2) introduce a new option -fprofile-instr-names that force the
>>> >> >> >> emission of the name sections in the .o file. This is similar
>>> to 1),
>>> >> >> >> but no covmap section is needed.
>>> >> >> >>
>>> >> >> >> llvm-profdata tool  will be taught to read the name section and
>>> >> >> >> attach
>>> >> >> >> function names to the profile records.
>>> >> >> >
>>> >> >> >
>>> >> >> > Needing to pass the executable to llvm-profdata would cause
>>> >> >> > deployment
>>> >> >> > issues for my customers in practice.
>>> >> >>
>>> >> >> Why? The deployment needs to pass the profile data anyway right?
>>> >> >
>>> >> >
>>> >> > Yes, but not the executable.
>>> >> >
>>> >> > The PGO training run is likely being run by a gameplay tester
>>> >> > (non-programmer). In general the binary will not be lying around as
>>> a
>>> >> > loose
>>> >> > file anywhere, it will be part of a full package of the
>>> binary+assets
>>> >> > (think
>>> >> > like what will end up on a bluray disc). A game's binary *completely
>>> >> > useless* without the assets, so except locally on a programmer's
>>> machine
>>> >> > while they iterate/debug, there is no reason for a binary to ever
>>> exist
>>> >> > as a
>>> >> > standalone file.
>>> >> >
>>> >> > I'm not saying that needing the binary is insurmountable in any
>>> >> > particular
>>> >> > scenario. Just that it will cause a strict increase in the number of
>>> >> > issues
>>> >> > to deploying PGO.
>>> >>
>>> >>  Your concern is acknowledged.
>>> >>
>>> >> >
>>> >> > These are much bigger "compatibility concerns" for me than for newer
>>> >> > toolchains to accept the old format. For a change in format I can
>>> easily
>>> >> > tell my users to replace an exe with a newer one and that is all
>>> they
>>> >> > need
>>> >> > to do and it takes 10 seconds, guaranteed. A workflow change is
>>> >> > potentially
>>> >> > a massive disruption and guaranteed to take more than 10 seconds to
>>> fix
>>> >> > (perhaps hours or days).
>>> >>
>>> >> ok.
>>> >>
>>> >> >
>>> >> >
>>> >> >>
>>> >> >>   This
>>> >> >> is no different from llvm-cov usage model.
>>> >> >
>>> >> >
>>> >> > In practice, getting the performance of PGO is a higher priority
>>> for my
>>> >> > users, so we should not assume that llvm-cov is being used.
>>> >>
>>> >> Glad to hear that :)
>>> >>
>>> >> thanks,
>>> >>
>>> >> David
>>> >>
>>> >> >
>>> >> > -- Sean Silva
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> David
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> >
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> Note that with 1) or 2), the user can still benefit from the
>>> reduced
>>> >> >> >> profile size.
>>> >> >> >
>>> >> >> >
>>> >> >> > Let me reiterate that the size of the profile is not a problem I
>>> have
>>> >> >> > observed in practice (nor have I heard of this being a problem in
>>> >> >> > practice
>>> >> >> > until this thread). Therefore I'm skeptical of any changes to our
>>> >> >> > default
>>> >> >> > behavior or any new requirements that are not opt-in.
>>> >> >> >
>>> >> >> > -- Sean Silva
>>> >> >> >
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> thanks,
>>> >> >> >>
>>> >> >> >> David
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> >which will make it more difficult for me to do this kind
>>> >> >> >> > of analysis (would require using nm on the original binary,
>>> >> >> >> > hashing
>>> >> >> >> > everything, etc.).
>>> >> >> >> >
>>> >> >> >> > btw, feel free to attach the patch even if it in a rough
>>> state. It
>>> >> >> >> > can
>>> >> >> >> > still
>>> >> >> >> > help to clarify the proposal and be a good talking point.
>>> >> >> >> > Fine-grained
>>> >> >> >> > patch
>>> >> >> >> > review for caring about the rough parts will happen on
>>> >> >> >> > llvm-commits;
>>> >> >> >> > the
>>> >> >> >> > rough parts will not distract the discussion here on llvm-dev.
>>> >> >> >> >
>>> >> >> >> > -- Sean Silva
>>> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> thanks,
>>> >> >> >> >>
>>> >> >> >> >> David
>>> >> >> >> >> _______________________________________________
>>> >> >> >> >> LLVM Developers mailing list
>>> >> >> >> >> llvm-dev at lists.llvm.org
>>> >> >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150908/12a9658d/attachment.html>