[llvm-dev] RFC: Reducing Instr PGO size overhead

Fri Sep 4 23:17:16 PDT 2015

On Fri, Sep 4, 2015 at 11:03 PM, Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Fri, Sep 4, 2015 at 10:11 PM, Xinliang David Li <davidxl at google.com>
> wrote:
>
>> On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> >
>> >
>> > On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <davidxl at google.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at gmail.com>
>> wrote:
>> >> >
>> >> >
>> >> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li <
>> davidxl at google.com>
>> >> > wrote:
>> >> >>
>> >> >> >
>> >> >> > I think it is reasonable to simply replace the key we currently
>> use
>> >> >> > with
>> >> >> > MD5(key) for getting a size reduction.  In practice for my use
>> cases,
>> >> >> > I
>> >> >> > have
>> >> >> > not observed any of the issues you mentioned under "Large size of
>> >> >> > overhead
>> >> >> > can limit the usability of PGO greatly", but I can understand that
>> >> >> > some
>> >> >> > of
>> >> >> > these issues could become problems in Google's use case. I would
>> >> >> > personally
>> >> >> > prefer to keep the existing behavior as the default (see below),
>> and
>> >> >> > have
>> >> >> > MD5(key) as an option.
>> >> >>
>> >> >> The problem is that this requires profile format changes.
>> >> >
>> >> >
>> >> > Why? AFAIK the "function name" is just an arbitrary string. Using s
>> or
>> >> > MD5(s) shouldn't matter. Of course, the user will need to pass
>> >> > consistent
>> >> > flags to clang.
>> >>
>> >> The raw format for 64bit target can be made 'unchanged', but not for
>> >> the 32bit raw format -- the nameptr field is only 32bit.
>> >>
>> >> The indexed format can not be made the same --  The ondisk profile
>> >> record layout totally changes. The key field changes from a blob of
>> >> chars into an 64bit integer.
>> >
>> >
>> > An MD5 sum cannot be represented as a blob of chars?
>>
>> yes -- it is fixed length (8byte) blob which may include null byte in
>> the middle.
>>
>
> For reference, MD5 sum is 16 bytes (128-bit):
> https://en.wikipedia.org/wiki/MD5
>
>
>>
>> >
>> > Or to say it another way, suppose that Itanium mangling required as a
>> final
>> > step to replace the string with its md5 sum in hex. Therefore all symbol
>> > names are "small". My understanding is that this is effectively all your
>> > patch is doing.
>>
>> The key type before the change is StringRef, while the after the key
>> type is uint64_t. Are you suggesting treating uint64_t md5 sum key as
>> a string of 8 bytes or storing md5 has in text form which will double
>> the size?
>>
>
> How much does this change the benefit? If most of the benefit is avoiding
> extraordinarily long mangled names then it may be sufficient.
>
> With IR-level instrumentation like Rong is pursuing the size may be
> reduced sufficiently that we do not need the optimization proposed in this
> thread. For example, Rong found >2x size reduction
>

To clarify, I mean the size of the PGO data, not the binary size.
Column (2) in section 3.5 of Rong's RFC to be precise.

-- Sean Silva

> on Google's C++ benchmarks, which I assume are representative of the
> extremely large Google binaries that are causing the problems addressed by
> your proposal in this thread. The measurements you mention for Clang in
> this thread provide similar size reductions, so Rong's approach may be
> sufficient (especially because functions with extremely large mangled names
> tend to be small inline functions in header-only template libraries).
>
> Of the points you mention in "Large size of overhead can limit the
> usability of PGO greatly", many of the issues are hard limits that prevent
> the use of PGO. Do you have a lower bound on how much the size of the PGO
> data must be reduced in order to overcome the hard limits?
>
> Obviously LLVM must be able to support the extremely large binaries in
> your configuration (otherwise what use is LLVM as a compiler ;) My
> questions are primarily aimed at establishing which tradeoffs are
> acceptable for supporting this (both for LLVM and for you guys).
>
> Btw, for us, the issue of PGO data size is not completely immaterial but
> is very different from your use case. For us, the primary issue is the
> additional memory use at run time, since PS4 games usually use "all"
> available memory. We had a problem with UBSan where the large amount of
> memory required for storing the UBSan diagnostic data at runtime required
> the game programmers to manually change their memory map to make room.
> +Filipe, do you remember how much memory UBSan was using that caused a
> problem?
>
> -- Sean Silva
>
>
>>
>> In the raw format, md5 sum key can be an embedded field in the
>> prf_data variable instead of as different var referenced by prf_data.
>>
>> >
>> > If this is not the case, you should show your current patch so that we
>> can
>> > discuss things concretely.
>>
>> It is not. See above about the difference.
>>
>> >
>> >>
>> >> >
>> >> >
>> >> >>
>> >> >> It will be
>> >> >> very messy to support multiple formats in instr-codegen and
>> >> >> instr-runtime.  For compatibility concerns, the reader is taught to
>> >> >> support previous format, but the changes there are isolated (also
>> >> >> expected to be removed in the future).
>> >> >>
>> >> >> >
>> >> >> > My primary concern is that if the function name are not kept at
>> all
>> >> >> > stages,
>> >> >> > then it becomes difficult to analyze the profile data in a
>> standalone
>> >> >> > way.
>> >> >> > Many times, I have used `llvm-profdata show -all-functions
>> >> >> > foo.profdata`
>> >> >> > on
>> >> >> > the resulting profile data and then imported that data into
>> >> >> > Mathematica
>> >> >> > for
>> >> >> > analysis.
>> >> >>
>> >> >> This is certainly a very valid use case.
>> >> >>
>> >> >> >My understanding of your proposal is that `llvm-profdata show
>> >> >> > -all-functions foo.profdata` will not show the actual function
>> names
>> >> >> > but
>> >> >> > instead MD5 hashes,
>> >> >>
>> >> >> Yes.
>> >> >>
>> >> >> To support your use case, there are two solutions:
>> >> >>
>> >> >> 1) user can add -fcoverage-mapping option in the build
>> >> >> 2) introduce a new option -fprofile-instr-names that force the
>> >> >> emission of the name sections in the .o file. This is similar to 1),
>> >> >> but no covmap section is needed.
>> >> >>
>> >> >> llvm-profdata tool  will be taught to read the name section and
>> attach
>> >> >> function names to the profile records.
>> >> >
>> >> >
>> >> > Needing to pass the executable to llvm-profdata would cause
>> deployment
>> >> > issues for my customers in practice.
>> >>
>> >> Why? The deployment needs to pass the profile data anyway right?
>> >
>> >
>> > Yes, but not the executable.
>> >
>> > The PGO training run is likely being run by a gameplay tester
>> > (non-programmer). In general the binary will not be lying around as a
>> loose
>> > file anywhere, it will be part of a full package of the binary+assets
>> (think
>> > like what will end up on a bluray disc). A game's binary *completely
>> > useless* without the assets, so except locally on a programmer's machine
>> > while they iterate/debug, there is no reason for a binary to ever exist
>> as a
>> > standalone file.
>> >
>> > I'm not saying that needing the binary is insurmountable in any
>> particular
>> > scenario. Just that it will cause a strict increase in the number of
>> issues
>> > to deploying PGO.
>>
>>  Your concern is acknowledged.
>>
>> >
>> > These are much bigger "compatibility concerns" for me than for newer
>> > toolchains to accept the old format. For a change in format I can easily
>> > tell my users to replace an exe with a newer one and that is all they
>> need
>> > to do and it takes 10 seconds, guaranteed. A workflow change is
>> potentially
>> > a massive disruption and guaranteed to take more than 10 seconds to fix
>> > (perhaps hours or days).
>>
>> ok.
>>
>> >
>> >
>> >>
>> >>   This
>> >> is no different from llvm-cov usage model.
>> >
>> >
>> > In practice, getting the performance of PGO is a higher priority for my
>> > users, so we should not assume that llvm-cov is being used.
>>
>> Glad to hear that :)
>>
>> thanks,
>>
>> David
>>
>> >
>> > -- Sean Silva
>> >
>> >>
>> >>
>> >> David
>> >>
>> >>
>> >>
>> >> >
>> >> >>
>> >> >>
>> >> >> Note that with 1) or 2), the user can still benefit from the reduced
>> >> >> profile size.
>> >> >
>> >> >
>> >> > Let me reiterate that the size of the profile is not a problem I have
>> >> > observed in practice (nor have I heard of this being a problem in
>> >> > practice
>> >> > until this thread). Therefore I'm skeptical of any changes to our
>> >> > default
>> >> > behavior or any new requirements that are not opt-in.
>> >> >
>> >> > -- Sean Silva
>> >> >
>> >> >>
>> >> >>
>> >> >> thanks,
>> >> >>
>> >> >> David
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> >which will make it more difficult for me to do this kind
>> >> >> > of analysis (would require using nm on the original binary,
>> hashing
>> >> >> > everything, etc.).
>> >> >> >
>> >> >> > btw, feel free to attach the patch even if it in a rough state. It
>> >> >> > can
>> >> >> > still
>> >> >> > help to clarify the proposal and be a good talking point.
>> >> >> > Fine-grained
>> >> >> > patch
>> >> >> > review for caring about the rough parts will happen on
>> llvm-commits;
>> >> >> > the
>> >> >> > rough parts will not distract the discussion here on llvm-dev.
>> >> >> >
>> >> >> > -- Sean Silva
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >> thanks,
>> >> >> >>
>> >> >> >> David
>> >> >> >> _______________________________________________
>> >> >> >> LLVM Developers mailing list
>> >> >> >> llvm-dev at lists.llvm.org
>> >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150904/4a1b93ce/attachment.html>