[llvm-dev] RFC: Reducing Instr PGO size overhead

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 4 21:26:55 PDT 2015


On Fri, Sep 4, 2015 at 9:11 PM, Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Fri, Sep 4, 2015 at 5:42 PM, Xinliang David Li <davidxl at google.com>
> wrote:
>
>> On Fri, Sep 4, 2015 at 5:21 PM, Sean Silva <chisophugis at gmail.com> wrote:
>> >
>> >
>> > On Fri, Sep 4, 2015 at 3:57 PM, Xinliang David Li <davidxl at google.com>
>> > wrote:
>> >>
>> >> >
>> >> > I think it is reasonable to simply replace the key we currently use
>> with
>> >> > MD5(key) for getting a size reduction.  In practice for my use
>> cases, I
>> >> > have
>> >> > not observed any of the issues you mentioned under "Large size of
>> >> > overhead
>> >> > can limit the usability of PGO greatly", but I can understand that
>> some
>> >> > of
>> >> > these issues could become problems in Google's use case. I would
>> >> > personally
>> >> > prefer to keep the existing behavior as the default (see below), and
>> >> > have
>> >> > MD5(key) as an option.
>> >>
>> >> The problem is that this requires profile format changes.
>> >
>> >
>> > Why? AFAIK the "function name" is just an arbitrary string. Using s or
>> > MD5(s) shouldn't matter. Of course, the user will need to pass
>> consistent
>> > flags to clang.
>>
>> The raw format for 64bit target can be made 'unchanged', but not for
>> the 32bit raw format -- the nameptr field is only 32bit.
>>
>> The indexed format can not be made the same --  The ondisk profile
>> record layout totally changes. The key field changes from a blob of
>> chars into an 64bit integer.
>>
>
> An MD5 sum cannot be represented as a blob of chars?
>
> Or to say it another way, suppose that Itanium mangling required as a
> final step to replace the string with its md5 sum in hex. Therefore all
> symbol names are "small". My understanding is that this is effectively all
> your patch is doing.
>

(only when dealing with profile-related things; the actual symbol names in
the binary are kept the same as always; i.e. all PGO-related codepaths will
either 1) use the assembly name directly or 2) compute and MD5 sum of the
assembly name before using it; which of these is chosen depends on a
user-specified flag)

-- Sean Silva


>
> If this is not the case, you should show your current patch so that we can
> discuss things concretely.
>
>
>> >
>> >
>> >>
>> >> It will be
>> >> very messy to support multiple formats in instr-codegen and
>> >> instr-runtime.  For compatibility concerns, the reader is taught to
>> >> support previous format, but the changes there are isolated (also
>> >> expected to be removed in the future).
>> >>
>> >> >
>> >> > My primary concern is that if the function name are not kept at all
>> >> > stages,
>> >> > then it becomes difficult to analyze the profile data in a standalone
>> >> > way.
>> >> > Many times, I have used `llvm-profdata show -all-functions
>> foo.profdata`
>> >> > on
>> >> > the resulting profile data and then imported that data into
>> Mathematica
>> >> > for
>> >> > analysis.
>> >>
>> >> This is certainly a very valid use case.
>> >>
>> >> >My understanding of your proposal is that `llvm-profdata show
>> >> > -all-functions foo.profdata` will not show the actual function names
>> but
>> >> > instead MD5 hashes,
>> >>
>> >> Yes.
>> >>
>> >> To support your use case, there are two solutions:
>> >>
>> >> 1) user can add -fcoverage-mapping option in the build
>> >> 2) introduce a new option -fprofile-instr-names that force the
>> >> emission of the name sections in the .o file. This is similar to 1),
>> >> but no covmap section is needed.
>> >>
>> >> llvm-profdata tool  will be taught to read the name section and attach
>> >> function names to the profile records.
>> >
>> >
>> > Needing to pass the executable to llvm-profdata would cause deployment
>> > issues for my customers in practice.
>>
>> Why? The deployment needs to pass the profile data anyway right?
>
>
> Yes, but not the executable.
>
> The PGO training run is likely being run by a gameplay tester
> (non-programmer). In general the binary will not be lying around as a loose
> file anywhere, it will be part of a full package of the binary+assets
> (think like what will end up on a bluray disc). A game's binary *completely
> useless* without the assets, so except locally on a programmer's machine
> while they iterate/debug, there is no reason for a binary to ever exist as
> a standalone file.
>
> I'm not saying that needing the binary is insurmountable in any particular
> scenario. Just that it will cause a strict increase in the number of issues
> to deploying PGO.
>
> These are much bigger "compatibility concerns" for me than for newer
> toolchains to accept the old format. For a change in format I can easily
> tell my users to replace an exe with a newer one and that is all they need
> to do and it takes 10 seconds, guaranteed. A workflow change is potentially
> a massive disruption and guaranteed to take more than 10 seconds to fix
> (perhaps hours or days).
>
>
>
>>   This
>> is no different from llvm-cov usage model.
>>
>
> In practice, getting the performance of PGO is a higher priority for my
> users, so we should not assume that llvm-cov is being used.
>
> -- Sean Silva
>
>
>>
>> David
>>
>>
>>
>> >
>> >>
>> >>
>> >> Note that with 1) or 2), the user can still benefit from the reduced
>> >> profile size.
>> >
>> >
>> > Let me reiterate that the size of the profile is not a problem I have
>> > observed in practice (nor have I heard of this being a problem in
>> practice
>> > until this thread). Therefore I'm skeptical of any changes to our
>> default
>> > behavior or any new requirements that are not opt-in.
>> >
>> > -- Sean Silva
>> >
>> >>
>> >>
>> >> thanks,
>> >>
>> >> David
>> >>
>> >>
>> >>
>> >>
>> >> >which will make it more difficult for me to do this kind
>> >> > of analysis (would require using nm on the original binary, hashing
>> >> > everything, etc.).
>> >> >
>> >> > btw, feel free to attach the patch even if it in a rough state. It
>> can
>> >> > still
>> >> > help to clarify the proposal and be a good talking point.
>> Fine-grained
>> >> > patch
>> >> > review for caring about the rough parts will happen on llvm-commits;
>> the
>> >> > rough parts will not distract the discussion here on llvm-dev.
>> >> >
>> >> > -- Sean Silva
>> >> >
>> >> >>
>> >> >>
>> >> >> thanks,
>> >> >>
>> >> >> David
>> >> >> _______________________________________________
>> >> >> LLVM Developers mailing list
>> >> >> llvm-dev at lists.llvm.org
>> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >> >
>> >> >
>> >
>> >
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150904/0052678a/attachment.html>


More information about the llvm-dev mailing list