[llvm-dev] RFC: Reducing Instr PGO size overhead

Tue Sep 8 14:23:26 PDT 2015

On Mon, Sep 7, 2015 at 10:38 PM, Xinliang David Li <davidxl at google.com>
wrote:

> >> The key type before the change is StringRef, while the after the key
> >> type is uint64_t. Are you suggesting treating uint64_t md5 sum key as
> >> a string of 8 bytes or storing md5 has in text form which will double
> >> the size?
> >
> >
> > How much does this change the benefit? If most of the benefit is avoiding
> > extraordinarily long mangled names then it may be sufficient.
>
> I implemented the same functionality using the approach suggested by you.
>
> 1) There is no format change required for raw/indexed profile data
> 2) when a user option is specified, md5hash string is used as the
> function key instead of of the raw function name
>
> In addition to that,  the option specified is also passed to the
> runtime and emitted into profile data. This allows different variants
> of the profile data encoded in the same format  to be automatically
> recognized by the client (clang -fprofile-use, llvm-profdata etc)
> without the need to specify the option again.
>

Okay, this seems reasonable if it does not add a huge amount of complexity.

>
> There are some size increase, but increase seems to be in the range
> that is acceptable. Using clang as the test case. The indexed profile
> size is now 33.6 M (compared with 29.5M which is the optimal case).
> The raw profile size is 54.1M (compared with 40.1M from previous
> patch).
>

Thanks for looking into this. If this is acceptable for you, then I think
this approach is simpler.

-- Sean Silva

>
> David
>
> >
> > With IR-level instrumentation like Rong is pursuing the size may be
> reduced
> > sufficiently that we do not need the optimization proposed in this
> thread.
> > For example, Rong found >2x size reduction on Google's C++ benchmarks,
> which
> > I assume are representative of the extremely large Google binaries that
> are
> > causing the problems addressed by your proposal in this thread. The
> > measurements you mention for Clang in this thread provide similar size
> > reductions, so Rong's approach may be sufficient (especially because
> > functions with extremely large mangled names tend to be small inline
> > functions in header-only template libraries).
> >
> > Of the points you mention in "Large size of overhead can limit the
> usability
> > of PGO greatly", many of the issues are hard limits that prevent the use
> of
> > PGO. Do you have a lower bound on how much the size of the PGO data must
> be
> > reduced in order to overcome the hard limits?
> >
> > Obviously LLVM must be able to support the extremely large binaries in
> your
> > configuration (otherwise what use is LLVM as a compiler ;) My questions
> are
> > primarily aimed at establishing which tradeoffs are acceptable for
> > supporting this (both for LLVM and for you guys).
> >
> > Btw, for us, the issue of PGO data size is not completely immaterial but
> is
> > very different from your use case. For us, the primary issue is the
> > additional memory use at run time, since PS4 games usually use "all"
> > available memory. We had a problem with UBSan where the large amount of
> > memory required for storing the UBSan diagnostic data at runtime required
> > the game programmers to manually change their memory map to make room.
> > +Filipe, do you remember how much memory UBSan was using that caused a
> > problem?
> >
> > -- Sean Silva
> >
> >>
> >>
> >> In the raw format, md5 sum key can be an embedded field in the
> >> prf_data variable instead of as different var referenced by prf_data.
> >>
> >> >
> >> > If this is not the case, you should show your current patch so that we
> >> > can
> >> > discuss things concretely.
> >>
> >> It is not. See above about the difference.
> >>
> >> >
> >> >>
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> It will be
> >> >> >> very messy to support multiple formats in instr-codegen and
> >> >> >> instr-runtime.  For compatibility concerns, the reader is taught
> to
> >> >> >> support previous format, but the changes there are isolated (also
> >> >> >> expected to be removed in the future).
> >> >> >>
> >> >> >> >
> >> >> >> > My primary concern is that if the function name are not kept at
> >> >> >> > all
> >> >> >> > stages,
> >> >> >> > then it becomes difficult to analyze the profile data in a
> >> >> >> > standalone
> >> >> >> > way.
> >> >> >> > Many times, I have used `llvm-profdata show -all-functions
> >> >> >> > foo.profdata`
> >> >> >> > on
> >> >> >> > the resulting profile data and then imported that data into
> >> >> >> > Mathematica
> >> >> >> > for
> >> >> >> > analysis.
> >> >> >>
> >> >> >> This is certainly a very valid use case.
> >> >> >>
> >> >> >> >My understanding of your proposal is that `llvm-profdata show
> >> >> >> > -all-functions foo.profdata` will not show the actual function
> >> >> >> > names
> >> >> >> > but
> >> >> >> > instead MD5 hashes,
> >> >> >>
> >> >> >> Yes.
> >> >> >>
> >> >> >> To support your use case, there are two solutions:
> >> >> >>
> >> >> >> 1) user can add -fcoverage-mapping option in the build
> >> >> >> 2) introduce a new option -fprofile-instr-names that force the
> >> >> >> emission of the name sections in the .o file. This is similar to
> 1),
> >> >> >> but no covmap section is needed.
> >> >> >>
> >> >> >> llvm-profdata tool  will be taught to read the name section and
> >> >> >> attach
> >> >> >> function names to the profile records.
> >> >> >
> >> >> >
> >> >> > Needing to pass the executable to llvm-profdata would cause
> >> >> > deployment
> >> >> > issues for my customers in practice.
> >> >>
> >> >> Why? The deployment needs to pass the profile data anyway right?
> >> >
> >> >
> >> > Yes, but not the executable.
> >> >
> >> > The PGO training run is likely being run by a gameplay tester
> >> > (non-programmer). In general the binary will not be lying around as a
> >> > loose
> >> > file anywhere, it will be part of a full package of the binary+assets
> >> > (think
> >> > like what will end up on a bluray disc). A game's binary *completely
> >> > useless* without the assets, so except locally on a programmer's
> machine
> >> > while they iterate/debug, there is no reason for a binary to ever
> exist
> >> > as a
> >> > standalone file.
> >> >
> >> > I'm not saying that needing the binary is insurmountable in any
> >> > particular
> >> > scenario. Just that it will cause a strict increase in the number of
> >> > issues
> >> > to deploying PGO.
> >>
> >>  Your concern is acknowledged.
> >>
> >> >
> >> > These are much bigger "compatibility concerns" for me than for newer
> >> > toolchains to accept the old format. For a change in format I can
> easily
> >> > tell my users to replace an exe with a newer one and that is all they
> >> > need
> >> > to do and it takes 10 seconds, guaranteed. A workflow change is
> >> > potentially
> >> > a massive disruption and guaranteed to take more than 10 seconds to
> fix
> >> > (perhaps hours or days).
> >>
> >> ok.
> >>
> >> >
> >> >
> >> >>
> >> >>   This
> >> >> is no different from llvm-cov usage model.
> >> >
> >> >
> >> > In practice, getting the performance of PGO is a higher priority for
> my
> >> > users, so we should not assume that llvm-cov is being used.
> >>
> >> Glad to hear that :)
> >>
> >> thanks,
> >>
> >> David
> >>
> >> >
> >> > -- Sean Silva
> >> >
> >> >>
> >> >>
> >> >> David
> >> >>
> >> >>
> >> >>
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> Note that with 1) or 2), the user can still benefit from the
> reduced
> >> >> >> profile size.
> >> >> >
> >> >> >
> >> >> > Let me reiterate that the size of the profile is not a problem I
> have
> >> >> > observed in practice (nor have I heard of this being a problem in
> >> >> > practice
> >> >> > until this thread). Therefore I'm skeptical of any changes to our
> >> >> > default
> >> >> > behavior or any new requirements that are not opt-in.
> >> >> >
> >> >> > -- Sean Silva
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> thanks,
> >> >> >>
> >> >> >> David
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> >which will make it more difficult for me to do this kind
> >> >> >> > of analysis (would require using nm on the original binary,
> >> >> >> > hashing
> >> >> >> > everything, etc.).
> >> >> >> >
> >> >> >> > btw, feel free to attach the patch even if it in a rough state.
> It
> >> >> >> > can
> >> >> >> > still
> >> >> >> > help to clarify the proposal and be a good talking point.
> >> >> >> > Fine-grained
> >> >> >> > patch
> >> >> >> > review for caring about the rough parts will happen on
> >> >> >> > llvm-commits;
> >> >> >> > the
> >> >> >> > rough parts will not distract the discussion here on llvm-dev.
> >> >> >> >
> >> >> >> > -- Sean Silva
> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> thanks,
> >> >> >> >>
> >> >> >> >> David
> >> >> >> >> _______________________________________________
> >> >> >> >> LLVM Developers mailing list
> >> >> >> >> llvm-dev at lists.llvm.org
> >> >> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150908/dce1321a/attachment.html>