[PATCH] D12715: Reduce PGO Instrumentation binary and profile data size (Patch-1)

Thu Sep 10 16:08:39 PDT 2015

On Wed, Sep 9, 2015 at 4:29 PM, Xinliang David Li <davidxl at google.com>
wrote:

> On Wed, Sep 9, 2015 at 1:16 AM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> >
> > On Tue, Sep 8, 2015 at 10:34 PM, Xinliang David Li <davidxl at google.com>
> > wrote:
> >>
> >> On Tue, Sep 8, 2015 at 10:00 PM, Sean Silva <chisophugis at gmail.com>
> wrote:
> >> > silvas added a comment.
> >> >
> >> > Fundamentally, this approach is not a long-term solution to reducing
> PGO
> >> > data size. The correct long-term solution for reducing PGO data size
> >> > requires passing the binary to llvm-profdata.
> >>
> >> This basically goes back to my original proposed solution.  Whether
> >> this solution is longer term or not depends on whether it is
> >> sufficiently effective in reducing size -- the data we have now
> >> confirm that it is indeed effective.
> >
> >
> > If we have the binary, the size of the names should not matter because
> they
> > should not be present in the runtime memory image or in the raw profile
> > data.
>
>
> If we can tolerate the inconvenience of passing binary to
> llvm-profdata,

Quick aside: for compatibility reasons (and I think for general convenience
reasons), this would need to be opt-in (we can consider breaking
compatibility, but for a user-visible frontend feature this is likely a
long discussion and it is easier to just provide compatibility). But
otherwise, I think requiring the binary is clearly the right approach.

> I think there is a very simple and nice way of doing
> this.
>
> Currently LLVM emits symbols like __llvm_profile_name_foo with
> contents 'foo', this is basically a waste -- the string of 'foo' is
> actually a substring of the symbol name itself.
>
> In other words, we can choose to continue emit __llvm_profile_name_xxx
> symbols as before, but with zero size.  llvm-profdata/llvm-cov tool
> can read the symtab and reconstruct the md5hash values for them.
>
> With this change,
>
> 1) no new option is needed
> 2) binary size, raw/indexed profile size will be minimized
> consistently (even when coverage mapping is on).
>

Could you explain how this affects the 4 different sizes I lay out below?
It's not clear how saving a string by pointing into the symbol name affects
2. or 3. below.

I think we should make clear the different "sizes", which may matter
differently for different use cases:
1. memory overhead in the program image. Ideal case: just the raw counter
values (+ relevant text size increase)
2. size overhead in the raw profile. Ideal case: just the raw counter
values.
3. size overhead in the indexed profile. No clear ideal case.
4. raw binary size of the program's ELF file. No clear ideal case.
(did I miss any?)

For me, only 1. is of primary importance. 2. and 3. are not relevant for
programs of the size I have (~ size of Clang) and my users' setups (an
entire host machine is available to handle the data, and only a single
profile is written at a given time). 4. is irrelevant since large binaries
must already be handled anyway for debug info.

For an embedded use case 1. and 2. are important but 3,4 are irrelevant
since that happens on the host (assume host has much more resources than
target).

It sounds like for your use case, 2,3,4 are the primary concerns. Is this
correct?

-- Sean Silva

>
> >Do you know how much improvement there would be over the current patch
> > if we could completely eliminate the names from the memory image and from
> > the raw profile?
>
> There are 16 byte overhead per function for emitting the name. For
> clang, the overhead of the additional rodata is about 6% of the total
> text size.
>
> Note that if we don't use strings (name or md5string) but just 64bit
> md5hash as the key (as done by my original patch), the perf-function
> overhead for the keys  is zero -- because the Name reference field in
> llvm_profile_data_* structure can be reused.
>
> > I.e., how large are the just the counters by themselves
> > compared to the current raw profile data? Quickly looking at some source
> > code (but not testing anything) suggests that there is one `struct
> > __llvm_profile_data` per counter. Depending on the pointer size, this
> struct
> > is 3-4x as large as the counter itself.
>
> _llvm_profile_data_*** is  one per function. The size of the
> per-function data is 32 bytes (64bit arch), so __llvm_prf_data section
> is larger than __llvm_prf_name.  So the largest is the counter section
> when this patch is applied. Without this patch, the largest is
> __llvm_prf_name (way larger).
>
>
> >
> > One other thing that bothers me is that (while I don't use llvm-cov) this
> > size reduction in this patch is incompatible with llvm-cov.
>
> yes -- but they can be made compatible if we are ok with more drastic
> change.
>
> >
> > My thinking is that if we want to provide a feature for reducing the
> > overhead in the runtime memory image or in the profraw, it should:
> > - be fully general (e.g. cover the llvm-cov use case)
> > - have nearly optimal size (3-4x overhead is not acceptable)
> > - be completely transparent to the user, except for the fact that the
> binary
> > must be passed to llvm-profdata (but not to clang; llvm-profdata should
> > extract all necessary information from the binary)
> >
> > I think this is achievable and not hugely more complicated than the
> current
> > patch.
> >
>
> agreed (other than the 3-4x overhead statement above).
>
> > When it comes to adding a feature to clang that we must maintain
> forever, it
> > makes more sense to have it be something like
> > `-fprofile-instr-reduced-image-size` which can have a clear contract and
> the
> > implementation of which we can continually improve (or make optimal). For
> > this, the transparency is important (e.g. output of llvm-profdata should
> be
> > the same). The current patch is highly tied to the precise
> implementation of
> > what we are doing to reduce the size, and does not leave margin for
> > improvement or for changing the underlying approach to cover new use
> cases
> > (see below).
> >
> > What do you think?
>
> See above, we can even do without the option and make every use case happy.
>
> >
> > Out of curiosity, could you post the patch for your original approach
> that
> > required the binary for llvm-profdata? Could it potentially work as an
> > initial approach to "`-fprofile-instr-reduced-image-size`" that we could
> > incrementally build on?
> >
>
> I have not added the llvm-profdata debug dump support yet.  If we
> reach an agreement that is a good thing to do (which I agree), I will
> add that submit that patch instead.
>
> >
> > The idea is that if a user says that there is too much overhead in the
> > runtime memory image or in the profraw, then we can tell them to use a
> > particular option to improve the situation, with the only caveat being
> that
> > they must pass the binary to llvm-profdata;
>
> I don't have supporting data, but if majority of users do not need
> debug symbol info for llvm-profdata (that often), would it be better
> to require passing binary if per function dump is needed? Post
> proposing script can also be easily developed for that. It also makes
> the use model consistent with llvm-cov.
>
> >besides that, there are no
> > strings attached, and no hidden issues for the user (e.g. in an embedded
> > project, the code may be mostly C and the function names may typically be
> > shorter than the hash, so this patch could make things worse). The
> current
> > patch assumes that function name size is the primary overhead which makes
> > PGO not usable; do we don't have a reason to believe that to be the case
> in
> > general? I am interested to hear from the other commenter on the thread
> who
> > mentioned issues with pgo size; still our sample set is very small.
>
> For C programs, using name hash string instead of name itself may
> actually increase overhead -- it is probably a good reason that the
> zero overhead approach is preferred.  Here are some data from SPEC
> (average name size):
>
> GCC (C): 20
> SJENG (C): 12
> GZIP (C): 11
> BZIP2 (C): 15
> MCF (C) : 15
> LIBQUANTUM (C): 17
> HMMER (C) : 13
> GOBMK (C): 20
>
> Xalan(C++) : 61
> DealII (C++): 58
> Clang (C++): 82
>
> Google benchmarks
> >
> > It seems premature to think that function name size is the greatest
> overhead
> > in general since there are only O(#functions) of them, while there are
> > O(#counters) unnecessary instances of `struct __llvm_profile_data` (and
> that
> > struct is reasonably large compared to typical identifiers, even mangled)
>
> llvm_profile_data is also O(#functions).
>
> . A
> > quick exploration in Mathematica suggests that for Clang, the bulk of the
> > size overhead of function names is concentrated in the range 50-150
> > characters (see http://i.imgur.com/en2mpBF.png; area under the curve of
> the
> > first plot is total size of identifiers, with horizontal axis being bins
> by
> > identifier size; second plot is left to right accumulated values of the
> > first plot). Since `struct __llvm_profile_data` is ~25 bytes, this means
> > that we need only assume that functions typically have 2-6 counters for
> the
> > overhead of the structs to be comparable to the mangled names.
>
> llvm_prof_data is 32byte and it is per-function.
>
> ave negligible effect
> > on the size as a whole.
> >
> >>
> >>
> >> >The approach of using name hashes is also not an incremental step
> towards
> >> > the long-term solution. So thinking about it a bit more, I'm on the
> fence
> >> > about whether a quick solution like this one even makes sense to do.
> >>
> >> I am on the fence too, but leaning towards having this one as the
> >> longer term solution -- it is cleaner and less disruptive. It does not
> >> require bumping up the version numbers. (On the other hand, my
> >> original patch does require version change)
> >>
> >> > I am very apprehensive of adding complexity to make this approach
> work,
> >> > in particular threading information through many layers, as this is
> likely
> >> > to get in the way of future work towards the long-term solution.
> >>
> >> I don't see the complexity actually. Other than some refactoring, the
> >> code here is very non-intrusive. I don't think it will get in the way
> >> of 'longer term' solution if it ever became a need to do.
> >
> >
> > My complexity-detector must be tuned differently than yours.
> >
> > For example, -instr-prof-with-namehash introduces a weird coupling
> between
> > the frontend and the middle-end. It seems to just be used for setting
> > IncludeNamesInProfile, which seems to be completely orthogonal to
> hashing?
>
> This option is only useful for testing purpose (with opt).
>
> Anyway, I am fine with going forward with the most aggressive size
> reduction approach that every one can benefit consistently.
>
> David
>
>
>
> >
> >>
> >>
> >> >
> >> > My takeaways from the RFC thread (maybe I am misunderstanding) is that
> >> > we are still using the same format just with one string vs. another.
> >>
> >> yes that is true -- that is why version number is not bumped up.
> >>
> >> > We shouldn't need to rev the version and there shouldn't need to be
> any
> >> > changes in compiler-rt or llvm.
> >>
> >> Note that version is not changed. It is just the version field is now
> >> partitioned to allow more information to be passed around. This is a
> >> very useful feature to have.
> >>
> >>
> >> >Clang can easily diagnose (as an isolated piece of logic) if the user
> did
> >> > the pgo training run with md5 mode enabled but the md5 mode flag was
> not
> >> > passed or vice versa (it is easy to tell if input function names are
> md5
> >> > sums or not as a diagnostic heuristic).
> >>
> >> If we want to enforce matching options in profile-gen and profile-use,
> >> the mismatched option and profile data should result in hard errors --
> >> current implementation allows it to be caught as a hard error (with
> >> encoding in the profile data).
> >
> >
> > I don't see in the patch where this error is issued (nor where that
> error is
> > tested).
> >
> >>
> >>
> >> >It is also inconsistent for llvm-profdata to be able to be passed
> >> > `-function=foo` for an md5 profile; the user should have to pass
> (truncated)
> >> > md5 of the function name.
> >>
> >> This is usability and convenience -- not inconsistency.
> >
> >
> > Fundamentally, trying to paper over the fact that what is in the file is
> md5
> > sums is wrong. There are many ways that it sneaks through with this
> > approach, and I don't see a fundamental reason to try to add a special
> case
> > for this one option. If this is needed for convenience, a separate patch
> can
> > add an option `-md5-function=` which automatically takes the md5 hash of
> the
> > provided argument to save the trouble.
> >
> > -- Sean Silva
> >
> >>
> >>
> >> >
> >> >
> >> > ================
> >> > Comment at:
> >> > test/Instrumentation/InstrProfiling/profiling_md5hash_key.ll:5-6
> >> > @@ +4,4 @@
> >> > +
> >> > + at __llvm_profile_name_foo = hidden constant [16 x i8]
> >> > c"5cf8c24cdb18bdac"
> >> > +; CHECK: @__llvm_profile_name_foo = hidden constant [16 x i8]
> >> > c"5cf8c24cdb18bdac", section "__DATA,__llvm_prf_names", align 1
> >> > + at __llvm_profile_name_bar = hidden constant [17 x i8]
> >> > c"e413754a191db537\00"
> >> > ----------------
> >> > This is backward from a standard md5 sum in hex. We should maintain
> the
> >> > invariant that the resulting symbol name is a substring of a standard
> md5
> >> > sum. Otherwise it is very difficult to explain how to get the hash
> for a
> >> > function.
> >> >
> >> > E.g. we should be able to say "the symbol name is effectively replaced
> >> > by the first half of its md5 sum in hex":
> >>
> >>
> >> Good catch. That can be fixed.
> >>
> >> thanks,
> >>
> >> David
> >>
> >> >
> >> > ```
> >> > Sean:~ % echo -n 'foo' | md5
> >> > acbd18db4cc2f85cedef654fccc4a4d8
> >> > Sean:~ % echo -n 'foo' | md5 | head -c 16
> >> > acbd18db4cc2f85c
> >> > ```
> >> >
> >> > The symbol name string should be `acbd18db4cc2f85c`.
> >> >
> >> >
> >> > http://reviews.llvm.org/D12715
> >> >
> >> >
> >> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150910/d30d931b/attachment.html>