[PATCH] D12715: Reduce PGO Instrumentation binary and profile data size (Patch-1)
Sean Silva via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 9 01:16:57 PDT 2015
On Tue, Sep 8, 2015 at 10:34 PM, Xinliang David Li <davidxl at google.com>
> On Tue, Sep 8, 2015 at 10:00 PM, Sean Silva <chisophugis at gmail.com> wrote:
> > silvas added a comment.
> > Fundamentally, this approach is not a long-term solution to reducing PGO
> data size. The correct long-term solution for reducing PGO data size
> requires passing the binary to llvm-profdata.
> This basically goes back to my original proposed solution. Whether
> this solution is longer term or not depends on whether it is
> sufficiently effective in reducing size -- the data we have now
> confirm that it is indeed effective.
If we have the binary, the size of the names should not matter because they
should not be present in the runtime memory image or in the raw profile
data. Do you know how much improvement there would be over the current
patch if we could completely eliminate the names from the memory image and
from the raw profile? I.e., how large are the just the counters by
themselves compared to the current raw profile data? Quickly looking at
some source code (but not testing anything) suggests that there is one
`struct __llvm_profile_data` per counter. Depending on the pointer size,
this struct is 3-4x as large as the counter itself.
One other thing that bothers me is that (while I don't use llvm-cov) this
size reduction in this patch is incompatible with llvm-cov.
My thinking is that if we want to provide a feature for reducing the
overhead in the runtime memory image or in the profraw, it should:
- be fully general (e.g. cover the llvm-cov use case)
- have nearly optimal size (3-4x overhead is not acceptable)
- be completely transparent to the user, except for the fact that the
binary must be passed to llvm-profdata (but not to clang; llvm-profdata
should extract all necessary information from the binary)
I think this is achievable and not hugely more complicated than the current
When it comes to adding a feature to clang that we must maintain forever,
it makes more sense to have it be something like
`-fprofile-instr-reduced-image-size` which can have a clear contract and
the implementation of which we can continually improve (or make optimal).
For this, the transparency is important (e.g. output of llvm-profdata
should be the same). The current patch is highly tied to the precise
implementation of what we are doing to reduce the size, and does not leave
margin for improvement or for changing the underlying approach to cover new
use cases (see below).
What do you think?
Out of curiosity, could you post the patch for your original approach that
required the binary for llvm-profdata? Could it potentially work as an
initial approach to "`-fprofile-instr-reduced-image-size`" that we could
incrementally build on?
The idea is that if a user says that there is too much overhead in the
runtime memory image or in the profraw, then we can tell them to use a
particular option to improve the situation, with the only caveat being that
they must pass the binary to llvm-profdata; besides that, there are no
strings attached, and no hidden issues for the user (e.g. in an embedded
project, the code may be mostly C and the function names may typically be
shorter than the hash, so this patch could make things worse). The current
patch assumes that function name size is the primary overhead which makes
PGO not usable; do we don't have a reason to believe that to be the case in
general? I am interested to hear from the other commenter on the thread who
mentioned issues with pgo size; still our sample set is very small.
It seems premature to think that function name size is the greatest
overhead in general since there are only O(#functions) of them, while there
are O(#counters) unnecessary instances of `struct __llvm_profile_data` (and
that struct is reasonably large compared to typical identifiers, even
mangled). A quick exploration in Mathematica suggests that for Clang, the
bulk of the size overhead of function names is concentrated in the range
50-150 characters (see http://i.imgur.com/en2mpBF.png; area under the curve
of the first plot is total size of identifiers, with horizontal axis being
bins by identifier size; second plot is left to right accumulated values of
the first plot). Since `struct __llvm_profile_data` is ~25 bytes, this
means that we need only assume that functions typically have 2-6 counters
for the overhead of the structs to be comparable to the mangled names. If
it is not the case already, then once we do early inlining as Rong is
proposing, I think it will be the case that most instrumented functions
have at least 2-6 counters. Therefore, the size of the names may not be as
significant a factor. For reference, here is the same analysis performed on
one of our games http://i.imgur.com/b3nNxHB.png (similar size C++ program
to Clang); the results are qualitatively similar, but the range of interest
is lower, more like 25-100, so here the comparable size overhead is only
1-4 counters per function. It would be interesting to see the data for the
binaries at google. Looking at symbols in the text section is only an
approximation. Using the actual function names in the the profile collected
at runtime for the game (using `~/pg/release/bin/llvm-profdata show
-all-functions data.profraw | grep ':$' |` with some string cleanup), the
result is http://i.imgur.com/g1AGiEX.png, which is both qualitatively and
numerically similar with identifiers between 25-100 bytes occupying the
bulk of the space. There are some functions with extraordinarily long names
which change the scale on the horizontal axis (but were fully inlined and
therefore do not appear in the output of llvm-nm), but since area under the
curve is total size in bytes, we can see that these outliers have
negligible effect on the size as a whole.
> >The approach of using name hashes is also not an incremental step towards
> the long-term solution. So thinking about it a bit more, I'm on the fence
> about whether a quick solution like this one even makes sense to do.
> I am on the fence too, but leaning towards having this one as the
> longer term solution -- it is cleaner and less disruptive. It does not
> require bumping up the version numbers. (On the other hand, my
> original patch does require version change)
> > I am very apprehensive of adding complexity to make this approach work,
> in particular threading information through many layers, as this is likely
> to get in the way of future work towards the long-term solution.
> I don't see the complexity actually. Other than some refactoring, the
> code here is very non-intrusive. I don't think it will get in the way
> of 'longer term' solution if it ever became a need to do.
My complexity-detector must be tuned differently than yours.
For example, -instr-prof-with-namehash introduces a weird coupling between
the frontend and the middle-end. It seems to just be used for setting
IncludeNamesInProfile, which seems to be completely orthogonal to hashing?
> > My takeaways from the RFC thread (maybe I am misunderstanding) is that
> we are still using the same format just with one string vs. another.
> yes that is true -- that is why version number is not bumped up.
> > We shouldn't need to rev the version and there shouldn't need to be any
> changes in compiler-rt or llvm.
> Note that version is not changed. It is just the version field is now
> partitioned to allow more information to be passed around. This is a
> very useful feature to have.
> >Clang can easily diagnose (as an isolated piece of logic) if the user did
> the pgo training run with md5 mode enabled but the md5 mode flag was not
> passed or vice versa (it is easy to tell if input function names are md5
> sums or not as a diagnostic heuristic).
> If we want to enforce matching options in profile-gen and profile-use,
> the mismatched option and profile data should result in hard errors --
> current implementation allows it to be caught as a hard error (with
> encoding in the profile data).
I don't see in the patch where this error is issued (nor where that error
> >It is also inconsistent for llvm-profdata to be able to be passed
> `-function=foo` for an md5 profile; the user should have to pass
> (truncated) md5 of the function name.
> This is usability and convenience -- not inconsistency.
Fundamentally, trying to paper over the fact that what is in the file is
md5 sums is wrong. There are many ways that it sneaks through with this
approach, and I don't see a fundamental reason to try to add a special case
for this one option. If this is needed for convenience, a separate patch
can add an option `-md5-function=` which automatically takes the md5 hash
of the provided argument to save the trouble.
-- Sean Silva
> > ================
> > Comment at:
> > @@ +4,4 @@
> > +
> > + at __llvm_profile_name_foo = hidden constant [16 x i8] c"5cf8c24cdb18bdac"
> > +; CHECK: @__llvm_profile_name_foo = hidden constant [16 x i8]
> c"5cf8c24cdb18bdac", section "__DATA,__llvm_prf_names", align 1
> > + at __llvm_profile_name_bar = hidden constant [17 x i8]
> > ----------------
> > This is backward from a standard md5 sum in hex. We should maintain the
> invariant that the resulting symbol name is a substring of a standard md5
> sum. Otherwise it is very difficult to explain how to get the hash for a
> > E.g. we should be able to say "the symbol name is effectively replaced
> by the first half of its md5 sum in hex":
> Good catch. That can be fixed.
> > ```
> > Sean:~ % echo -n 'foo' | md5
> > acbd18db4cc2f85cedef654fccc4a4d8
> > Sean:~ % echo -n 'foo' | md5 | head -c 16
> > acbd18db4cc2f85c
> > ```
> > The symbol name string should be `acbd18db4cc2f85c`.
> > http://reviews.llvm.org/D12715
-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2015-09-09 at 12.36.09 AM.png
Size: 76134 bytes
Desc: not available
More information about the llvm-commits