[llvm-dev] RFC: A binary serialization format for MemProf
Teresa Johnson via llvm-dev
llvm-dev at lists.llvm.org
Sun Oct 10 08:34:36 PDT 2021
On Thu, Oct 7, 2021 at 12:06 PM Snehasish Kumar <snehasishk at google.com> wrote:
> Hi Wenlei,
> Thanks for taking a look! Added responses inline.
> On Thu, Oct 7, 2021 at 9:29 AM Xinliang David Li <davidxl at google.com> wrote:
> > Just a quick note -- IRPGO profile is not deterministic with
> multi-threaded programs due to contentions (there is of course atomic
> update mode, but it can be slow). Asynchronous dumping is another reason
> that the profile is not guaranteed to be repeatable.
> > David
> > On Thu, Oct 7, 2021 at 9:18 AM Wenlei He <wenlei at fb.com> wrote:
> >> Thanks for sharing the progress and details on the binary format.
> Overall this looks like a clean design that fits current PGO profile format
> with extensions.
> >> Some high level comments:
> Our focus is to have a single combined IR instrumentation and PGHO
> instrumentation phase to keep operational costs low. For CSPGO today,
> this would be the second IR instrumentation phase. We also intend to
> support a separate PGHO instrumentation phase.
> >> Does memprof/PGHO work together with today's IRPGO, i.e. can we
> have one instrumented build to collect both PGO and PGHO profiles, or will
> we need separate PGO instrumentation builds for each, in which case CSPGO
> + PGHO would need three iterations of training and build, which would be a
> significant operational cost?
> Yes, the context tracker is quite relevant to the IR matching need.
> Teresa will share the detailed design soon and we can evaluate the
> benefit of reusing the existing logic for CSSPGO. I think this is
> orthogonal to this RFC (serialization format) so we can defer to the
> next one for a detailed discussion.
> >> I think some of the problems memprof faced when dealing with storing
> calling context and mapping context to IR are very similar to CSSPGO. I'm
> wondering if it makes sense to promote some existing infrastructure to be
> more general beyond just serving CSSPGO. One example is the IR mapping you
> mentioned (quoted below). In CSSPGO, we have the exact same need, and it's
> handled by `SampleContextTracker` which queries a context trie using an
> >> > Because the MIB corresponding to the A->B context is
> associated with function B in the profile, we do not find it by looking at
> function A’s profile when we see function A’s malloc call during matching.
> To address this we need to keep a correspondence from debug locations to
> the associated profile information.
> We intend to retain as much of the calling context information until
> the IR matching. This is where we can leverage common solutions. We
> would be happy to generalize where appropriate and intend to tackle
> this topic in detail in the next RFC.
In fact, we need to retain the calling context beyond matching, so that we
can perform the context disambiguation transformations that Snehasish
described in an earlier email. The next RFC will focus on the IR metadata
needed to carry the PGHO data as well as the context.
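As a rough illustration of the correspondence between debug locations and profile information (a sketch only, not the planned implementation; the `DebugLoc`, `MIB`, `build_leaf_index` and `match_inline_stack` names and the leaf-first context layout are assumptions made for this example), one can index MIBs by the debug location of the allocation call and then filter by the inlined-at chain seen in the IR. The matched MIB keeps its full context, which is what the later disambiguation needs:

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class DebugLoc(NamedTuple):
    file: str
    line: int
    discriminator: int = 0


class MIB(NamedTuple):
    # Hypothetical layout: calling context stored leaf (allocation call) first.
    context: Tuple[DebugLoc, ...]
    alloc_count: int


def build_leaf_index(mibs):
    """Map the debug location of each allocation call to the MIBs recorded for it."""
    index = defaultdict(list)
    for mib in mibs:
        index[mib.context[0]].append(mib)
    return index


def match_inline_stack(index, inline_stack):
    """Return MIBs whose context starts with the inline stack seen in the IR."""
    stack = tuple(inline_stack)
    return [m for m in index.get(stack[0], []) if m.context[: len(stack)] == stack]


# B calls malloc at b.cc:5; B is called from A (a.cc:10) and from C (c.cc:20).
malloc_in_b = DebugLoc("b.cc", 5)
call_b_from_a = DebugLoc("a.cc", 10)
call_b_from_c = DebugLoc("c.cc", 20)
profile = [
    MIB((malloc_in_b, call_b_from_a), alloc_count=100),
    MIB((malloc_in_b, call_b_from_c), alloc_count=7),
]
index = build_leaf_index(profile)

# malloc call inlined into A: the IR carries b.cc:5 inlined at a.cc:10.
in_a = match_inline_stack(index, [malloc_in_b, call_b_from_a])
# malloc call in the out-of-line copy of B: only the leaf location is known.
out_of_line = match_inline_stack(index, [malloc_in_b])
```

Here `in_a` holds only the A->B MIB, even though the profile associates it with B, while `out_of_line` holds both MIBs with their contexts intact for later disambiguation.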
From reading through the CSSPGO RFC it sounds like the context info is
never annotated onto the IR, but rather just used during the sample PGO
loading/inlining step to help generate more accurate IR prof md counts - is
that correct? In that case perhaps some of the infrastructure can be shared
for performing the matching for already inlined contexts, which I think is
what the ContextTrieNode structures are used for from what I can tell
perusing the code. It is a little unclear to me - how is the profile for a
partially inlined context found in the data structure - i.e. how do you
look up the ContextTrieNode for a given out of line function?
> >> The serialization and pruning of calling context are
> also examples of shared problems, and we've put in some effort to have
> effective solutions (e.g. offline preinliner for most effective pruning,
> which I think could be adapted to help keep the most important allocation
> contexts). Perhaps some of the frameworks can be merged, so LLVM has general
> context-aware PGO support that can be leveraged by different kinds of PGO
> (IRPGO, PGHO, CSSPGO). If you think this is worth pursuing, we’d be happy
> to help too.
> >> More on the details:
> As David mentioned, keeping the PGHO profile deterministic is a
> non-goal since IR PGO profile is non-deterministic.
> >> I saw that MemInfoBlock contains alloc/dealloc cpuid, does that make
> memprof profile non-deterministic in the sense that running memprof twice
> on the exact program and input would yield bit-wise different memory
> profile? I think IR PGO profile is deterministic?
> We need to use the file path instead of the function to be able to
> distinguish COMDAT functions. The line_offset based matching is more
> resilient if the entire function is moved, I think it's a good idea
> and we can incorporate it into the IR matching phase.
> >> Why do we use `file:line:discriminator` instead of
> `func:line_offset:discriminator`? The latter would be more resilient to
> source changes. If the function name string is too long, we could perhaps
> leverage the MD5 encoding used by sample PGO?
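To make the trade-off between the two key schemes concrete, here is a minimal sketch. The helper names are hypothetical, and taking the first 8 bytes of the MD5 digest as a little-endian integer is an assumption for illustration, not a statement of how sample PGO encodes names:

```python
import hashlib


def name_hash64(func_name: str) -> int:
    """64-bit id derived from the MD5 digest of the name (assumed truncation)."""
    digest = hashlib.md5(func_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def file_line_key(file: str, line: int, discriminator: int):
    return (file, line, discriminator)


def func_offset_key(func: str, func_start_line: int, line: int, discriminator: int):
    # The line is stored relative to the first line of the enclosing function.
    return (name_hash64(func), line - func_start_line, discriminator)


# Profiled source: foo starts at line 40 and calls malloc at line 45.
# Current source: unrelated code was added above, so foo now starts at line 60.
old_file_key = file_line_key("foo.cc", 45, 0)
new_file_key = file_line_key("foo.cc", 65, 0)
old_func_key = func_offset_key("foo", 40, 45, 0)
new_func_key = func_offset_key("foo", 60, 65, 0)
```

The file based keys no longer match after the function moves, while the function relative keys do. Note that the function relative key drops the file path, which is the information the reply above says is needed to distinguish COMDAT functions.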
> While we only intend to support Memprof optimizations for the main
> binary, retaining all executable mappings allows future analysis tools
> to symbolize shared library code.
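As a sketch of why the recorded mappings matter for position independent code (the `Mapping` fields and the `resolve` helper are hypothetical and do not reflect the actual section layout), a post-processing tool can turn a runtime address into a binary path plus a file offset that a symbolizer can work with:

```python
from typing import NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int        # runtime start address of the mapping
    end: int          # runtime end address (exclusive)
    file_offset: int  # offset of the mapping within the file
    path: str         # binary or shared library backing the mapping


def resolve(mappings, addr: int) -> Optional[Tuple[str, int]]:
    """Return (path, file offset) for a runtime address, or None if unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m.path, addr - m.start + m.file_offset
    return None


# Made-up executable mappings for one profiled process.
mappings = [
    Mapping(0x555555554000, 0x555555600000, 0x0, "/bin/app"),
    Mapping(0x7FFFF7A00000, 0x7FFFF7B00000, 0x2000, "/lib/libfoo.so"),
]

in_main = resolve(mappings, 0x555555554ABC)
in_lib = resolve(mappings, 0x7FFFF7A00010)
unmapped = resolve(mappings, 0x1000)
```

Keeping every executable mapping, rather than only the main binary's, is what lets the second lookup succeed for shared library code.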
> >> Is the design of mmap section (quoted below) trying to support memprof
> for multiple binaries in the same process at the same time, or mainly for
> handling multiple non-consecutive executable segments for a single binary?
> >> > The process memory mappings for the executable segment
> during profiling are stored in this section. This allows symbolization
> during post processing for binaries which are built with position
> independent code. For now all read only, executable mappings are recorded,
> however in the future, mappings for heap data can also potentially be
> Yes, we do intend to support Memprof profile section merging via
> `llvm-profdata merge`. The schema overhead per function is low now, so
> we opted for function granularity. We can revisit if the overheads are
> high or if the IR metadata scheme intends to keep it at module
> granularity (in which case we don't need the extra fidelity).
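A minimal sketch of what a per-record schema buys during merging, under stated assumptions: the field ids, the all-`uint64` little-endian encoding and the `encode_record`/`decode_record` helpers are invented for this example and are not the proposed format. Because each record carries its own schema, records written with different field sets can be read side by side:

```python
import struct

# Hypothetical field ids; the real MemInfoBlock fields are not reproduced here.
ALLOC_COUNT, TOTAL_SIZE, ALLOC_CPU_ID = 0, 1, 2


def encode_record(schema, mibs):
    """Layout: field count, field ids, MIB count, then each MIB's values in schema order."""
    words = [len(schema), *schema, len(mibs)]
    for mib in mibs:
        words.extend(mib[field] for field in schema)
    return struct.pack(f"<{len(words)}Q", *words)


def decode_record(buf):
    """Decode one record using the schema stored at its start."""
    (n_fields,) = struct.unpack_from("<Q", buf, 0)
    schema = struct.unpack_from(f"<{n_fields}Q", buf, 8)
    (n_mibs,) = struct.unpack_from("<Q", buf, 8 * (1 + n_fields))
    offset = 8 * (2 + n_fields)
    mibs = []
    for _ in range(n_mibs):
        values = struct.unpack_from(f"<{n_fields}Q", buf, offset)
        mibs.append(dict(zip(schema, values)))
        offset += 8 * n_fields
    return mibs


# One record written with an older two-field schema, one with a newer three-field schema.
old_record = encode_record([ALLOC_COUNT, TOTAL_SIZE], [{ALLOC_COUNT: 3, TOTAL_SIZE: 96}])
new_record = encode_record(
    [ALLOC_COUNT, TOTAL_SIZE, ALLOC_CPU_ID],
    [{ALLOC_COUNT: 1, TOTAL_SIZE: 32, ALLOC_CPU_ID: 7}],
)
merged = decode_record(old_record) + decode_record(new_record)
```

With a single file-level schema the same merge would require both inputs to agree on the field set up front, at the cost of a few words less per record.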
> >> Do we need each function record to have its own schema? Do we expect
> different functions to use different versions/schemas? This is very
> flexible, but I'm wondering what the use case is. If the schema is for
> compatibility across versions, perhaps a file-level schema would be enough?
> >> > The InstrProfRecord for each function will hold the
> schema and an array of Memprof info blocks, one for each unique allocation
> >> Thanks,
> >> Wenlei
Teresa Johnson | Software Engineer | tejohnson at google.com |