[llvm-dev] RFC: IR metadata format for MemProf

Teresa Johnson via llvm-dev llvm-dev at lists.llvm.org
Thu Nov 4 20:06:36 PDT 2021


On Thu, Nov 4, 2021 at 5:59 PM Hongtao Yu <hoy at fb.com> wrote:

> Hello Teresa, Snehasish and David,
>
>
>
> Thanks for the RFC and follow-up clarification. I have a few questions
> regarding the use of the metadata and how they are manipulated.
>

Hi Hongtao, responses below.


>
>
>    1. While the proposed !callsite metadata and its compression through
>    chaining sounds efficient, have you considered using the existing !dbg
>    metadata as an alternative?
>

We could potentially do so for the !callsite metadata, but not for the
stack ids in the !memprof metadata's stack context (which could be in
another module), and we need to be able to correlate these (i.e. identify
the callsites corresponding to the stack context list on the !memprof
metadata), using ThinLTO if they are across module boundaries. This idea
and its downside are mentioned briefly near the beginning of the Metadata
Format section:

> Another option would be to represent the stack entries using existing
> debug metadata. However, for stack entries in another module we would need
> to synthesize additional debug location metadata in the module containing
> MIB profile data that references that stack context entry.

It is possible that if we used stack ids that were generated from the MD5
hash of the debug location, we could actually generate this id for each
callsite on the fly (from its debug metadata). However, having the
!callsite metadata shows explicitly which callsites we need to consider for
the MemProf optimizations (and also which to summarize in the ThinLTO
summary in order to perform cross-module context disambiguation).
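
As a rough illustration of generating such an id on the fly (a sketch only,
not code from the MemProf patches): the helper below hashes
file:line:discriminator to a 64-bit value. DebugLocKey and stackIdFor are
hypothetical names, and FNV-1a is swapped in for MD5 purely to keep the
example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the fields of a debug location.
struct DebugLocKey {
  std::string File;
  uint32_t Line;
  uint32_t Discriminator;
};

// 64-bit FNV-1a. Assumption: used here only as a stand-in for MD5.
inline uint64_t fnv1a64(const std::string &S) {
  uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}

// Derive a stack id from the string "file:line:discriminator".
inline uint64_t stackIdFor(const DebugLocKey &Loc) {
  assert(!Loc.File.empty() && "a location needs a file to be identified");
  return fnv1a64(Loc.File + ":" + std::to_string(Loc.Line) + ":" +
                 std::to_string(Loc.Discriminator));
}
```

Because the id depends only on the location, the module holding the callsite
and the module holding the MIB stack context would compute the same value
independently.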


>    2. Do !callsite metadata need to be maintained on MIR?
>

We don't plan to, as the targeted transformations would be in LLVM IR (e.g.
context disambiguation via cloning or the like, modification of allocation
calls).


>
>    3. For the !callsite metadata, in order to make sure the metadata is not
>    accidentally dropped, how much extra care is needed in passes such as tail
>    call optimization?
>

I believe tail call elimination happens in code gen, so after we would be
done with this metadata. Tail recursion elimination is earlier but we
probably can and will want to handle direct recursion in the contexts
specially anyway.


>    4. When two callsites are merged by some CFG optimizations, how are
>    their !callsite metadata handled?
>

Initially we can prevent this type of merging until after context
disambiguation is complete, i.e. while they have !callsite metadata (in
fact, we might need to keep the calls separate anyway if their respective
contexts result in different behavior at leaf allocation sites). If this
becomes too limiting we could probably extend the !callsite metadata to
include "aliased" callsite ids from merged callsites.
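
To sketch what such an extension might look like (purely hypothetical, this
is not a proposed format): the surviving call keeps its own ids and records
the ids of the merged-away call as aliases, so a stack context entry can
match either. CallsiteIds, mergeCallsites and matchesStackId are made-up
names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a !callsite attachment extended with alias ids.
struct CallsiteIds {
  std::vector<uint64_t> Ids;     // ids carried by the surviving call
  std::vector<uint64_t> Aliases; // ids inherited from merged-away calls
};

// True if a stack context entry with this id refers to the callsite.
inline bool matchesStackId(const CallsiteIds &C, uint64_t Id) {
  return std::find(C.Ids.begin(), C.Ids.end(), Id) != C.Ids.end() ||
         std::find(C.Aliases.begin(), C.Aliases.end(), Id) != C.Aliases.end();
}

// Merge Dropped into Kept, recording Dropped's ids as aliases (no duplicates).
inline CallsiteIds mergeCallsites(const CallsiteIds &Kept,
                                  const CallsiteIds &Dropped) {
  assert(!Kept.Ids.empty() && "only calls with !callsite ids are modeled");
  CallsiteIds Merged = Kept;
  auto AddAlias = [&Merged](uint64_t Id) {
    if (!matchesStackId(Merged, Id))
      Merged.Aliases.push_back(Id);
  };
  for (uint64_t Id : Dropped.Ids)
    AddAlias(Id);
  for (uint64_t Id : Dropped.Aliases)
    AddAlias(Id);
  return Merged;
}
```

In the RFC's working example, merging the two calls to bar() in baz() would
leave one call with id 345 and alias 456. Note that the merged call can then
no longer distinguish the two contexts, which is why blocking the merge is
the initial plan.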


>    5. How do we identify a call path in the presence of indirect call
>    sites?
>

If the indirect callsites aren't already speculatively devirtualized via
the value profile info, we can do so in the process of cloning for context
disambiguation, because we know the caller from the full stack context.


>    6. I was wondering how the !memprof metadata is consumed. Will it be
>    passed into the runtime allocator in some way?
>

We plan to use it to set hints on allocations that are consumed by the
allocator. See for example the recent patch to TCMalloc that adds hot/cold
hints:

[3] Implement interfaces for providing access frequency hints to TCMalloc (
https://github.com/google/tcmalloc/commit/ab87cf382dc56784f783f3aaa43d6d0465d5f385
)

Thanks!
Teresa


>
>
> Thanks!
>
>
>
> Hongtao
>
>
>
>
>
> From: Teresa Johnson <tejohnson at google.com>
> Date: Monday, November 1, 2021 at 12:06 PM
> To: llvm-dev <llvm-dev at lists.llvm.org>
> Cc: Snehasish Kumar <snehasishk at google.com>, David Li <
> davidxl at google.com>, Vedant Kumar <vsk at apple.com>, Wenlei He <
> wenlei at fb.com>, Andrey Bokhanko <andreybokhanko at gmail.com>, Hongtao Yu <
> hoy at fb.com>
> Subject: Re: RFC: IR metadata format for MemProf
>
> One change below along with another clarification. Also, you can view the
> original RFC here with better formatting for the examples:
> https://groups.google.com/g/llvm-dev/c/aWHsdMxKAfE/m/WtEmRqyhAgAJ
>
>
>
> On Wed, Oct 27, 2021 at 1:01 PM Teresa Johnson <tejohnson at google.com>
> wrote:
>
> This RFC describes the IR metadata format for the sanitizer-based heap
> profiler (MemProf) data when it is fed back into a subsequent compile for
> profile guided heap optimization. Additional background can be found in the
> following RFCs:
>
> · RFC: Sanitizer-based Heap Profiler [1]
>
> · RFC: A binary serialization format for MemProf [2]
>
>
>
> We look forward to your feedback.
>
> Authors: tejohnson at google.com, snehasishk at google.com, davidxl at google.com
>
> Requirements
>
> The profile format will need to be reasonably compact within IR, but also
> facilitate efficiently identifying callsites within the stack contexts for
> use in interprocedural (and cross module) optimization for context
> disambiguation. These optimizations will require transformations to
> disambiguate the context at a particular allocation, which will require
> call graph based analysis and optimization.
>
> Input Profile Format
>
> The profile will be in an indexed binary format, which is detailed in [2].
> In the indexed format, the profiles for each allocation site will be located
> with the profile data for the function containing the allocation call site.
> Each allocation may have multiple profile entries (known as a MIB, or
> Memory Info Block) uniquely identified by stack context. The entries in the
> stack context will be symbolized and include file, line and discriminator.
>
> Metadata Format
>
>
>
> Similar to branch weights and value profile data from regular PGO, the
> PGHO (profile guided heap optimization) profile will be annotated as
> metadata onto relevant instructions. A natural place to attach the profile
> metadata is the allocation callsite, so these allocation calls can be
> identified and handled by the
> subsequent heap optimization pass. As an example, this profile data can be
> used to enable automatic application of hot and cold hints to allocations,
> for use by a runtime allocator such as tcmalloc, where support for such
> allocation hints was recently added [3].
>
>
>
> However, in order to identify ancestor callsites within an allocation’s
> call stack context that require modification for disambiguating the context
> at the allocation site, e.g. via cloning, we will also want to attach
> metadata to these callsites. This is particularly important for contexts
> that cross module boundaries, so that we can identify them in ThinLTO
> summaries for cross module coordination of context transformations.
>
>
>
> To identify and correlate entries in a context, we will use a unique
> identifier for each stack entry. Specifically, we will use the 64-bit value
> from the stack entry table in the indexed profile format which is formed
> from the index into the file path table along with the line and
> discriminator. Another option would be to represent the stack entries using
> existing debug metadata. However, for stack entries in another module we
> would need to synthesize additional debug location metadata in the module
> containing MIB profile data that references that stack context entry.
>
> Assume the following working example. For simplicity, all are shown as
> being in the same module, however, these function definitions could
> theoretically be located in multiple different modules.
>
>
>
> x.cc
>
>   1 main() {
>   2    foo();  // stack entry id: 123
>   3 }
>   4
>   5 foo() {
>   6    baz();  // stack entry id: 234
>   7 }
>   8
>   9 baz() {
>  10    if (x)
>  11       bar();  // stack entry id: 345
>  12    else
>  13       bar();  // stack entry id: 456
>  14 }
>  15
>  16 bar() {
>  17    malloc(4);  // stack entry id: 567
>  18 }
>
>
>
> The call to malloc has 2 possible calling contexts:
>
> 1.    main -> foo (x.cc:2) -> baz (x.cc:6) -> bar (x.cc:11) -> malloc
> (x.cc:17)
>
> 2.    main -> foo (x.cc:2) -> baz (x.cc:6) -> bar (x.cc:13) -> malloc
> (x.cc:17)
>
> where the stack entry id for each callsite, taken from the profile’s stack
> entry table contents, is shown in the code comments. The corresponding full
> contexts in terms of stack entry ids, listed from the leaf allocation
> callsite up to the root are:
>
> 1.    567, 345, 234, 123
>
> 2.    567, 456, 234, 123
>
>
>
> Assuming both contexts execute at runtime, the allocation will end up with
> 2 MIBs in the profile, one for each of the above contexts.
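
As a rough illustration of how these id lists can be correlated with
callsites (a sketch, not the actual implementation): the code below indexes
the contexts by stack entry id and picks out the ids that occur in only some
of the contexts. StackContext, indexContexts and distinguishingIds are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// One MIB stack context, listed from the leaf allocation callsite to the root.
using StackContext = std::vector<uint64_t>;

// Map each stack entry id to the indices of the contexts that reference it.
inline std::map<uint64_t, std::set<std::size_t>>
indexContexts(const std::vector<StackContext> &Contexts) {
  std::map<uint64_t, std::set<std::size_t>> Index;
  for (std::size_t I = 0; I < Contexts.size(); ++I) {
    assert(!Contexts[I].empty() && "a context has at least the leaf frame");
    for (uint64_t Id : Contexts[I])
      Index[Id].insert(I);
  }
  return Index;
}

// Ids referenced by some but not all contexts, in ascending id order.
inline std::vector<uint64_t>
distinguishingIds(const std::vector<StackContext> &Contexts) {
  std::vector<uint64_t> Result;
  for (const auto &Entry : indexContexts(Contexts))
    if (Entry.second.size() < Contexts.size())
      Result.push_back(Entry.first);
  return Result;
}
```

For the two contexts above, ids 567, 234 and 123 are shared, while 345 and
456 (the two calls to bar() in baz()) are the entries that tell the contexts
apart.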
>
>
>
> To represent this in the IR, we propose 2 new metadata attachment types,
> as described below.
>
> Callsite metadata
>
> The !callsite metadata is used to associate a callsite with its
> corresponding references in MIB stack contexts. It contains the associated
> 64-bit stack entry table value for that callsite from the indexed profile,
> and is initially only on non-allocation callsites. As will be described
> later, after inlining it can contain multiple entry ids or be propagated
> onto allocation callsites.
>
>
>
> In the above example, for the call to foo(), which had stack entry id 123,
> the IR callsite would be decorated with a !callsite metadata containing
> that stack entry id:
>
>
>
>   tail call void @_Z3foov(), !dbg !12, !callsite !14
>
>
>
> !14 = !{i64 123}
>
>
>
> Note that this call may be in a different module initially than the
> referencing MIB metadata. In order to disambiguate the context across
> modules, some form of LTO would be required. ThinLTO summary support will
> be added to reflect the cross-module contexts and enable cross module
> optimization of the contexts.
>
>
>
> Also, while for MemProf the ids will be assigned uniquely using
> information from the MemProf profile, other types of context sensitive
> profiles could simply reuse the same id after matching with line table
> information, or at least leverage the same metadata attachment to assign
> their own unique ids if there is no MemProf profile.
>
>
>
> To clarify, the callsite metadata value can be any globally unique
> identifier. While in this proposal we simply describe using the indexed
> profile's associated 64-bit stack entry table value, an alternative could
> be to compute this from the MD5 hash of the debug information
> (file:line:discriminator). It doesn't affect the format and its usage
> described in this RFC.
>
> Memprof metadata
>
> The !memprof metadata describes the MIBs for the leaf allocation callsite
> it is attached to. If there are multiple stack contexts leading to that
> allocation, it will have a single !memprof metadata attachment, with a
> level of indirection used to list all related MIBs, as shown in the later
> example.
>
>
>
> As with the indexed profile format, we need to be able to add or modify
> fields of the MIB entries while maintaining backwards compatibility with
> older bitcode. Therefore, we use a schema format with the MIB profile entry
> fields described by a “Memprof Schema” module level metadata, for example:
>
>
>
> !llvm.module.flags = !{!1}
>
>
>
> !1 = !{i32 1, !"Memprof Schema", !"Stack", !"AllocCount", !"AveSize",
> !"MinSize", !"MaxSize", !"AveAccessCount", !"MinAccessCount",
> !"MaxAccessCount", !"AveLifetime", !"MinLifetime", !"MaxLifetime",
> !"NumMigration", !"NumLifetimeOverlaps"}
>
>
>
> The first (merge behavior) field is 1 (ModFlagBehavior::Error), meaning
> that it is an error to merge modules with different values, or in other
> words, to merge modules compiled with profiles that were generated with
> different versions of the indexed profile format.
>
>
>
> Using module flags for the schema doesn't work, because they only support a
> single string tag and an integer flag value, not arbitrary contents.
> Instead, I have implemented this using a new named metadata:
>
>
>
> !memprof.schema = !{!0}
> !0 = !{!"Stack", !"AllocCount", !"AveSize", !"MinSize", !"MaxSize",
> !"AveAccessCount", !"MinAccessCount", !"MaxAccessCount", !"AveLifetime",
> !"MinLifetime", !"MaxLifetime", !"NumMigration", !"NumLifetimeOverlaps"}
>
>
>
> Named metadata must only hold metadata nodes as operands. Here we use a
> single operand to point to metadata that describes the schema.
>
>
>
> The advantage of using a single metadata operand in the new
> !memprof.schema metadata, vs for example a list of metadata operands each
> pointing to a single MDString schema field, is that it simplifies detection
> of different schemas when modules are merged for LTO. If the schemas are
> identical, the merged !memprof.schema metadata will continue to hold a
> single metadata node as the node holding the schema (!0 above) will be
> shared. If they are not identical, the merged module's !memprof.schema
> metadata will hold more than one metadata operand, one for each unique
> schema.
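
The detection rule can be modeled roughly as follows (a sketch under the
assumption stated above that identical schema nodes end up shared after
merging; this is not the linker's actual code). Schema, SchemaOperands,
mergeSchemaOperands and hasSchemaMismatch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Schema = std::vector<std::string>;    // models one schema node (!0)
using SchemaOperands = std::vector<Schema>; // models !memprof.schema operands

// Append Src's schema nodes to Dest, sharing nodes with identical contents.
inline SchemaOperands mergeSchemaOperands(SchemaOperands Dest,
                                          const SchemaOperands &Src) {
  for (const Schema &S : Src) {
    assert(!S.empty() && "a schema node lists at least one field");
    if (std::find(Dest.begin(), Dest.end(), S) == Dest.end())
      Dest.push_back(S);
  }
  return Dest;
}

// More than one operand after merging means the modules' schemas differed.
inline bool hasSchemaMismatch(const SchemaOperands &Ops) {
  return Ops.size() > 1;
}
```

A consumer would then only need to look at the operand count of the merged
!memprof.schema to decide whether any fix-up or error is needed.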
>
>
>
> An alternative, which is what I originally had in a prototype, was to
> include the MDString field tags in each !memprof metadata, for example:
>
>
>
> !274 = !{!"Stack", !273, !"AllocCount", i32 1, !"AveSize", i32 4,
> !"MinSize", i32 4, !"MaxSize", i32 4, !"AveAccessCount", i64 5,
> !"MinAccessCount", i64 1, !"MaxAccessCount", i64 10, !"AveLifetime", i32
> 10, !"MinLifetime", i32 10, !"MaxLifetime", i32 10, !"NumMigration", i32 0,
> !"NumLifetimeOverlaps", i32 0}
>
>
>
> This provides maximal flexibility of merging modules using different
> schemas, at the expense of additional overhead in operands (doubling the
> number of operands of !memprof metadata). But if we need to support merging
> modules with different schemas, we could alternatively just support
> unifying the named metadata schemas and fixing up the associated !memprof
> metadata during the merging process, rather than carrying this extra
> overhead.
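
A minimal sketch of what such a fix-up could look like, assuming MIB fields
are stored positionally in schema order. It models only integer-valued
fields (the "Stack" field, which references another metadata node, is left
out), and the names fieldIndex and remapMIB as well as the Missing
placeholder value are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Schema = std::vector<std::string>;

// Position of Name in S, or -1 if the schema has no such field.
inline int fieldIndex(const Schema &S, const std::string &Name) {
  for (std::size_t I = 0; I < S.size(); ++I)
    if (S[I] == Name)
      return static_cast<int>(I);
  return -1;
}

// Re-lay out one MIB's values from schema From into the order of schema To.
// Fields present in To but absent from From are filled with Missing.
inline std::vector<int64_t> remapMIB(const Schema &From, const Schema &To,
                                     const std::vector<int64_t> &Values,
                                     int64_t Missing) {
  assert(Values.size() == From.size() && "one value per schema field");
  std::vector<int64_t> Out;
  Out.reserve(To.size());
  for (const std::string &Field : To) {
    int I = fieldIndex(From, Field);
    Out.push_back(I < 0 ? Missing : Values[static_cast<std::size_t>(I)]);
  }
  return Out;
}
```

With this layout each MIB carries only its values, and the field names are
paid for once per module in the schema rather than once per MIB.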
>
>
>
>
>


-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |