[llvm-dev] [PGO] Thoughts on adding a key-value store to profile data formats

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 15 16:36:17 PST 2016


On Fri, Jan 15, 2016 at 3:53 PM, Xinliang David Li <davidxl at google.com>
wrote:

> This scheme is more flexible, but it does not necessarily simplify
> compatibility. We probably need more use cases in mind before we jump
> into this flexibility (i.e., passing arbitrary info from instrumentation
> compile time to runtime and passing it back to profile-use in a round
> trip). Note that we have 64 bits in the version field -- and perhaps only
> 8 bits are needed for the actual version, so we have lots of bits to
> use for this purpose. On the other hand, I think this is also orthogonal
> to the other approach -- if we run out of bits some day, we can always
> implement this.
>

It may be worth thinking about even now. I've seen multiple patches
recently that are using ad-hoc techniques to communicate with the runtime.
E.g. r257230 uses a hack due to not having an orthogonal way to set the
version and variant bits; the result is inferior diagnostic quality and
obscured code intent.
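
To make the trade-off concrete, here is a rough sketch of the "spare bits
in the version field" approach. The layout is an assumption for
illustration only (low 8 bits for the version number, a made-up flag at bit
56 for "IR instrumentation"); it is not meant to describe the encoding any
existing patch uses, and the names pack_version/unpack_version are
hypothetical.

```python
# Hypothetical layout for a 64-bit version field (illustration only):
#   bits 0-7   : format version number
#   bits 8-63  : variant/feature flags
VERSION_BITS = 8
VERSION_MASK = (1 << VERSION_BITS) - 1
FIELD_MASK = (1 << 64) - 1

# Made-up flag: the profile was produced by IR-level instrumentation.
VARIANT_IR = 1 << 56


def pack_version(version, variant_flags=0):
    """Combine a version number and variant flags into one 64-bit field."""
    if not 0 <= version <= VERSION_MASK:
        raise ValueError("version does not fit in the version bits")
    if variant_flags & VERSION_MASK or variant_flags & ~FIELD_MASK:
        raise ValueError("variant flags overlap the version or exceed 64 bits")
    return variant_flags | version


def unpack_version(field):
    """Split a 64-bit field back into (version, variant_flags)."""
    version = field & VERSION_MASK
    variant_flags = field & FIELD_MASK & ~VERSION_MASK
    return version, variant_flags
```

Setting the two parts independently is what I mean by "orthogonal": a
reader can check `variant_flags & VARIANT_IR` and report a version mismatch
separately from a variant mismatch.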

-- Sean Silva


>
> The offline profiling tagging proposed by Nathan is useful to have
> regardless of the above.
>
> David
>
> On Fri, Jan 15, 2016 at 2:18 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> >
> > On Fri, Jan 15, 2016 at 11:41 AM, Xinliang David Li <davidxl at google.com>
> > wrote:
> >>
> >> Tagging profile data with such information is generally useful. My
> >> thoughts are:
> >>
> >> 1) Such information probably does not need to be stored in raw format
> >> profile data -- so no runtime changes are needed -- only llvm-profdata
> >> and the indexed format need to be enhanced to support this.
> >> 2) A more general way is to just add an option:
> >> --embed_label=<customized_label>, where the label is a string that can
> >> hold key/value pairs encoded in the user's favorite format. The format
> >> of the key-value pairs is not specified and remains opaque to the
> >> Instr/Sample Profiler.
> >> 3) Labels from multiple source profiles will be merged when the merge
> >> command is used.
> >>
> >> On Fri, Jan 15, 2016 at 11:06 AM, Nathan Slingerland <slingn at gmail.com>
> >> wrote:
> >> > Hi all,
> >> >
> >> > I'd like to get your thoughts on possibly adding a generic key-value
> >> > store to the profile data formats for 'metadata'. Some potential use
> >> > cases:
> >> >
> >> > I. Profile Features
> >> >
> >> > The most basic use could be as a central repository for internal bits
> >> > of housekeeping information about the profile data. For example, to
> >> > differentiate between FE and IR instrumentation:
> >> >
> >> >   llvm.instrumentation_source: "IR"
> >> >
> >> > A key-value store would make it simple to add new bits of information
> >> > and help keep everything human-readable for the text-based test
> >> > formats. This could potentially also help with error checking at the
> >> > llvm-profdata level if the Reader classes exposed it.
> >> >
> >>
> >> This is ok to have, but I don't think the reader class should rely on
> >> metadata to make decisions (as metadata can be thrown away without
> >> affecting correctness). A formal approach such as the one proposed (to
> >> encode it in variant bits of the version field) should be used.
> >
> >
> > We could potentially have a "reserved namespace" like `llvm.*` which
> > tools are not allowed to drop (or that has special handling inside
> > tools).
> >
> > Assuming that we have semantics that guarantee that some
> > labels/"metadata" are kept (and that the compiler can communicate certain
> > predefined labels to the runtime which propagate back to the profraw and
> > then to the profdata), what do you think about using a generic format
> > like this for things like versions and profile source, rather than
> > attempting to fit everything in a small version field or having to come
> > up with some convention for a variable being defined or not (as in
> > http://reviews.llvm.org/D15540)? My impression is that it would give
> > more flexibility and potentially simplify compatibility.
> >
> > -- Sean Silva
> >
> >>
> >>
> >>
> >> > II. Profile Context
> >> >
> >> > Basic (lightweight) information about the profile could be
> >> > automatically gathered at profile time. The idea would be to
> >> > automatically label profiles with contextual information so that the
> >> > age/origin of a profile could be inspected using the llvm-profdata
> >> > tool.
> >> >
> >> >   $ llvm-profdata show -metadata foo.profdata
> >> >   llvm.profile_start_time: "2016-01-08T23:41:56.755Z"
> >> >   llvm.profile_duration: 5.102s
> >> >   llvm.exe_time: "2016-01-08T23:35:56.745Z"
> >>
> >> Other examples include options and workload used in the training run.
> >>
> >> >   Total functions: 4
> >> >   Maximum function count: 866988873
> >> >   Maximum internal block count: 267914296
> >> >
> >> > Other possibilities: executable path, command line arguments, system
> >> > info (uname)
> >>
> >> yes.
> >>
> >> >
> >> > III. Custom Content
> >> >
> >> > The key-value store itself could be exposed to developers via the
> >> > llvm-profdata tool. This would allow users to associate arbitrary
> >> > custom data with a profile, as well as inspect it:
> >> >
> >> >   $ llvm-profdata merge -metadata=customkey,value1 foo.profraw -o foo.profdata
> >> >   $ llvm-profdata show -metadata foo.profdata
> >> >   customkey: "value1"
> >> >   Total functions: 4
> >> >   Maximum function count: 866988873
> >> >   Maximum internal block count: 267914296
> >> >
> >> > Developers could add as much custom context as they find valuable:
> >>
> >> I think all meta data should be custom defined -- the profile reader
> >> should not need to understand them.
> >>
> >>
> >> >
> >> >   $ llvm-profdata merge \
> >> >       -metadata="mysoft.version,${SOFTWARE_VERSION} (${BUILD_NUMBER})" \
> >> >       -metadata="mysoft.exe_md5,`md5 -q foo.exe`" \
> >> >       foo.profraw -o foo.profdata
> >> >   $ llvm-profdata show -metadata foo.profdata
> >> >   mysoft.version: "0.1.0"
> >> >   mysoft.exe_md5: "337b5c5bc29cbdca090a1921a58465d6"
> >> >   Total functions: 4
> >> >   Maximum function count: 866988873
> >> >   Maximum internal block count: 267914296
> >> >
> >> > Other information that might be interesting: git/svn revision,
> >> > workload description, system info (uname -a)
> >> >
> >> > This would be a way to embed almost any platform-specific or
> >> > heavy-weight data without requiring the addition of platform-specific
> >> > code in compiler-rt and without impacting other developers.
> >> >
> >>
> >> yes.
> >>
> >> >
> >> > When profiles are merged it might be simplest to keep all input
> >> > metadata (machine-readable things such as feature bits might need to
> >> > be handled differently):
> >>
> >> Feature bits should not be part of it.
> >>
> >> >
> >> >   $ llvm-profdata merge -weighted-input=3,foo.profdata bar.profdata -o foobar.profdata
> >> >   $ llvm-profdata show -metadata foobar.profdata
> >> >   foo.profdata
> >> >     llvm.profile_weight: 3
> >> >     llvm.profile_start_time: "2016-01-08T23:41:56.755Z"
> >> >     llvm.profile_duration: 5.102s
> >> >     llvm.exe_time: "2016-01-08T23:35:56.745Z"
> >> >     customkey: "value1"
> >> >   bar.profdata
> >> >     llvm.profile_weight: 1
> >> >     llvm.profile_start_time: "2016-01-15T00:08:41.168Z"
> >> >     llvm.profile_duration: 1.001s
> >> >     llvm.exe_time: "2016-01-15T00:08:13.000Z"
> >> >     customkey: "value2"
> >> >   Total functions: 4
> >> >   Maximum function count: 866988873
> >> >   Maximum internal block count: 267914296
> >> >
> >> > In terms of implementation, the metadata could live as a separate
> >> > contiguous section in the binary profile formats. It might make sense
> >> > to encode it in something like YAML so that it could also be directly
> >> > embedded in the various text formats.
> >> >
> >>
> >> A single string after the header should do.
> >>
> >> thanks,
> >>
> >> David
> >>
> >> > ----
> >> >
> >> > What do you think? How useful would any of the above be to you or
> >> > other PGO users?
> >> > Can you think of any other use cases?
> >> >
> >> > -Nathan
> >
> >
>
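
For reference, the key-value semantics discussed in the quoted thread could
be sketched roughly as below: a `key,value` option split on the first comma,
a reserved `llvm.` prefix that tools must keep, and a merge that keeps each
input's metadata under the input's name. None of this is an existing
llvm-profdata interface; the function names and the dict-based
representation are assumptions made up for illustration.

```python
RESERVED_PREFIX = "llvm."  # keys in this namespace must not be dropped


def parse_metadata_option(option):
    """Split a 'key,value' string on the first comma only."""
    key, sep, value = option.partition(",")
    if not sep or not key:
        raise ValueError("expected 'key,value', got %r" % option)
    return key, value


def drop_custom_metadata(metadata):
    """Discard user-defined keys but keep everything under 'llvm.'."""
    return {k: v for k, v in metadata.items() if k.startswith(RESERVED_PREFIX)}


def merge_metadata(inputs):
    """Keep all input metadata, grouped by input profile name.

    `inputs` is a list of (profile_name, weight, metadata_dict) tuples.
    The weight is recorded under the key 'llvm.profile_weight'.
    """
    merged = {}
    for name, weight, metadata in inputs:
        entry = dict(metadata)  # copy so the caller's dict is not modified
        entry["llvm.profile_weight"] = weight
        merged[name] = entry
    return merged
```

Splitting on the first comma lets values such as "0.1.0 (build 7, debug)"
contain commas of their own; whether that is the right escaping rule is an
open question.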