<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 15, 2016 at 3:53 PM, Xinliang David Li <span dir="ltr"><<a href="mailto:davidxl@google.com" target="_blank">davidxl@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">This scheme is more flexible but does not necessarily simplify<br>

compatibility.  We probably need more use cases in mind before we jump<br>

into this flexibility (i.e., passing arbitrary info from instrumentation<br>

compile time to runtime and passing it back to profile-use in a round<br>

trip).  Note that we have 64 bits in the version field -- and perhaps only<br>

8 bits are actually needed for the version itself, so we have<br>

lots of bits to use for this purpose.  On the other hand, I think this<br>

is also orthogonal to the other approach -- if we run out of bits some<br>

day, we can always implement this.<br></blockquote><div><br></div><div>It may be worth thinking about even now. I've seen multiple patches recently that are using ad-hoc techniques to communicate with the runtime. E.g. r257230 uses a hack due to not having an orthogonal way to set the version and variant bits; the result is inferior diagnostic quality and obscured code intent.</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

The offline profiling tagging proposed by Nathan is useful to have<br>

regardless of the above.<br>

<span class=""><font color="#888888"><br>

David<br>

</font></span><div class=""><div class="h5"><br>

On Fri, Jan 15, 2016 at 2:18 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>

><br>

><br>

> On Fri, Jan 15, 2016 at 11:41 AM, Xinliang David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>><br>

> wrote:<br>

>><br>

>> Tagging profile data with such information is generally useful. My<br>

>> thoughts are<br>

>><br>

>> 1) such information probably does not need to be stored in the raw<br>

>> profile format -- so no runtime changes are needed -- only llvm-profdata<br>

>> and the indexed format need to be enhanced to support this.<br>

>> 2) A more general way is to just add an option:<br>

>> --embed_label=<customized_label>, where the label is a string that can hold<br>

>> key/value pairs encoded in the user's favorite format. The format of the<br>

>> key-value pairs is not specified and remains opaque to the Instr/Sample<br>

>> Profiler.<br>

>> 3) labels from multiple source profiles will be merged when merge<br>

>> command is used.<br>

>><br>

>> On Fri, Jan 15, 2016 at 11:06 AM, Nathan Slingerland <<a href="mailto:slingn@gmail.com">slingn@gmail.com</a>><br>

>> wrote:<br>

>> > Hi all,<br>

>> ><br>

>> > I'd like to get your thoughts on possibly adding a generic key-value<br>

>> > store<br>

>> > to the profile data formats for 'metadata'. Some potential use cases:<br>

>> ><br>

>> > I. Profile Features<br>

>> ><br>

>> > The most basic use could be as a central repository for internal bits of<br>

>> > housekeeping information about the profile data. For example, to<br>

>> > differentiate between FE and IR instrumentation:<br>

>> ><br>

>> >   llvm.instrumentation_source: "IR"<br>

>> ><br>

>> > A key-value store would make it simple to add new bits of information<br>

>> > and<br>

>> > help keep everything human-readable for the text-based test formats.<br>

>> > This<br>

>> > could potentially also help with error checking at the llvm-profdata<br>

>> > level<br>

>> > if the Reader classes exposed it.<br>

>> ><br>

>><br>

>> This is ok to have, but I don't think the reader class should rely on<br>

>> metadata to make decisions (as metadata can be thrown away without<br>

>> affecting correctness). A formal approach such as the one proposed (to<br>

>> encode it in variant bits of the version field) should be used.<br>

><br>

><br>

> We could potentially have a "reserved namespace" like `llvm.*` which tools<br>

> are not allowed to drop (or that have special handling inside tools).<br>

><br>

> Assuming that we have semantics guaranteeing that some<br>

> labels/"metadata" are kept (and that the compiler can communicate certain<br>

> predefined labels to the runtime which propagate back to the profraw and<br>

> then to the profdata), what do you think about using a generic format like<br>

> this for things like versions and profile source, rather than attempting to<br>

> fit everything in a small version field or having to come up with some<br>

> convention for a variable being defined or not (as in<br>

> <a href="http://reviews.llvm.org/D15540" rel="noreferrer" target="_blank">http://reviews.llvm.org/D15540</a>)? My impression is that it would give more<br>

> flexibility and potentially simplify compatibility.<br>

><br>

> -- Sean Silva<br>

><br>

>><br>

>><br>

>><br>

>> > II. Profile Context<br>

>> ><br>

>> > Basic (lightweight) information about the profile could be automatically<br>

>> > gathered at profile time. The idea would be to automatically label<br>

>> > profiles<br>

>> > with contextual information so that the age/origin of a profile could be<br>

>> > inspected using the llvm-profdata tool.<br>

>> ><br>

>> >   $ llvm-profdata show -metadata foo.profdata<br>

>> >   llvm.profile_start_time: "2016-01-08T23:41:56.755Z"<br>

>> >   llvm.profile_duration: "5.102s"<br>

>> >   llvm.exe_time: "2016-01-08T23:35:56.745Z"<br>

>><br>

>> Other examples include options and workload used in the training run.<br>

>><br>

>> >   Total functions: 4<br>

>> >   Maximum function count: 866988873<br>

>> >   Maximum internal block count: 267914296<br>

>> ><br>

>> > Other possibilities: executable path, command line arguments, system<br>

>> > info<br>

>> > (uname)<br>

>><br>

>> yes.<br>

>><br>

>> ><br>

>> > III. Custom Content<br>

>> ><br>

>> > The key-value store itself could be exposed to developers via the<br>

>> > llvm-profdata tool. This would allow users to associate arbitrary<br>

>> > custom<br>

>> > data with a profile, as well as inspect it:<br>

>> ><br>

>> >   $ llvm-profdata merge -metadata=customkey,value1 foo.profraw -o<br>

>> > foo.profdata<br>

>> >   $ llvm-profdata show -metadata foo.profdata<br>

>> >   customkey: "value1"<br>

>> >   Total functions: 4<br>

>> >   Maximum function count: 866988873<br>

>> >   Maximum internal block count: 267914296<br>

>> ><br>

>> > Developers could add as much custom context as they find valuable:<br>

>><br>

>> I think all metadata should be custom-defined -- the profile reader<br>

>> should not need to understand it.<br>

>><br>

>><br>

>> ><br>

>> >   $ llvm-profdata merge -metadata="mysoft.version,${SOFTWARE_VERSION}<br>

>> > (${BUILD_NUMBER})" -metadata="mysoft.exe_md5,`md5 -q foo.exe`"<br>

>> > foo.profraw -o<br>

>> > foo.profdata<br>

>> >   $ llvm-profdata show -metadata foo.profdata<br>

>> >   mysoft.version: "0.1.0"<br>

>> >   mysoft.exe_md5: "337b5c5bc29cbdca090a1921a58465d6"<br>

>> >   Total functions: 4<br>

>> >   Maximum function count: 866988873<br>

>> >   Maximum internal block count: 267914296<br>

>> ><br>

>> > Other information that might be interesting: git/svn revision, workload<br>

>> > description, system info (uname -a)<br>

>> ><br>

>> > This would be a way to embed almost any platform-specific or<br>

>> > heavy-weight<br>

>> > data without requiring the addition of platform-specific code in<br>

>> > compiler-rt<br>

>> > and without impacting other developers.<br>

>> ><br>

>><br>

>> yes.<br>

>><br>

>> ><br>

>> > When profiles are merged it might be simplest to keep all input metadata<br>

>> > (machine-readable things such as feature bits might need to be handled<br>

>> > differently):<br>

>><br>

>> Feature bits should not be part of it.<br>

>><br>

>> ><br>

>> >   $ llvm-profdata merge -weighted-input=3,foo.profdata bar.profdata -o<br>

>> > foobar.profdata<br>

>> >   $ llvm-profdata show -metadata foobar.profdata<br>

>> >   foo.profdata<br>

>> >     llvm.profile_weight: 3<br>

>> >     llvm.profile_start_time: "2016-01-08T23:41:56.755Z"<br>

>> >     llvm.profile_duration: "5.102s"<br>

>> >     llvm.exe_time: "2016-01-08T23:35:56.745Z"<br>

>> >     customkey: "value1"<br>

>> >   bar.profdata<br>

>> >     llvm.profile_weight: 1<br>

>> >     llvm.profile_start_time: "2016-01-15T00:08:41.168Z"<br>

>> >     llvm.profile_duration: "1.001s"<br>

>> >     llvm.exe_time: "2016-01-15T00:08:13.000Z"<br>

>> >     customkey: "value2"<br>

>> >   Total functions: 4<br>

>> >   Maximum function count: 866988873<br>

>> >   Maximum internal block count: 267914296<br>

>> ><br>

>> > In terms of implementation, the metadata could live as a separate<br>

>> > contiguous<br>

>> > section in the binary profile formats. It might make sense to encode it<br>

>> > in<br>

>> > something like YAML so that it could also be directly embedded in the<br>

>> > various text formats.<br>

>> ><br>

>><br>

>> A single string after the header should do.<br>

>><br>

>> thanks,<br>

>><br>

>> David<br>

>><br>

>> > ----<br>

>> ><br>

>> > What do you think? How useful would any of the above be to you or other<br>

>> > PGO<br>

>> > users?<br>

>> > Can you think of any other use cases?<br>

>> ><br>

>> > -Nathan<br>

><br>

><br>

</div></div></blockquote></div><br></div></div>