<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><blockquote type="cite" class="">On Jun 30, 2017, at 8:16 PM, Xinliang David Li <<a href="mailto:davidxl@google.com" class="">davidxl@google.com</a>> wrote:<br class=""></blockquote><div><blockquote type="cite" class=""><br class="Apple-interchange-newline"><div class=""><br class="Apple-interchange-newline"><br style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">On Fri, Jun 30, 2017 at 5:54 PM,<span class="Apple-converted-space"> </span><span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:vsk@apple.com" target="_blank" class="">vsk@apple.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">Problem<br class="">-------<br class=""><br class="">Instrumentation for PGO and frontend-based coverage places a large amount of<br class="">data in object files, even though the majority of this data is not needed at<br class="">run-time. All the data is needlessly duplicated while generating archives, and<br class="">again while linking. PGO name data is written out into raw profiles by<br class="">instrumented programs, slowing down the training and code coverage workflows.<br class=""><br class="">Here are some numbers from a coverage + RA build of ToT clang:<br class=""><br class=""> <span class="Apple-converted-space"> </span>* Size of the build directory: 4.3 GB<br class=""><br class=""> <span class="Apple-converted-space"> </span>* Wall time needed to run "clang -help" with an SSD: 0.5 seconds<br class=""><br class=""> <span class="Apple-converted-space"> </span>* Size of the clang binary: 725.24 MB<br class=""><br class=""> <span class="Apple-converted-space"> </span>* Space wasted on duplicate name/coverage data (*.o + *.a): 923.49 MB<br class="">   <span class="Apple-converted-space"> </span>- Size contributed by __llvm_covmap sections: 1.02 GB<br class="">     <span class="Apple-converted-space"> </span>\_ Just within clang: 340.48 MB<br class=""><br class="">   <span class="Apple-converted-space"> </span>- Size contributed by __llvm_prf_names sections: 327.46 MB<br class="">     <span class="Apple-converted-space"> </span>\_ Just within clang: 106.76 MB<br class=""><br class="">   <span class="Apple-converted-space"> </span>=> Space wasted within the clang binary: 447.24 MB<br class=""><br class="">Running an instrumented clang binary triggers a 143MB raw profile write which<br class="">is slow even with an SSD. This problem is particularly bad for frontend-based<br class="">coverage because it generates a lot of extra name data: however, the situation<br class="">can also be improved for PGO instrumentation.<br class=""></blockquote><div class=""><br class=""></div><div class=""><br class=""></div><div class="">I want to point out that this is a problem with FE instrumentation with coverage turned on. Without coverage turned on, the name section size will be significantly smaller. </div></div></div></blockquote><div><br class=""></div><div>Yes, it's 15MB, or about 7 times smaller.</div><span class=""><br class=""></span><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class="">With IR PGO, the name section size is even smaller. For instance, the IR instrumented clang size is 122.3MB, while the name section size is only 2.3MB, the space wasted is < 2%.</div></div></div></blockquote><div><br class=""></div>It's also worth pointing out that with r306561, name data is only written out by the runtime a constant number of times per program, and not on every program invocation. That's a big win.</div><div><br class=""></div><div>The other side to this is that there are valid use cases the online profile merging mode doesn't support, e.g generating separate sets of profiles by training on different inputs, or generating separate coverage reports for each test case. In these cases, having the option to not write out name data is a win.</div><div><div><br class=""></div><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><br class="">Proposal<br class="">--------<br class=""><br class="">Place PGO name data and coverage data outside of object files. This would<br class="">eliminate data duplication in *.a/*.o files, shrink binaries, shrink raw<br class="">profiles, and speed up instrumented programs.<br class=""><br class=""></blockquote><div class=""><br class=""></div><div class=""><br class=""></div><div class="">This sounds fine as long as the behavior (for name) is controlled under an option. For IR PGO, name size is *not* an issue, so keeping the name data in binary and dumped with profile data has advantage in terms of usability -- the profile data is self contained.  Turning on coverage trigger the behavior difference is one possible choice.</div></div></blockquote><div><br class=""></div><div>The splitting behavior would be opt-in.</div><div><br class=""></div><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class="">As for coverage mapping data, splitting it out by default seems to be a more desirable behavior.</div></div></blockquote><div><br class=""></div><div>This can't be a default behavior. The user/build system/IDE would need to specify a metadata file/directory.</div><br class=""><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class=""> The data embedded in the binary is not even used by the profile runtime (of course the runtime can choose to dump it so that the llvm-cov data does not need to look for the executable binary). The sole purpose of emitting it with the object file is to treat the executable/object as the mapping data container.  The usability of llvm-cov won't reduce with the proposed change.</div><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">In more detail:<br class=""><br class="">1. The frontends get a new `-fprofile-metadata-dir=<path><wbr class="">` option. This lets<br class="">users specify where llvm will store profile metadata. If the metadata starts to<br class="">take up too much space, there's just one directory to clean.<br class=""><br class=""></blockquote><div class=""><br class=""></div><div class="">Why not leverage -fcoverage-mapping option -- i.e. add a new flavor of this option that accepts the meta data path: -fcoverage-mapping=<path>.   If the path is not specified, the data will be emitted with object files.</div></div></blockquote><div><br class=""></div><div>This would limit the ability to store name data outside of a binary to FE-style coverage users. I recognize that large name sections aren't always as problematic when using IR/FE PGO, but this seems like an unnecessary restriction.</div><div><br class=""></div><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">2. The frontends continue emitting PGO name data and coverage data in the same<br class="">llvm::Module. So does LLVM's IR-based PGO implementation. No change here.<br class=""><br class="">3. If the InstrProf lowering pass sees that a metadata directory is available,<br class="">it constructs a new module, copies the name/coverage data into it, hashes the<br class="">module, and attempts to write that module to:<br class=""><br class=""> <span class="Apple-converted-space"> </span><metadata-dir>/<module-hash>.<wbr class="">bc   (the metadata module)<br class=""><br class="">If this write operation fails, it scraps the new module: it keeps all the<br class="">metadata in the original module, and there are no changes from the current<br class="">process. I.e with this proposal we preserve backwards compatibility.<br class=""><br class=""></blockquote><div class=""><br class=""></div><div class="">Or simply emit the raw file as coverage notes files (gcno).</div></div></blockquote><div><br class=""></div><div>After reading through the comments, I think it would be better to have the build system specify where the external data goes, and to have just one external file formed post-link.</div><div><br class=""></div><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">4. Once the metadata module is written, the name/coverage data are entirely<br class="">stripped out of the original module. They are replaced by a path to the<br class="">metadata module:<br class=""><br class=""> <span class="Apple-converted-space"> </span>@__llvm_profiling_metadata = "<metadata-dir>/<module-hash>.<wbr class="">bc",<br class="">                               section "__llvm_prf_link"<br class=""><br class="">This allows incremental builds to work properly, which is an important use case<br class="">for code coverage users. When an object is rebuilt, it gets a fresh link to a<br class="">fresh profiling metadata file. Although stale files can accumulate in the<br class="">metadata directory, the stale files cannot ever be used.<br class=""></blockquote><div class=""><br class=""></div><div class="">Why is this needed for incremental build? The file emitted is simply a build artifact, not an input to the build.</div></div></blockquote><div><br class=""></div><div>If llvm-cov just has a path to a directory, it can only load all of the data in the directory. But the aggregate data would not be self-consistent:</div><div><br class=""></div><div>$ ninja foo</div><div><Rename/delete/edit a file.></div><div>$ ninja foo</div><div><Multiple instances of coverage data appear.></div><div><Incorrect coverage reports generated.></div><div><br class=""></div><div>This isn't a problem if there is only one external metadata file (that the build system knows about).</div><br class=""><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">In an IDE like Xcode, since there's just one target binary per scheme, it's<br class="">possible to clean the metadata directory by removing the modules which aren't<br class="">referenced by the target binary.<br class=""><br class="">5. The raw profile format is updated so that links to metadata files are written<br class="">out in each profile. This makes it possible for all existing llvm-profdata and<br class="">llvm-cov commands to work, seamlessly.<br class=""><br class=""></blockquote><div class=""><br class=""></div><div class="">It may not be as smooth as you hope: the directory containing the build artifact may not be accessible when llvm-profdata tool is run. This is especially true for for distributed build system -- without telling the build system , the meta data won't even be copied back to the user.</div></div></blockquote><div><br class=""></div><div>This is another reason the build system should be aware of any metadata stored outside of the object file.</div><br class=""><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class="">Since user explicitly asks for emitting the data into a directory, it won't be a usability regression to require the user to specify the path to locate the meta data -- this is especially true for llvm-cov which requires user to specify the binary path anyway.</div></div></blockquote><blockquote type="cite" class=""><br class=""></blockquote><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class="">This requirement can simplify the implementation even more as there seems no need to write any link data in the binary.</div></div></blockquote></div><div><br class=""></div><div><div>This is from a later email, but I'd like to follow up to this comment here:</div><div><br class=""></div><div><blockquote type="cite" class="">For coverage mapping data, another possible solution is to introduce a post-link tool that strips and compresses the coverage mapping data from the final binary and copies it to a different file. This step can be manually done by the user or by the compiler driver when coverage mapping is on. The name data can be copied too, but it requires slight llvm-profdata work flow change under a flag.</blockquote><br class=""></div><div>I've already alluded to this: this sounds like a simpler plan. Kinda like dsymutil + strip.</div><div><br class=""></div><div>I'm currently traveling (sorry for the delayed responses), and will send out a revised proposal in a week or so.</div><div><br class=""></div><div><div>thanks,</div><div>vedant</div></div></div><div><br class=""><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">The indexed profile format will *not* be updated: i.e, it will contain a full<br class="">symbol table, and no links. This simplifies the coverage mapping reader, because<br class="">a full symbol table is guaranteed to exist before any function records are<br class="">parsed. It also reduces the amount of coding, and makes it easier to preserve<br class="">backwards compatibility :).<br class=""><br class="">6. The raw profile reader will learn how to read links, open up the metadata<br class="">modules it finds links to, and collect name data from those modules.<br class=""></blockquote><div class=""><br class=""></div><div class="">See above, I think it is better to explicitly pass the directory to the reader.</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><br class="">7. The coverage reader will learn how to read the __llvm_prf_link section, open<br class="">up metadata modules, and lazily read coverage mapping data.<br class=""><br class="">Alternate Solutions<br class="">-------------------<br class=""><br class="">1. Instead of copying name data into an external metadata module, just copy the<br class="">coverage mapping data.<br class=""><br class="">I've actually prototyped this. This might be a good way to split up patches,<br class="">although I don't see why we wouldn't want to tackle the name data problem<br class="">eventually.<br class=""><br class=""></blockquote><div class=""><br class=""></div><div class=""><br class=""></div><div class="">I think this can be a good first step.</div><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">2. Instead of emitting links to external metadata modules, modify llvm-cov and<br class="">llvm-profdata so that they require a path to the metadata directory.<br class=""></blockquote><div class=""><br class=""></div><div class="">I second this.</div></div></blockquote><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><br class="">The issue with this is that it's way too easy to read stale metadata. It's also<br class="">less user-friendly, which hurts adoption.<br class=""></blockquote><div class=""><br class=""></div><div class="">I don't think it will be less user-friendly. See reasons mentioned above.</div></div></blockquote><blockquote type="cite" class=""><br class=""></blockquote><blockquote type="cite" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">3. Use something other than llvm bitcode for the metadata module format.<br class=""><br class="">Since we're mostly writing large binary blobs (compressed name data or<br class="">pre-encoded source range mapping info), using bitcode shouldn't be too slow, and<br class="">we're not likely to get better compression with a different format.<br class=""><br class="">Bitcode is also convenient, and is nice for backwards compatibility.<br class=""></blockquote><div class=""><br class=""></div><div class="">Or a simpler wrapper format. Some data is probably needed to justify the decision.</div><div class=""><br class=""></div><div class="">David</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><br class="">------------------------------<wbr class="">------------------------------<wbr class="">--------------------<br class=""><br class="">If you've made it this far, thanks for taking a look! I'd appreciate any<br class="">feedback.<br class=""><span class="HOEnZb"><font color="#888888" class=""><br class="">vedant</font></span></blockquote></div></blockquote></div><br class=""></body></html>