<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 6 Jul 2017, at 01:07, David Blaikie &lt;<a href="mailto:dblaikie@gmail.com" class="">dblaikie@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Could someone summarize the % size costs in object files and executables, release versus unoptimized and debug vs. no-debug builds? (maybe that's too much of a hassle, but thought it might provide some clarity about the tradeoffs, pain points, etc)<br class=""><br class="">Also, added dberris here, since if I recall correctly, the XRay work has some similar aspects - where certain mapping structures are kept in the binary and consulted when interpreting XRay traces. In that case it may also be useful to avoid putting those structures into the final binary in some cases for the same sort of size tradeoff reasons.<br class=""></div></div></blockquote><div><br class=""></div><div>Yes, in XRay we depend at runtime on being able to access an in-memory array of a certain format/alignment.</div><div><br class=""></div><div>It just so happens that we also need to be able to find which functions are instrumented in a particular binary for tooling/interpretation/analysis purposes "offline".</div><div><br class=""></div><div>While emitting the instrumentation map as part of the binary is certainly convenient for the runtime, since it then needs no "external input" to find the places in the binary that should be patched, I don't see it as a deal-breaker if the runtime had a fall-back mechanism for finding the instrumentation map externally. That might introduce a few issues when there's a mismatch between the instrumentation map's addresses/offsets and the binary being instrumented. 
Given the nature of XRay, this is very dangerous because a carefully crafted instrumentation map could be abused. We might need to be clever about using signatures or special markers in the binary and the instrumentation maps to match those up properly.</div><div><br class=""></div><div>The tooling (llvm-xray) can already deal with a detached instrumentation map (crudely done: an objcopy of the xray_instr_map section, or a YAML representation of the same). However, the runtime implementation currently can't. As mentioned above, there are a few issues to work out in that regard (having a strong "unique" identifier for a binary and the instrumentation map, something potentially involving some robustly computed crypto hash at compile-time, etc.). But it's certainly a non-trivial problem, especially if we want to make it portable across object file formats (I only barely know how ELF works) and platforms (Windows, UNIX, etc.).</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><br class="">&amp; then even more worth looking at a generalized solution for these sort of things.<br class=""><br class=""></div></div></blockquote><div><br class=""></div><div>+1 to a generalised solution. 
I'd be very happy to be involved in that discussion too.</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class="">- Dave<br class=""><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Wed, Jul 5, 2017 at 7:58 AM Vedant Kumar via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><blockquote type="cite" class=""><div class="">On Jun 30, 2017, at 10:04 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com" target="_blank" class="">chisophugis@gmail.com</a>> wrote:</div><br class="m_-1469304426851716179Apple-interchange-newline"><div class=""><br class="m_-1469304426851716179Apple-interchange-newline"><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><div class="gmail_quote" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">On Fri, Jun 30, 2017 at 5:54 PM, via llvm-dev<span class="m_-1469304426851716179Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>></span><span class="m_-1469304426851716179Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Problem<br class="">-------<br class=""><br class="">Instrumentation for PGO and frontend-based coverage places a large amount of<br class="">data in object files, 
even though the majority of this data is not needed at<br class="">run-time. All the data is needlessly duplicated while generating archives, and<br class="">again while linking. PGO name data is written out into raw profiles by<br class="">instrumented programs, slowing down the training and code coverage workflows.<br class=""><br class="">Here are some numbers from a coverage + RA build of ToT clang:<br class=""><br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>* Size of the build directory: 4.3 GB<br class=""><br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>* Wall time needed to run "clang -help" with an SSD: 0.5 seconds<br class=""><br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>* Size of the clang binary: 725.24 MB<br class=""><br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>* Space wasted on duplicate name/coverage data (*.o + *.a): 923.49 MB<br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>- Size contributed by __llvm_covmap sections: 1.02 GB<br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>\_ Just within clang: 340.48 MB<br class=""></blockquote><div class=""><br class=""></div><div class="">We live with this duplication for debug info. In some sense, if the overhead is small compared to debug info, should we even bother (i.e., we assume that users accommodate debug builds, so that is a reasonable bound on the tolerable build directory size). 
(I don't know the numbers; this seems pretty large so maybe it is significant compared to debug info; just saying that looking at absolute numbers is misleading here; numbers compared to debug info are a closer measure to the user's perceptions)</div></div></div></blockquote><div class=""><br class=""></div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class="">The size of a RelWithDebInfo build directory for the same checkout is 9 GB (I'm still just building clang, this time without instrumentation). We (more or less) get away with this because the debug info isn't copied into the final binary [1]. We're not getting away with this with coverage. E.g we usually store bot artifacts for a while, but we had to shut this functionality off almost immediately for our coverage bots because the uploads were horrific.</div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div class="">In fact, one overall architectural observation I have is that the most complicated part of all this is simply establishing the workflow to plumb together data emitted per-TU to a tool that needs that information to do some post-processing step on the results of running the binary. That sounds a lot like the role of debug info. In fact, having a debugger open a core file is precisely equivalent to what llvm-profdata needs to do in this regard AFAICT.</div><div class=""><br class=""></div><div class="">So it would be best if possible to piggyback on all the effort that has gone into plumbing that data to make debug info work. 
For example, I know that on Darwin there's a fair amount of system-level integration to make split dwarf "just work" while keeping debug info out of final binaries.</div><div class=""><br class=""></div><div class="">If there is a not-too-hacky way to piggyback on debug info, that's likely to be a really slick solution. For example, debug info could in principle (if it doesn't already) contain information about the name of each counter in the counter array, so in principle it would be a complete enough description to identify each counter.</div></div></div></blockquote><div class=""><br class=""></div></div></div><div style="word-wrap:break-word" class=""><div class="">We don't emit debug info for this currently. Is there a reason to?</div><div class=""><br class=""></div><div class=""></div></div><div style="word-wrap:break-word" class=""><div class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div class="">I'm not very familiar with DWARF, but I'm imagining something like reserving an LLVM vendor-specific DWARF opcode/attribute/whatever and then stick a blob of data in there. Presumably we have code somewhere in LLDB that is "here's a binary, find debug info for it", and in principle we could factor out that code and lift it into an LLVM library (libFindDebugInfo) that llvm-profdata could use.</div></div></div></blockquote><div class=""><br class=""></div></div></div><div style="word-wrap:break-word" class=""><div class=""><div class="">This could work for the coverage/name data. There are some really nice pieces of Darwin integration (e.g search-with-Spotlight, findDsymForUUID). 
I'll look into this.</div></div></div><div style="word-wrap:break-word" class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="gmail_quote" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> <span class="m_-1469304426851716179Apple-converted-space"> </span>- Size contributed by __llvm_prf_names sections: 327.46 MB<br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>\_ Just within clang: 106.76 MB<br class=""><br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>=&gt; Space wasted within the clang binary: 447.24 MB<br class=""><br class="">Running an instrumented clang binary triggers a 143 MB raw profile write which<br class="">is slow even with an SSD. This problem is particularly bad for frontend-based<br class="">coverage because it generates a lot of extra name data; however, the situation<br class="">can also be improved for PGO instrumentation.<br class=""><br class="">Proposal<br class="">--------<br class=""><br class="">Place PGO name data and coverage data outside of object files. This would<br class="">eliminate data duplication in *.a/*.o files, shrink binaries, shrink raw<br class="">profiles, and speed up instrumented programs.<br class=""><br class="">In more detail:<br class=""><br class="">1. The frontends get a new `-fprofile-metadata-dir=&lt;path&gt;` option. This lets<br class="">users specify where llvm will store profile metadata. If the metadata starts to<br class="">take up too much space, there's just one directory to clean.<br class=""><br class="">2. 
The frontends continue emitting PGO name data and coverage data in the same<br class="">llvm::Module. So does LLVM's IR-based PGO implementation. No change here.<br class=""><br class="">3. If the InstrProf lowering pass sees that a metadata directory is available,<br class="">it constructs a new module, copies the name/coverage data into it, hashes the<br class="">module, and attempts to write that module to:<br class=""><br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>&lt;metadata-dir&gt;/&lt;module-hash&gt;.bc (the metadata module)<br class=""><br class="">If this write operation fails, it scraps the new module: it keeps all the<br class="">metadata in the original module, and there are no changes from the current<br class="">process. I.e., with this proposal we preserve backwards compatibility.<br class=""></blockquote><div class=""><br class=""></div><div class="">Based on my experience with Clang's implicit modules, I'm *extremely* wary of anything that might cause the compiler to emit a file that the build system cannot guess the name of. 
In fact, having the compiler emit a file that is not explicitly listed on the command line is basically just as bad in practice (in terms of feasibility of informing the build system about it).</div><div class=""><br class=""></div><div class="">As a simple example, ninja cannot represent a dependency of this type, so if you delete a &lt;metadata-dir&gt;/&lt;module-hash&gt;.bc it won't know things need to be rebuilt (and it won't know how to clean it, etc.).<br class=""></div><div class=""><br class=""></div><div class="">So I would really strongly recommend against doing this.</div></div></blockquote></div></div><div style="word-wrap:break-word" class=""><div class=""><blockquote type="cite" class=""><div class="gmail_quote" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div class=""><br class=""></div><div class="">Again, these problems of system integration (in particular build system integration) are nasty, and if you can bypass this and piggyback on debug info then everything will "just work" because the folks that care about making sure that debugging "just works" already did the work for you.</div><div class="">It might be more work in the short term to do the debug info approach (if it is feasible at all), but I can tell you based on the experience with implicit modules (and I'm sure you have some experience of your own) that there's just going to be a never-ending tail of hitches and ways that things don't work (or work poorly) due to not having the build system / overall system integration right, so it will be worth it in the long run.</div></div></blockquote><div class=""><br class=""></div></div></div><div style="word-wrap:break-word" class=""><div class="">Thanks, this makes a lot of sense. 
The build system should keep track of where to externalize profile metadata (regardless of whether or not it piggybacks on debug info). In addition to the advantages you've listed, this would make testing easier.</div><div class=""><br class=""></div><div class="">vedant</div><div class=""><br class=""><div class="">[1] ld64:</div>2561 if ( strcmp(sect->segname(), "__DWARF") == 0 ) { <br class="">2562 // note that .o file has dwarf <br class="">2563 _file->_debugInfoKind = ld::relocatable::File::kDebugInfoDwarf; <br class="">2564 // save off iteresting dwarf sections <br class="">... <br class="">2571 else if ( strcmp(sect->sectname(), "__debug_str") == 0 ) <br class="">2572 _file->_dwarfDebugStringSect = sect; <br class="">2573 // linker does not propagate dwarf sections to output file <br class="">2574 continue; </div></div><div style="word-wrap:break-word" class=""><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br class="">4. Once the metadata module is written, the name/coverage data are entirely<br class="">stripped out of the original module. They are replaced by a path to the<br class="">metadata module:<br class=""><br class=""> <span class="m_-1469304426851716179Apple-converted-space"> </span>@__llvm_profiling_metadata = "<metadata-dir>/<module-hash>.bc",<br class=""> section "__llvm_prf_link"<br class=""><br class="">This allows incremental builds to work properly, which is an important use case<br class="">for code coverage users. 
When an object is rebuilt, it gets a fresh link to a<br class="">fresh profiling metadata file. Although stale files can accumulate in the<br class="">metadata directory, the stale files cannot ever be used.<br class=""><br class="">In an IDE like Xcode, since there's just one target binary per scheme, it's<br class="">possible to clean the metadata directory by removing the modules which aren't<br class="">referenced by the target binary.<br class=""><br class="">5. The raw profile format is updated so that links to metadata files are written<br class="">out in each profile. This makes it possible for all existing llvm-profdata and<br class="">llvm-cov commands to work, seamlessly.<br class=""><br class="">The indexed profile format will *not* be updated: i.e, it will contain a full<br class="">symbol table, and no links. This simplifies the coverage mapping reader, because<br class="">a full symbol table is guaranteed to exist before any function records are<br class="">parsed. It also reduces the amount of coding, and makes it easier to preserve<br class="">backwards compatibility :).<br class=""><br class="">6. The raw profile reader will learn how to read links, open up the metadata<br class="">modules it finds links to, and collect name data from those modules.<br class=""><br class="">7. The coverage reader will learn how to read the __llvm_prf_link section, open<br class="">up metadata modules, and lazily read coverage mapping data.<br class=""><br class="">Alternate Solutions<br class="">-------------------<br class=""><br class="">1. Instead of copying name data into an external metadata module, just copy the<br class="">coverage mapping data.<br class=""><br class="">I've actually prototyped this. This might be a good way to split up patches,<br class="">although I don't see why we wouldn't want to tackle the name data problem<br class="">eventually.<br class=""><br class="">2. 
Instead of emitting links to external metadata modules, modify llvm-cov and<br class="">llvm-profdata so that they require a path to the metadata directory.<br class=""><br class="">The issue with this is that it's way too easy to read stale metadata. It's also<br class="">less user-friendly, which hurts adoption.<br class=""><br class="">3. Use something other than llvm bitcode for the metadata module format.<br class=""><br class="">Since we're mostly writing large binary blobs (compressed name data or<br class="">pre-encoded source range mapping info), using bitcode shouldn't be too slow, and<br class="">we're not likely to get better compression with a different format.<br class=""><br class="">Bitcode is also convenient, and is nice for backwards compatibility.<br class=""><br class="">--------------------------------------------------------------------------------<br class=""><br class="">If you've made it this far, thanks for taking a look! I'd appreciate any<br class="">feedback.<br class=""><br class="">vedant<br class=""><br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a><br class=""><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank" class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></blockquote></div></div></blockquote></div><br class=""></div>_______________________________________________<br class="">
</blockquote></div></div>
</div></blockquote></div><br class=""><div class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">-- Dean</div></div>
</div>
<br class=""></body></html>