<div dir="ltr">Is it that it'd be better to have the functionality in LLVM, or in a new tool? (is it about it being a different tool, or about it being in the LLVM tree, or something else?)<br><br>What about possibly moving Bloaty into the LLVM project & improving it there?</div><br><div class="gmail_quote"><div dir="ltr">On Mon, Oct 1, 2018 at 4:48 PM Vedant Kumar <<a href="mailto:vsk@apple.com">vsk@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><blockquote type="cite">On Oct 1, 2018, at 3:25 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>> wrote:<br></blockquote></div><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><br class="m_2793308867876070509Apple-interchange-newline"><div><div dir="ltr"><br><div class="gmail_quote"><div dir="ltr">On Mon, Oct 1, 2018 at 3:24 PM JF Bastien <<a href="mailto:jfbastien@apple.com" target="_blank">jfbastien@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><blockquote type="cite"><div>On Oct 1, 2018, at 3:16 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>> wrote:</div><br class="m_2793308867876070509m_-3455304991324924934Apple-interchange-newline"><div><div dir="ltr">(my vote, somewhat biased - is that I'd love to see more investment in Bloaty (to keep all these sort of size analysis tools and tricks in one place), but sort of accept folks are probably going to keep building more infrastructure for this sort of thing in LLVM directly)<br></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>I get where that comes from, but it seems a 
bit like a Valgrind versus sanitizer argument: integrating with the toolchain gives you things you can’t really get otherwise. Valgrind is still great as a self-standing thing.</div></div></div></blockquote><div><br>Not sure that's quite the same though - with sanitizer integrating with the optimizers is the key here.<br><br>With bloaty - it could, at worst, use LLVM's libDebugInfo as a library to implement the more advanced debug-using features without being less functional than an in-LLVM implementation.<br></div></div></div></div></blockquote><div><br></div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div>I’m a bit biased too, but fwiw: my preference would be to add a new size analysis tool to llvm.</div><div><br></div><div>Such a tool might grow to depend on code for object file parsing, debug info parsing, demangling, and disassembling (all of which bloaty either reimplements or pulls in). Living in-tree should make it easier to pick up bug fixes in these dependencies and reduce maintenance overhead.</div><div><br></div><div>While I really like bloaty, my impression is that it’d be better to implement the functionality I’d like to use in a new tool.</div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div><br></div><div>vedant</div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><div><br></div><br><blockquote type="cite"><div><div dir="ltr"><div class="gmail_quote"><div><br>- Dave<br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space"><div><div><br></div><br><blockquote type="cite"><div><div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Wed, Sep 26, 2018 at 12:03 PM Vedant Kumar <<a href="mailto:vsk@apple.com" target="_blank">vsk@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc 
solid;padding-left:1ex">Hello,<br>
<br>
I worked on a code size analysis tool for a 'week of code' project and think<br>
that it might be useful enough to upstream.<br>
<br>
The tool is inspired by bloaty (<a href="https://github.com/google/bloaty" rel="noreferrer" target="_blank">https://github.com/google/bloaty</a>), but tries to<br>
do more to attribute code size in actionable ways.<br>
<br>
For example, it can calculate how many bytes inlined instances of a function<br>
added to a binary. In its diff mode, it can show how much more aggressively a<br>
function was inlined compared to a baseline. This can be useful when you're,<br>
say, trying to figure out why firmware compiled by a new compiler is just a few<br>
bytes over the size limit imposed by your embedded device :). In this case,<br>
extra information about inlining can help inform a decision to either tweak the<br>
inliner's cost model or to judiciously add a few `noinline` attributes. (Note<br>
that if you're willing to recompile & write a few SQL queries, optimization<br>
remarks can give you similar information, albeit at the IR level.)<br>
<br>
As another example, this code size tool can attribute code size to semantically<br>
interesting groups of code, like C++/Swift classes, or files. In the diff mode,<br>
you can see how the code size of a class/file grew compared to a baseline. The<br>
tool understands inheritance, so you can also see interesting high-level trends.<br>
E.g., `clang::Sema` grew more than `llvm::Pass` between clang-6 and clang-7.<br>
<br>
Unlike bloaty, this tool focuses exclusively on the text segment. Also unlike<br>
bloaty, it uses LLVM's DWARF parser instead of rolling its own. The tool is<br>
currently implemented as a sub-tool of llvm-dwarfdump.<br>
<br>
To get size information about a program, you do:<br>
<br>
llvm-dwarfdump size-info -baseline <object> -stats-dir <dir><br>
<br>
This emits four *.stats files into <dir>, each containing a distinct 'view' into<br>
the code groups in <object>. There's a file view, a function view, a class view,<br>
and an inlining view. Each view is sorted by code size, so you can see the<br>
largest functions/classes/etc immediately.<br>
<br>
The *.stats files are just human-readable text files. As it happens, they use<br>
the flamegraph format (<a href="http://brendangregg.com/flamegraphs.html" rel="noreferrer" target="_blank">http://brendangregg.com/flamegraphs.html</a>). This makes it<br>
easy to visualize any view as a flamegraph. (If you haven't seen one before,<br>
it's a hierarchical visualization where the width of each entry corresponds to<br>
its frequency (or in this case size).)<br>
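<br>
As a rough illustration of how simple the format is to consume, here is a minimal<br>
Python sketch (not part of the tool) that parses lines of the form<br>
`frame;frame;... size` and totals the sizes per outermost frame. The sample file<br>
and function names are made up, and the sketch assumes the size is the last<br>
space-separated token and that frame names never contain `;`.<br>

```python
from collections import defaultdict


def parse_folded(lines):
    """Parse 'frame;frame;... <size>' lines into (frames, size) pairs."""
    entries = []
    for line in lines:
        line = line.strip()
        # The size is the last space-separated token; frames may contain spaces.
        stack, _, size = line.rpartition(" ")
        if not stack or not size.isdigit():
            continue  # skip blank lines and anything that is not an entry
        entries.append((stack.split(";"), int(size)))
    return entries


def size_by_top_frame(entries):
    """Sum sizes per outermost frame (e.g. per file in the file view)."""
    totals = defaultdict(int)
    for frames, size in entries:
        totals[frames[0]] += size
    return dict(totals)


# Hypothetical sample data in the same shape as the file view.
sample = [
    "lib/A.cpp [file];foo() [function] 120",
    "lib/A.cpp [file];bar(int) [function] 30",
    "...",
    "lib/B.cpp [file];baz() [function] 50",
]
entries = parse_folded(sample)
totals = size_by_top_frame(entries)
print(totals)
```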
<br>
To look at code growth between two programs, you'd do:<br>
<br>
llvm-dwarfdump size-info -baseline <object> -target <object> -stats-dir <dir><br>
<br>
Similarly, this emits four 'view' files into <dir>, but with a *.diffstats<br>
suffix. The format is the same.<br>
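<br>
Conceptually, a diffstat can be thought of as a per-entry subtraction. The Python<br>
sketch below is only an approximation of that idea, not the tool's actual<br>
implementation: it assumes each view has already been reduced to a name-to-size<br>
mapping, treats entries missing from the baseline as size 0, and keeps only<br>
entries that grew. All names in it are hypothetical.<br>

```python
def diff_sizes(baseline, target):
    """Return {name: growth} for entries that grew, largest growth first."""
    growth = {}
    for name, new_size in target.items():
        # Assumption: an entry absent from the baseline counts as size 0.
        delta = new_size - baseline.get(name, 0)
        if delta > 0:
            growth[name] = delta
    return dict(sorted(growth.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical sizes in bytes for a baseline and a target binary.
baseline = {"foo()": 100, "bar()": 40, "gone()": 10}
target = {"foo()": 130, "bar()": 35, "new()": 80}
growth = diff_sizes(baseline, target)
for name, delta in growth.items():
    print(f"{name} [function] {delta}")
```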
<br>
Pending Work<br>
------------<br>
<br>
I think the main piece of work the tool needs is better testing. Currently<br>
there's just a single end-to-end test in clang. It might be better to check in<br>
a few binaries so we can check that the tool reports sizes correctly.<br>
<br>
Also, it may turn out that folks are interested in different ways of visualizing<br>
size data. While the textual format of flamegraphs is really convenient for<br>
humans to read, the graphs themselves make more sense when the underlying<br>
data are frequencies rather than sizes. If there's enough interest, I can explore<br>
using an alternative format for visualization, e.g.:<br>
<br>
<a href="http://neugierig.org/software/chromium/bloat/" rel="noreferrer" target="_blank">http://neugierig.org/software/chromium/bloat/</a><br>
<a href="https://github.com/evmar/webtreemap" rel="noreferrer" target="_blank">https://github.com/evmar/webtreemap</a><br>
<br>
(Thanks JF for pointing these out!)<br>
<br>
Here's a link to the source code:<br>
<br>
<a href="https://github.com/vedantk/llvm-project/tree/sizeinfo" rel="noreferrer" target="_blank">https://github.com/vedantk/llvm-project/tree/sizeinfo</a> <br>
<br>
Selected Examples<br>
-----------------<br>
<br>
Here are a few interesting snippets from a comparison of clang-6 vs. clang-7.<br>
<br>
First, let's take a look at the function view diffstat. Here are the 10<br>
functions which grew in size the most. On the left-hand side, you'll see the<br>
demangled function name. The *change* in code size in bytes is reported on the<br>
right-hand side (only positive changes are reported).<br>
<br>
clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316<br>
ProcessDeclAttribute([snip]) [function] 125893<br>
llvm::AArch64InstPrinter::printAliasInstr([snip]) [function] 105133<br>
llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133<br>
ParseCodeGenArgs([snip]) [function] 64692<br>
unswitchNontrivialInvariants([snip]) [function] 40180<br>
getAttrKind([snip]) [function] 35811<br>
clang::DumpCompilerOptionsAction::ExecuteAction() [function] 32417<br>
llvm::UpgradeIntrinsicCall([snip]) [function] 30239<br>
bool llvm::InstructionSelector::executeMatchTable<(anonymous namespace)::ARMInstructionSelector const, [snip]) const [function] 29352<br>
<br>
<br>
Next, let's look at the file view diffstat. This can be useful because it goes<br>
beyond simply identifying the files which grew the most. It actually describes<br>
which *functions* grew the most in those files, creating more opportunities to<br>
do something about the code growth.<br>
<br>
lib/Target/X86/X86ISelLowering.cpp [file];combineX86ShuffleChain([snip]) [function] 24864<br>
lib/Target/X86/X86ISelLowering.cpp [file];combineMul([snip]) [function] 14907<br>
lib/Target/X86/X86ISelLowering.cpp [file];combineStore([snip]) [function] 12220<br>
...<br>
tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::CheckCompareOperands([snip]) [function] 16024<br>
tools/clang/lib/Sema/SemaExpr.cpp [file];diagnoseTautologicalComparison([snip]) [function] 1740<br>
tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::ActOnNumericConstant([snip]) [function] 1436<br>
tools/clang/lib/Sema/SemaExpr.cpp [file];checkThreeWayNarrowingConversion([snip]) [function] 1356<br>
tools/clang/lib/Sema/SemaExpr.cpp [file];CheckIdentityFieldAssignment([snip]) [function] 1280<br>
<br>
<br>
The class view diffstat is a bit different because it has more levels of<br>
nesting than the other views, due to inheritance. This might help give a sense<br>
of the high-level changes in a program, but may also be less actionable.<br>
<br>
clang::Sema [class];clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316<br>
clang::Sema [class];clang::Sema::CheckHexagonBuiltinArgument([snip]) [function] 24156<br>
clang::Sema [class];clang::Sema::ActOnTag([snip]) [function] 22373<br>
...<br>
llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133<br>
llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printInstruction([snip]) [function] 5824<br>
...<br>
llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::X86SpeculativeLoadHardeningPass [class];(anonymous namespace)::X86SpeculativeLoadHardeningPass::checkAllLoads(llvm::MachineFunction&) [function] 19287<br>
...<br>
llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::MachineLICMBase [class];(anonymous namespace)::MachineLICMBase::runOnMachineFunction(llvm::MachineFunction&) [function] 20343<br>
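<br>
To make the nesting concrete, here is a small Python sketch of how an entry like<br>
the ones above could be assembled from an inheritance chain. It is an<br>
illustration only: it assumes single inheritance and a precomputed<br>
class-to-base mapping, and `MyPass` is a hypothetical class. How the tool itself<br>
handles multiple inheritance is not shown here.<br>

```python
def class_stack(cls, base_of):
    """Return the chain of classes from the root base class down to cls."""
    chain = []
    seen = set()
    while cls is not None and cls not in seen:  # guard against cycles
        seen.add(cls)
        chain.append(cls)
        cls = base_of.get(cls)
    return list(reversed(chain))


def folded_line(function, cls, base_of, size):
    """Build one 'base [class];...;derived [class];fn [function] size' line."""
    frames = [f"{c} [class]" for c in class_stack(cls, base_of)]
    frames.append(f"{function} [function]")
    return ";".join(frames) + f" {size}"


# Assumed single-inheritance mapping; MyPass is made up for the example.
base_of = {
    "MyPass": "llvm::MachineFunctionPass",
    "llvm::MachineFunctionPass": "llvm::FunctionPass",
    "llvm::FunctionPass": "llvm::Pass",
}
line = folded_line("MyPass::run()", "MyPass", base_of, 64)
print(line)
```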
<br>
Here's a link to a flamegraph of the class view diffstat (warning: it's big):<br>
<br>
<a href="http://net.vedantk.com/static/llvm/swift-clang-4.2-vs-5.0.class-view.diffstats.svg" rel="noreferrer" target="_blank">http://net.vedantk.com/static/llvm/swift-clang-4.2-vs-5.0.class-view.diffstats.svg</a><br>
<br>
Finally, here are a few interesting entries from the inlining view diffstat. As<br>
with all of the other views, the right-hand side still shows code growth in<br>
bytes. For a given inlining target, this size is computed by diffing the sum of<br>
PC range lengths from all DW_TAG_inlined_subroutine entries referring to that target.<br>
This allows the size tool to attribute code size to an inlining target even<br>
when the inlined code is not contiguous in the caller.<br>
<br>
llvm::raw_ostream::operator<<(char const*) [inlining-target] 66720<br>
llvm::MCRegisterClass::contains(unsigned int) const [inlining-target] 64161<br>
llvm::StringRef::StringRef(char const*) [inlining-target] 39262<br>
llvm::MCInst::getOperand(unsigned int) const [inlining-target] 33268<br>
clang::CodeCompletionResult::~CodeCompletionResult() [inlining-target] 25763<br>
llvm::operator+(llvm::Twine const&, llvm::Twine const&) [inlining-target] 25525<br>
clang::ASTImporter::Import(clang::SourceLocation) [inlining-target] 21096<br>
clang::Sema::Diag(clang::SourceLocation, unsigned int) [inlining-target] 20898<br>
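<br>
The computation described above can be sketched roughly as follows in Python.<br>
This is a simplified model rather than the tool's code: it assumes the inlined<br>
subroutine entries have already been extracted as (target name, list of<br>
half-open [low, high) PC ranges) pairs, and the names and addresses are made up.<br>

```python
from collections import defaultdict


def inlined_bytes(inlined_subroutines):
    """Sum PC range lengths per inlining target.

    Each item is (target_name, [(low_pc, high_pc), ...]) with half-open
    ranges, so a non-contiguous inlined instance simply has several ranges.
    """
    totals = defaultdict(int)
    for target, ranges in inlined_subroutines:
        totals[target] += sum(high - low for low, high in ranges)
    return dict(totals)


def inlining_growth(baseline, target):
    """Diff the per-target sums and keep only the targets that grew."""
    old, new = inlined_bytes(baseline), inlined_bytes(target)
    return {t: n - old.get(t, 0) for t, n in new.items() if n > old.get(t, 0)}


# Hypothetical entries: f is inlined once in the baseline, twice in the target
# (one instance split across two ranges); g shrinks and is not reported.
baseline_entries = [("f", [(0x100, 0x110)]), ("g", [(0x200, 0x208)])]
target_entries = [
    ("f", [(0x100, 0x110), (0x140, 0x150)]),
    ("f", [(0x300, 0x320)]),
    ("g", [(0x400, 0x404)]),
]
result = inlining_growth(baseline_entries, target_entries)
print(result)
```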
<br>
Feedback & questions welcome!<br>
<br>
thanks,<br>
vedant<br>
</blockquote></div></div>
</div></blockquote></div></div></blockquote></div></div>
</div></blockquote></div></div></blockquote></div>