[llvm-dev] RFC: Adding a code size analysis tool

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 1 16:50:30 PDT 2018


Is the idea that it'd be better to have the functionality in LLVM, or that
it'd be better in a new tool? (Is it about it being a different tool, about
it being in the LLVM tree, or something else?)

What about possibly moving Bloaty into the LLVM project & improving it
there?

On Mon, Oct 1, 2018 at 4:48 PM Vedant Kumar <vsk at apple.com> wrote:

> On Oct 1, 2018, at 3:25 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Mon, Oct 1, 2018 at 3:24 PM JF Bastien <jfbastien at apple.com> wrote:
>
>> On Oct 1, 2018, at 3:16 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> (my vote, somewhat biased, is that I'd love to see more investment in
>> Bloaty (to keep all these sorts of size analysis tools and tricks in one
>> place), but I sort of accept folks are probably going to keep building
>> more infrastructure for this sort of thing in LLVM directly)
>>
>>
>> I get where that comes from, but it seems a bit like a Valgrind versus
>> sanitizer argument: integrating with the toolchain gives you things you
>> can’t really get otherwise. Valgrind is still great as a self-standing
>> thing.
>>
>
> Not sure that's quite the same though - with the sanitizers, integrating
> with the optimizers is the key here.
>
> With bloaty - it could, at worst, use LLVM's libDebugInfo as a library to
> implement the more advanced debug-using features without being less
> functional than an in-LLVM implementation.
>
>
> I’m a bit biased too, but fwiw: my preference would be to add a new size
> analysis tool to llvm.
>
> Such a tool might grow to depend on code for object file parsing, debug
> info parsing, demangling, and disassembling (all of which bloaty either
> reimplements or pulls in). Living in-tree should make it easier to pick up
> bug fixes in these dependencies and reduce maintenance overhead.
>
> While I really like bloaty, my impression is that it’d be better to
> implement the functionality I’d like to use in a new tool.
>
> vedant
>
>
>
> - Dave
>
>
>>
>>
>> On Wed, Sep 26, 2018 at 12:03 PM Vedant Kumar <vsk at apple.com> wrote:
>>
>>> Hello,
>>>
>>> I worked on a code size analysis tool for a 'week of code' project and
>>> think that it might be useful enough to upstream.
>>>
>>> The tool is inspired by bloaty (https://github.com/google/bloaty), but
>>> tries to do more to attribute code size in actionable ways.
>>>
>>> For example, it can calculate how many bytes inlined instances of a
>>> function added to a binary. In its diff mode, it can show how much more
>>> aggressively a function was inlined compared to a baseline. This can be
>>> useful when you're, say, trying to figure out why firmware compiled by a
>>> new compiler is just a few bytes over the size limit imposed by your
>>> embedded device :). In this case, extra information about inlining can
>>> help inform a decision to either tweak the inliner's cost model or to
>>> judiciously add a few `noinline` attributes. (Note that if you're
>>> willing to recompile & write a few SQL queries, optimization remarks can
>>> give you similar information, albeit at the IR level.)
>>>
>>> As another example, this code size tool can attribute code size to
>>> semantically interesting groups of code, like C++/Swift classes, or
>>> files. In the diff mode, you can see how the code size of a class/file
>>> grew compared to a baseline. The tool understands inheritance, so you
>>> can also see interesting high-level trends. E.g., `clang::Sema` grew
>>> more than `llvm::Pass` between clang-6 and clang-7.
>>>
>>> Unlike bloaty, this tool focuses exclusively on the text segment. Also
>>> unlike bloaty, it uses LLVM's DWARF parser instead of rolling its own.
>>> The tool is currently implemented as a sub-tool of llvm-dwarfdump.
>>>
>>> To get size information about a program, you do:
>>>
>>>   llvm-dwarfdump size-info -baseline <object> -stats-dir <dir>
>>>
>>> This emits four *.stats files into <dir>, each containing a distinct
>>> 'view' into the code groups in <object>. There's a file view, a function
>>> view, a class view, and an inlining view. Each view is sorted by code
>>> size, so you can see the largest functions/classes/etc. immediately.
>>>
>>> The *.stats files are just human-readable text files. As it happens,
>>> they use the flamegraph format
>>> (http://brendangregg.com/flamegraphs.html). This makes it easy to
>>> visualize any view as a flamegraph. (If you haven't seen one before,
>>> it's a hierarchical visualization where the width of each entry
>>> corresponds to its frequency (or in this case size).)
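
A minimal sketch of reading this format, not code from the tool itself: each line is a semicolon-separated stack of frames followed by a space and a number (here, a size in bytes). The helper names `parse_folded` and `total_by_root` are hypothetical, and the sample lines are borrowed from the class-view excerpt later in this message.

```python
from collections import defaultdict

def parse_folded(lines):
    """Yield (frames, size) for each 'frame;frame;... <size>' line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The size is the last space-separated field; frames may contain spaces.
        stack, _, size = line.rpartition(" ")
        yield stack.split(";"), int(size)

def total_by_root(lines):
    """Sum sizes under each outermost frame (e.g. a class or a file)."""
    totals = defaultdict(int)
    for frames, size in parse_folded(lines):
        totals[frames[0]] += size
    return dict(totals)

sample = [
    "clang::Sema [class];clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316",
    "clang::Sema [class];clang::Sema::ActOnTag([snip]) [function] 22373",
    "llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];"
    "llvm::AArch64AppleInstPrinter::printInstruction([snip]) [function] 5824",
]
print(total_by_root(sample))
```

Splitting on `;` assumes that no frame name contains a semicolon, which holds for the excerpts shown in this message but is not guaranteed in general.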
>>>
>>> To look at code growth between two programs, you'd do:
>>>
>>>   llvm-dwarfdump size-info -baseline <object> -target <object> -stats-dir <dir>
>>>
>>> Similarly, this emits four 'view' files into <dir>, but with a
>>> *.diffstats suffix. The format is the same.
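
One way to picture how a diffstat could be derived from two sets of sizes is sketched below. This is an assumption about the general idea, not the tool's actual implementation: the function name `diff_sizes` and the sample entries are invented, and the positive-only filter mirrors the note later in this message that only positive changes are reported.

```python
def diff_sizes(baseline, target):
    """Return [(stack, growth)] for stacks that grew, largest growth first."""
    growth = []
    for stack, size in target.items():
        # A stack missing from the baseline counts as growing from zero.
        delta = size - baseline.get(stack, 0)
        if delta > 0:
            growth.append((stack, delta))
    growth.sort(key=lambda item: item[1], reverse=True)
    return growth

# Invented sizes, keyed by folded stack.
baseline = {
    "a.cpp [file];f() [function]": 100,
    "a.cpp [file];g() [function]": 80,
}
target = {
    "a.cpp [file];f() [function]": 140,
    "a.cpp [file];g() [function]": 60,
    "a.cpp [file];h() [function]": 25,
}
for stack, delta in diff_sizes(baseline, target):
    print(f"{stack} {delta}")
```

In this sketch `g()` shrank, so it is dropped from the output, while the newly added `h()` is reported with its full size.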
>>>
>>> Pending Work
>>> ------------
>>>
>>> I think the main piece of work the tool needs is better testing.
>>> Currently there's just a single end-to-end test in clang. It might be
>>> better to check in a few binaries so we can check that the tool reports
>>> sizes correctly.
>>>
>>> Also, it may turn out that folks are interested in different ways of
>>> visualizing size data. While the textual format of flamegraphs is really
>>> convenient for humans to read, the graphs themselves do make more sense
>>> when the underlying data have a frequentist interpretation. If there's
>>> enough interest I can explore using an alternative format for
>>> visualization, e.g.:
>>>
>>>   http://neugierig.org/software/chromium/bloat/
>>>   https://github.com/evmar/webtreemap
>>>
>>> (Thanks JF for pointing these out!)
>>>
>>> Here's a link to the source code:
>>>
>>>   https://github.com/vedantk/llvm-project/tree/sizeinfo
>>>
>>> Selected Examples
>>> -----------------
>>>
>>> Here are a few interesting snippets from a comparison of clang-6 vs.
>>> clang-7.
>>>
>>> First, let's take a look at the function view diffstat. Here are the 10
>>> functions which grew in size the most. On the left hand side, you'll see
>>> the demangled function name. The *change* in code size in bytes is
>>> reported on the right hand side (only positive changes are reported).
>>>
>>>   clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
>>>   ProcessDeclAttribute([snip]) [function] 125893
>>>   llvm::AArch64InstPrinter::printAliasInstr([snip]) [function] 105133
>>>   llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
>>>   ParseCodeGenArgs([snip]) [function] 64692
>>>   unswitchNontrivialInvariants([snip]) [function] 40180
>>>   getAttrKind([snip]) [function] 35811
>>>   clang::DumpCompilerOptionsAction::ExecuteAction() [function] 32417
>>>   llvm::UpgradeIntrinsicCall([snip]) [function] 30239
>>>   bool llvm::InstructionSelector::executeMatchTable<(anonymous namespace)::ARMInstructionSelector const, [snip]) const [function] 29352
>>>
>>>
>>> Next, let's look at the file view diffstat. This can be useful because
>>> it goes beyond simply identifying the files which grew the most. It
>>> actually describes which *functions* grew the most in those files,
>>> creating more opportunities to do something about the code growth.
>>>
>>>   lib/Target/X86/X86ISelLowering.cpp [file];combineX86ShuffleChain([snip]) [function] 24864
>>>   lib/Target/X86/X86ISelLowering.cpp [file];combineMul([snip]) [function] 14907
>>>   lib/Target/X86/X86ISelLowering.cpp [file];combineStore([snip]) [function] 12220
>>>   ...
>>>   tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::CheckCompareOperands([snip]) [function] 16024
>>>   tools/clang/lib/Sema/SemaExpr.cpp [file];diagnoseTautologicalComparison([snip]) [function] 1740
>>>   tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::ActOnNumericConstant([snip]) [function] 1436
>>>   tools/clang/lib/Sema/SemaExpr.cpp [file];checkThreeWayNarrowingConversion([snip]) [function] 1356
>>>   tools/clang/lib/Sema/SemaExpr.cpp [file];CheckIdentityFieldAssignment([snip]) [function] 1280
>>>
>>>
>>> The class view diffstat is a bit different because it has more levels of
>>> nesting than the other views, due to inheritance. This might help give a
>>> sense for the high-level changes in a program, but may also be less
>>> actionable.
>>>
>>>   clang::Sema [class];clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
>>>   clang::Sema [class];clang::Sema::CheckHexagonBuiltinArgument([snip]) [function] 24156
>>>   clang::Sema [class];clang::Sema::ActOnTag([snip]) [function] 22373
>>>   ...
>>>   llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
>>>   llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printInstruction([snip]) [function] 5824
>>>   ...
>>>   llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::X86SpeculativeLoadHardeningPass [class];(anonymous namespace)::X86SpeculativeLoadHardeningPass::checkAllLoads(llvm::MachineFunction&) [function] 19287
>>>   ...
>>>   llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::MachineLICMBase [class];(anonymous namespace)::MachineLICMBase::runOnMachineFunction(llvm::MachineFunction&) [function] 20343
>>>
>>> Here's a link to a flamegraph of the class view diffstat (warning: it's
>>> big):
>>>
>>>   http://net.vedantk.com/static/llvm/swift-clang-4.2-vs-5.0.class-view.diffstats.svg
>>>
>>> Finally, here are a few interesting entries from the inlining view
>>> diffstat. As with all of the other views, the right hand side still
>>> shows code growth in bytes. For a given inlining target, this size is
>>> computed by diffing the sum of PC range lengths from all
>>> DW_TAG_inlined_subroutines referring to that target. This allows the
>>> size tool to attribute code size to an inlining target even when the
>>> inlined code is not contiguous in the caller.
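
The computation described above can be sketched roughly as follows. This is an illustration only and does not use the tool's code or LLVM's DWARF parser: the function names, the input shape (one `(target, ranges)` pair per inlined-subroutine entry, with exclusive end addresses) and the addresses are all assumptions made for the example.

```python
from collections import defaultdict

def inlined_bytes(instances):
    """Map each inlining target to the total length of its PC ranges.

    `instances` is an iterable of (target_name, [(low_pc, high_pc), ...]),
    one entry per inlined instance; high_pc is treated as exclusive.
    """
    totals = defaultdict(int)
    for target, ranges in instances:
        totals[target] += sum(high - low for low, high in ranges)
    return dict(totals)

def inlining_growth(baseline, target):
    """Per-target change in inlined bytes, target minus baseline."""
    before = inlined_bytes(baseline)
    after = inlined_bytes(target)
    return {name: after.get(name, 0) - before.get(name, 0)
            for name in set(before) | set(after)}

# Made-up addresses for illustration only.
name = "llvm::StringRef::StringRef(char const*)"
baseline = [(name, [(0x1000, 0x1010)])]
target = [
    (name, [(0x2000, 0x2010)]),
    # A second instance split into two non-contiguous ranges.
    (name, [(0x2040, 0x2048), (0x2060, 0x2064)]),
]
print(inlining_growth(baseline, target))
```

Because each instance contributes the sum of its own range lengths, an instance whose code is scattered across the caller is counted the same way as a contiguous one.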
>>>
>>>   llvm::raw_ostream::operator<<(char const*) [inlining-target] 66720
>>>   llvm::MCRegisterClass::contains(unsigned int) const [inlining-target] 64161
>>>   llvm::StringRef::StringRef(char const*) [inlining-target] 39262
>>>   llvm::MCInst::getOperand(unsigned int) const [inlining-target] 33268
>>>   clang::CodeCompletionResult::~CodeCompletionResult() [inlining-target] 25763
>>>   llvm::operator+(llvm::Twine const&, llvm::Twine const&) [inlining-target] 25525
>>>   clang::ASTImporter::Import(clang::SourceLocation) [inlining-target] 21096
>>>   clang::Sema::Diag(clang::SourceLocation, unsigned int) [inlining-target] 20898
>>>
>>> Feedback & questions welcome!
>>>
>>> thanks,
>>> vedant
>>>
>>