<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Mar 29, 2014 at 4:02 AM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5"><br>

On 2014 Mar 28, at 15:47, Eric Christopher <<a href="mailto:echristo@gmail.com">echristo@gmail.com</a>> wrote:<br>

<br>

> On Fri, Mar 28, 2014 at 3:18 PM, Duncan P. N. Exon Smith<br>

> <<a href="mailto:dexonsmith@apple.com">dexonsmith@apple.com</a>> wrote:<br>

>><br>

>> On 2014 Mar 28, at 14:59, Bob Wilson <<a href="mailto:bob.wilson@apple.com">bob.wilson@apple.com</a>> wrote:<br>

>><br>

>>><br>

>>> On Mar 28, 2014, at 1:33 AM, Kostya Serebryany <<a href="mailto:kcc@google.com">kcc@google.com</a>> wrote:<br>

>>><br>

>>>> Some more data on code size.<br>

>>>><br>

>>>> I've built CPU2006/483.xalancbmk with<br>

>>>> a) -O2 -fsanitize=address -m64 -gline-tables-only -mllvm -asan-coverage=1<br>

>>>> b) -O2 -fsanitize=address -m64 -gline-tables-only -fprofile-instr-generate<br>

>>>><br>

>>>> The first is 27 MB and the second is 48 MB.<br>

>>>><br>

>>>> The extra size comes from __llvm_prf_* sections.<br>

>>>> You may be able to make these sections more compact, but you will not make them tiny.<br>

>>>><br>

>>>> The instrumentation code generated by -asan-coverage=1 is less efficient than -fprofile-instr-generate<br>

>>>> in several ways (slower, fatter, provides less data).<br>

>>>> But it does not add any extra sections to the binary and wins in the overall binary size.<br>

>>>> Ideally, I'd like to see such options for -fprofile-instr-generate as well.<br>

>>>><br>

>>>> --kcc<br>

>>><br>

>>> It might make sense to move at least some of the counters into the .bss section so they don't take up space in the executable.<br>

>>><br>

>>> We're also seeing that the instrumentation bloats the code size much more than expected and we're still investigating to see why that is the case.<br>

>><br>

>> The __llvm_prf_cnts section is likely the largest.  It's zero-initialized,<br>

>> so it's a good candidate for .bss or similar.  The counters are currently in<br>

>> their own section to make writing them out easy (just one big array), but we could<br>

>> change that.  Or, is there linker magic that can make a special section<br>

>> behave like the .bss?<br>

><br>

> Possibly. The zerofill stuff is a bit weird, but you should be able to<br>

> specify a large enough block to zerofill and a concrete section. A<br>

> separate section with the S_ZEROFILL attribute would probably work to<br>

> get it all initialized and just switch sections and not use the<br>

> zerofill directive.<br>

><br>

> -eric<br>

<br>

</div></div>Heh, I'm a little lost here.  Where can we specify this?  I had a look<br>

in MCSectionMachO.cpp, and S_ZEROFILL isn't accessible from LLVM IR.<br>

Should we add logic somewhere to recognize these sections?  Will that<br>

actually reduce the executable size?  (I tried hacking it in but that<br>

didn't seem to save disk space.)<br>

<br>

Also, this doesn't solve ELF.  Can we do similar things there?<br>

<br>

For clarity, there are three __llvm_prf_* sections.  Without having<br>

seen Kostya's data, I'm speculating that __llvm_prf_cnts is the largest<br>

section.<br></blockquote><div><br></div><div>This is what I see on 483.xalancbmk:</div><div><div> 15 __llvm_prf_names 0036fcb0  00000000010abf40  00000000010abf40  00cabf40  2**5</div><div> 16 __llvm_prf_data 001b77e0  000000000141bc00  000000000141bc00  0101bc00  2**5</div>

<div> 32 __llvm_prf_cnts 00123468  0000000001af2fe0  0000000001af2fe0  014f2fe0  2**5</div></div><div><br></div><div>__llvm_prf_names is larger than the other two combined. </div><div>483.xalancbmk is C++ and the function names are long. The same is true for most of the code we care about.<br>
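<div>A minimal sketch to check the comparison above, using only the section sizes from the listing. The 32-bytes-per-function figure for __llvm_prf_data is taken from the quoted description of that section and is treated here as an assumption; the names SECTION_SIZES and average_name_bytes are made up for illustration.</div>
<pre>
```python
# Sizes (hex) of the three __llvm_prf_* sections, copied from the listing above.
SECTION_SIZES = {
    "__llvm_prf_names": 0x0036FCB0,
    "__llvm_prf_data": 0x001B77E0,
    "__llvm_prf_cnts": 0x00123468,
}

# Assumption: 32 bytes per function record, as described in the quoted message.
DATA_BYTES_PER_FUNCTION = 32


def average_name_bytes(sizes):
    """Estimate bytes of name data per function record."""
    functions = sizes["__llvm_prf_data"] // DATA_BYTES_PER_FUNCTION
    return sizes["__llvm_prf_names"] / functions


if __name__ == "__main__":
    for name, size in SECTION_SIZES.items():
        print(f"{name}: {size} bytes")
    others = SECTION_SIZES["__llvm_prf_data"] + SECTION_SIZES["__llvm_prf_cnts"]
    print("data + cnts:", others, "bytes")
    print("average name bytes per function:",
          round(average_name_bytes(SECTION_SIZES), 1))
```
</pre>
<div>If the 32-byte record size holds, the average comes out at roughly 64 bytes of name data per function, twice the size of the record that points at it.</div>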

</div><div>Can't we use function PCs instead of function names in this table (and retrieve the function names from debug info)?</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<br>

  - __llvm_prf_data is 32B per function, and has pointers into the<br>

    other two sections.  This section is necessary to avoid static<br>

    initialization (implemented on Darwin, not quite on ELF).<br>

<br>

  - __llvm_prf_names is the mangled name of every function.  It should<br>

    be on the same order of magnitude as __llvm_prf_data.  This<br>

    section is convenient for writing out the profile, since the names<br>

    are effectively placed in one big char array whose bounds are known<br>

    at link time.<br>

<br>

  - __llvm_prf_cnts is 8B per region counter.  Each function has one<br>

    at entry and roughly one per CFG-relevant AST node (for, if, etc.).<br>

    This section is convenient for writing out the profile, since the<br>

    counters are effectively placed in one big array whose bounds are<br>

    known at link time.  However, I don't think the data in this<br>

    section needs to be explicitly stored in the executable if we can<br>

    somehow make it act like .bss (or the like).<br></blockquote><div><br></div><div>Why can't we simply create this buffer at startup? </div><div>We solve a similar task in ASan, so it's definitely possible. </div>
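<div>For scale, a rough sketch of how large such a startup-allocated counter buffer would be for this binary. It assumes 8 bytes per counter, as stated in the quoted description, and uses the __llvm_prf_cnts size from the listing earlier in this message; counter_count is a hypothetical helper, not part of any runtime.</div>
<pre>
```python
# Size (hex) of __llvm_prf_cnts from the section listing in this message.
CNTS_SECTION_BYTES = 0x00123468

# Assumption: 8 bytes per region counter, as described in the quoted message.
BYTES_PER_COUNTER = 8


def counter_count(section_bytes, bytes_per_counter=BYTES_PER_COUNTER):
    """Number of counters a startup-allocated buffer would need to hold."""
    count, remainder = divmod(section_bytes, bytes_per_counter)
    if remainder != 0:
        raise ValueError("section size is not a multiple of the counter size")
    return count


if __name__ == "__main__":
    counters = counter_count(CNTS_SECTION_BYTES)
    print("counters:", counters)
    print("buffer size in MiB:", round(CNTS_SECTION_BYTES / 2**20, 2))
```
</pre>
<div>Under that assumption the buffer is a little over 1 MiB, which is the amount that would no longer need to be stored in the file.</div>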

<div><br></div><div>--kcc </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

In the latter two cases, it's possible to avoid the special sections if<br>

there are good reasons, but it will add runtime overhead.<br>

</blockquote></div><br></div></div>