[llvm-dev] [RFC] Annotating global functions and variables to prevent ICF during linking

Reid Kleckner via llvm-dev llvm-dev at lists.llvm.org
Wed Apr 7 10:31:50 PDT 2021


Thanks for gathering the data, Zequan. For more context, I'll add that
chrome.dll is around 163MB, so saving 6MB is around 3.5% of binary size. I
feel like this is a compelling use case for aggressive ICF.

Chrome has historically used MSVC ICF, which is aggressive, and it can make
debugging challenging. We're looking for a way that we can selectively
power down ICF for functions that developers are interested in. I think it
would be reasonable to have a compiler flag that emits an extra section
with the same format as llvm_addrsig, with the semantics that it blocks ICF
even in the aggressive mode.

For the user interface, we can either invent a new attribute, (noicf,
blockicf, disable_icf), document that it only works with linkers that
support the new section (LLD), and go that way. Alternatively we can
overload an existing attribute like __attribute__((used)). I think I favor
the new attribute. It makes it easier to search and find the relevant
documentation.

Does that seem reasonable?

On Wed, Apr 7, 2021 at 10:06 AM Zequan Wu <zequanwu at google.com> wrote:

> I got some data for the size increase from safe icf to all icf. The 14 MB
> I initially found was due to .addrsig not emitted on COFF thinlto.
> After enabling it, the size of chrome.dll is increased by 6 MB when
> switching from all icf to safe icf on windows and 5 MB increase on linux.
>
> Here is a breakdown of chrome.dll section size increase on Windows.
> 5574656 bytes increase in .text
> 643072 bytes increase in .rdata
> 573440 bytes increase in .pdata
> 57344 bytes increase in .reloc
> 407152 bytes increase in rvatable
>
> I dumped the sum of section size removed and selected/removed sections
> count by icf in ELF and COFF as below.
> all-icf.win.txt:
> total size removed: 10763454 selected: 19712 removed: 153544
> safe-icf.win.txt:
> total size removed: 6122783 selected: 8653 removed: 48999
> all-icf.linux.txt:
> total size removed: 10423699 selected: 22254 removed: 158806
> safe-icf.linux.txt:
> total size removed: 6165470 selected: 9313 removed: 53981
>
> There is not much difference. So, safe icf is performing the same on linux
> and windows. I think we have enough motivation to add a new section for
> disabling icf.
>
> On Thu, Mar 25, 2021 at 10:57 AM Reid Kleckner <rnk at google.com> wrote:
>
>> I should add, though, that I agree with the final direction here: if the
>> cost of safe ICF is low, it is much better to just use it instead of adding
>> a whole new feature requiring documentation, etc. We need new measurements
>> to show if the feature is justified.
>>
>> On Thu, Mar 25, 2021 at 10:55 AM Reid Kleckner <rnk at google.com> wrote:
>>
>>> This is a bit of a hot take, but I'd rather make aggressive ICF (fold
>>> any identical code) the norm and scrap the C++ language guarantee about
>>> function pointer identity. I don't participate in C++ standardization, so
>>> my opinion doesn't carry much weight, but I lean towards giving the
>>> optimizer a free hand here. There are a number of other ways in which
>>> function pointer identity breaks down. Consider
>>> -fvisibility-inlines=hidden, which if we are being real, is the only way to
>>> compile big C++ projects, even those with explicit visibility annotations.
>>> Function pointer identity is just fragile and easy to break, and I don't
>>> encourage people to rely on it. If you need some kind of globally unique
>>> identifier, it's pretty straightforward to use data instead of code.
>>>
>>> On Tue, Mar 23, 2021 at 2:55 PM James Y Knight via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> In general, I don't like the "icf=all" mode. It's not nice to have the
>>>> linker break semantics of the program, and I think we should not encourage
>>>> its use. (And, in particular, adding new features just for it seems like a
>>>> bad idea.) Instead, we should make sure that icf=safe works as well as it
>>>> possibly can, and encourage its use.
>>>>
>>>> It's not possible to conclude anything based only on the size results,
>>>> but it does indicate that some further investigation may be a good idea, to
>>>> look into *why* icf=safe is apparently less-effective on the windows
>>>> binary vs the linux one.
>>>>
>>>>
>>>> On Tue, Mar 23, 2021 at 2:50 PM Fangrui Song via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>>
>>>>> On 2021-03-23, Zequan Wu wrote:
>>>>> >The size increase of chrome on Linux by switching from all ICF to
>>>>> safe ICF
>>>>> >is small.
>>>>> >All ICF:
>>>>> > text    data     bss     dec     hex filename
>>>>> >169314343       8472660 2368965 180155968       abcf640 chrome
>>>>> >Safe ICF:
>>>>> > text    data     bss     dec     hex filename
>>>>> >174521550       8497604 2368965 185388119       b0ccc57 chrome
>>>>> >
>>>>> >On Windows, chrome.dll increases size by around 14 MB (12MB increases
>>>>> in
>>>>> >.text section).
>>>>> >All ICF:
>>>>> >Size of out\Default\chrome.dll is 170.715648 MB
>>>>> >      name:   mem size  ,  disk size
>>>>> >     .text: 141.701417 MB
>>>>> >    .rdata: 22.458476 MB
>>>>> >     .data:  3.093948 MB,  0.523264 MB
>>>>> >    .pdata:  4.412364 MB
>>>>> >    .00cfg:  0.000040 MB
>>>>> >  .gehcont:  0.000132 MB
>>>>> >  .retplne:  0.000108 MB
>>>>> >   .rodata:  0.004544 MB
>>>>> >      .tls:  0.000561 MB
>>>>> >  CPADinfo:  0.000056 MB
>>>>> >    _RDATA:  0.000244 MB
>>>>> >     .rsrc:  0.285232 MB
>>>>> >    .reloc:  1.324196 MB
>>>>> >Safe ICF:
>>>>> >Size of out\icf-safe\chrome.dll is 184.499712 MB
>>>>> >      name:   mem size  ,  disk size
>>>>> >     .text: 153.809529 MB
>>>>> >    .rdata: 23.123628 MB
>>>>> >     .data:  3.093948 MB,  0.523264 MB
>>>>> >    .pdata:  5.367396 MB
>>>>> >    .00cfg:  0.000040 MB
>>>>> >  .gehcont:  0.000132 MB
>>>>> >  .retplne:  0.000108 MB
>>>>> >   .rodata:  0.004544 MB
>>>>> >      .tls:  0.000561 MB
>>>>> >  CPADinfo:  0.000056 MB
>>>>> >    _RDATA:  0.000244 MB
>>>>> >     .rsrc:  0.285232 MB
>>>>> >    .reloc:  1.379364 MB
>>>>> >
>>>>> >If an attribute is used and it affects unnamed_addr of a symbol, it
>>>>> >determines whether the symbols should show up in the .addrsig table.
>>>>> >All-ICF mode in ld.lld and lld-link ignore symbols in the .addrsig
>>>>> table,
>>>>> >if they belong to code sections. So, it won't have an effect on
>>>>> disabling
>>>>> >ICF.
>>>>>
>>>>> Is there something which can be improved on lld-link?  Note also that
>>>>> the size difference between gold --icf={safe,all} is smaller than the
>>>>> difference between ld.lld --icf={safe,all}.  Some may be due to gold
>>>>> performing "unsafe" safe ICF, some may suggest places where clang fails
>>>>> to add {,local_}unnamed_addr.
>>>>>
>>>>>
>>>>> >On Mon, Mar 22, 2021 at 10:19 PM Fangrui Song <maskray at google.com>
>>>>> wrote:
>>>>> >
>>>>> >>
>>>>> >> On 2021-03-22, David Blaikie via llvm-dev wrote:
>>>>> >> >ICF: Identical Code Folding
>>>>> >> >
>>>>> >> >Linker deduplicates functions by collapsing any identical functions
>>>>> >> >together - with icf=safe, the linker looks at a .addressing
>>>>> section in the
>>>>> >> >object file and any functions listed in that section are not
>>>>> treated as
>>>>> >> >collapsible (eg: because they need to meet C++'s "distinct
>>>>> functions have
>>>>> >> >distinct addresses" guarantee)
>>>>> >>
>>>>> >> The name originated from MSVC link.exe where icf stands for
>>>>> "identical
>>>>> >> COMDAT folding".
>>>>> >> gold named it "identical code folding" - which makes some sense
>>>>> because
>>>>> >> gold does not fold readonly data.
>>>>> >>
>>>>> >> In LLD, the name is not accurate for two reasons: (1) the feature
>>>>> can
>>>>> >> apply to readonly data as well; (2) the folding is by section, not
>>>>> by
>>>>> >> function.
>>>>> >>
>>>>> >> We define identical sections as they have identical content and
>>>>> their
>>>>> >> outgoing relocation sets cannot be distinguished: they need to have
>>>>> the
>>>>> >> same number of relocations, with the same relative locations, with
>>>>> the
>>>>> >> referenced symbols indistinguishable.
>>>>> >>
>>>>> >> Then, ld.lld --icf={safe,all} works like this:
>>>>> >>
>>>>> >> For a set of identical sections, the linker picks one
>>>>> representative and
>>>>> >> drops the rest, then redirects references to the representative.
>>>>> >>
>>>>> >> Note: this can confuse debuggers/symbolizers/profilers easily.
>>>>> >>
>>>>> >> lld-link /opt:icf is different from ld.lld --icf but I haven't
>>>>> looked
>>>>> >> into it closely.
>>>>> >>
>>>>> >>
>>>>> >> I find that the feature's saving is small given its downside
>>>>> >> (also increaded link time: the current LLD's implementation is
>>>>> inferior:
>>>>> >> it performs a quadratic number of comparisons among an equality
>>>>> class):
>>>>> >>
>>>>> >> This is the size differences for the 'lld' executable:
>>>>> >>
>>>>> >> % size lld.{none,safe,all}
>>>>> >>     text    data     bss     dec     hex filename
>>>>> >> 96821040        7210504  550810 104582354       63bccd2 lld.none
>>>>> >> 95217624        7167656  550810 102936090       622ae1a lld.safe
>>>>> >> 94038808        7167144  550810 101756762       610af5a lld.all
>>>>> >> % size gold.{none,safe,all}
>>>>> >>     text    data     bss     dec     hex filename
>>>>> >> 96857302        7174792  550825 104582919       63bcf07 gold.none
>>>>> >> 94469390        7174792  550825 102195007       6175f3f gold.safe
>>>>> >> 94184430        7174792  550825 101910047       613061f gold.all
>>>>> >>
>>>>> >> Note that the --icf=all result caps the potential saving of the
>>>>> proposed
>>>>> >> annotation.
>>>>> >>
>>>>> >> Actually with some large internal targets I get even smaller
>>>>> savings.
>>>>> >>
>>>>> >>
>>>>> >> ld.lld --icf=safe is safer than gold --icf=safe but probably misses
>>>>> some
>>>>> >> opportunities.
>>>>> >> It can be that clang codegen/optimizer fail to mark some cases as
>>>>> >> {,local_}unnamed_addr.
>>>>> >>
>>>>> >> I know Chromium and the Windows world can be different:) But I'd
>>>>> still
>>>>> >> want to
>>>>> >> get some numbers first.
>>>>> >>
>>>>> >>
>>>>> >> Last, I have seen that Chromium has some code like
>>>>> >>
>>>>> >>
>>>>> https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory_new_handler.cpp
>>>>> >>
>>>>> >>    void sk_abort_no_print() {
>>>>> >>        // Linker's ICF feature may merge this function with other
>>>>> >> functions with
>>>>> >>        // the same definition (e.g. any function whose sole job is
>>>>> to call
>>>>> >> abort())
>>>>> >>        // and it may confuse the crash report processing system.
>>>>> >>        // http://crbug.com/860850
>>>>> >>        static int static_variable_to_make_this_function_unique =
>>>>> 0x736b;
>>>>> >> // "sk"
>>>>> >>
>>>>> base::debug::Alias(&static_variable_to_make_this_function_unique);
>>>>> >>
>>>>> >>        abort();
>>>>> >>    }
>>>>> >>
>>>>> >> If we want an approach to work with link.exe, I don't know what we
>>>>> can
>>>>> >> do...
>>>>> >> If no desire for link.exe compatibility, I can see that having a
>>>>> proper
>>>>> >> way marking the function
>>>>> >> can be useful... but in any case if an attribute is used, it
>>>>> probably
>>>>> >> should affect
>>>>> >> unnamed_addr directly instead of being called *icf*.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> >On Mon, Mar 22, 2021 at 6:16 PM Philip Reames via llvm-dev <
>>>>> >> >llvm-dev at lists.llvm.org> wrote:
>>>>> >> >
>>>>> >> >> Can you define ICF please?  And give a bit of context?
>>>>> >> >>
>>>>> >> >> Philip
>>>>> >> >> On 3/22/21 5:27 PM, Zequan Wu via llvm-dev wrote:
>>>>> >> >>
>>>>> >> >> Hi all,
>>>>> >> >>
>>>>> >> >> Background:
>>>>> >> >> It's been a longstanding difficulty of debugging with ICF.
>>>>> Programmers
>>>>> >> >> don't have control over which sections should be folded by ICF,
>>>>> which
>>>>> >> >> sections shouldn't. The existing address significant table won't
>>>>> have
>>>>> >> >> effect for code sections during all ICF mode in both ld.lld and
>>>>> >> lld-link.
>>>>> >> >> By switching to safe ICF could mark code sections as unique, but
>>>>> at a
>>>>> >> cost
>>>>> >> >> of increasing binary size out of control. So, it would be good if
>>>>> >> >> programmers could selectively disable ICF in source code by
>>>>> annotating
>>>>> >> >> global functions/variables with an attribute to improve debugging
>>>>> >> >> experience and have the control on the binary size increase.
>>>>> >> >>
>>>>> >> >> My plan is to add a new section table(`.no_icf`) to object files.
>>>>> >> Sections
>>>>> >> >> of all symbols inside the table should not be folded by all ICF
>>>>> mode.
>>>>> >> And
>>>>> >> >> symbols can only be added into the table by annotating global
>>>>> >> >> functions/variables with a new attribute(`no_icf`) in source
>>>>> code.
>>>>> >> >>
>>>>> >> >> What do you think about this approach?
>>>>> >> >>
>>>>> >> >> Thanks,
>>>>> >> >> Zequan
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> _______________________________________________
>>>>> >> >> LLVM Developers mailing listllvm-dev at lists.llvm.orghttps://
>>>>> >> lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>> >> >>
>>>>> >> >> _______________________________________________
>>>>> >> >> LLVM Developers mailing list
>>>>> >> >> llvm-dev at lists.llvm.org
>>>>> >> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>> >> >>
>>>>> >>
>>>>> >> >_______________________________________________
>>>>> >> >LLVM Developers mailing list
>>>>> >> >llvm-dev at lists.llvm.org
>>>>> >> >https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>> >>
>>>>> >>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210407/3bf12d12/attachment.html>


More information about the llvm-dev mailing list