[llvm-dev] [RFC] Annotating global functions and variables to prevent ICF during linking

Fangrui Song via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 23 11:50:16 PDT 2021


On 2021-03-23, Zequan Wu wrote:
>The size increase of chrome on Linux by switching from all ICF to safe ICF
>is small.
>All ICF:
> text    data     bss     dec     hex filename
>169314343       8472660 2368965 180155968       abcf640 chrome
>Safe ICF:
> text    data     bss     dec     hex filename
>174521550       8497604 2368965 185388119       b0ccc57 chrome
>
>On Windows, chrome.dll increases size by around 14 MB (12MB increases in
>.text section).
>All ICF:
>Size of out\Default\chrome.dll is 170.715648 MB
>      name:   mem size  ,  disk size
>     .text: 141.701417 MB
>    .rdata: 22.458476 MB
>     .data:  3.093948 MB,  0.523264 MB
>    .pdata:  4.412364 MB
>    .00cfg:  0.000040 MB
>  .gehcont:  0.000132 MB
>  .retplne:  0.000108 MB
>   .rodata:  0.004544 MB
>      .tls:  0.000561 MB
>  CPADinfo:  0.000056 MB
>    _RDATA:  0.000244 MB
>     .rsrc:  0.285232 MB
>    .reloc:  1.324196 MB
>Safe ICF:
>Size of out\icf-safe\chrome.dll is 184.499712 MB
>      name:   mem size  ,  disk size
>     .text: 153.809529 MB
>    .rdata: 23.123628 MB
>     .data:  3.093948 MB,  0.523264 MB
>    .pdata:  5.367396 MB
>    .00cfg:  0.000040 MB
>  .gehcont:  0.000132 MB
>  .retplne:  0.000108 MB
>   .rodata:  0.004544 MB
>      .tls:  0.000561 MB
>  CPADinfo:  0.000056 MB
>    _RDATA:  0.000244 MB
>     .rsrc:  0.285232 MB
>    .reloc:  1.379364 MB
>
>If an attribute is used and it affects unnamed_addr of a symbol, it
>determines whether the symbols should show up in the .addrsig table.
>All-ICF mode in ld.lld and lld-link ignore symbols in the .addrsig table,
>if they belong to code sections. So, it won't have an effect on disabling
>ICF.

Is there something which can be improved on lld-link?  Note also that
the size difference between gold --icf={safe,all} is smaller than the
difference between ld.lld --icf={safe,all}.  Some may be due to gold
performing "unsafe" safe ICF, some may suggest places where clang fails
to add {,local_}unnamed_addr.


>On Mon, Mar 22, 2021 at 10:19 PM Fangrui Song <maskray at google.com> wrote:
>
>>
>> On 2021-03-22, David Blaikie via llvm-dev wrote:
>> >ICF: Identical Code Folding
>> >
>> >Linker deduplicates functions by collapsing any identical functions
>> >together - with icf=safe, the linker looks at a .addressing section in the
>> >object file and any functions listed in that section are not treated as
>> >collapsible (eg: because they need to meet C++'s "distinct functions have
>> >distinct addresses" guarantee)
>>
>> The name originated from MSVC link.exe where icf stands for "identical
>> COMDAT folding".
>> gold named it "identical code folding" - which makes some sense because
>> gold does not fold readonly data.
>>
>> In LLD, the name is not accurate for two reasons: (1) the feature can
>> apply to readonly data as well; (2) the folding is by section, not by
>> function.
>>
>> We define identical sections as they have identical content and their
>> outgoing relocation sets cannot be distinguished: they need to have the
>> same number of relocations, with the same relative locations, with the
>> referenced symbols indistinguishable.
>>
>> Then, ld.lld --icf={safe,all} works like this:
>>
>> For a set of identical sections, the linker picks one representative and
>> drops the rest, then redirects references to the representative.
>>
>> Note: this can confuse debuggers/symbolizers/profilers easily.
>>
>> lld-link /opt:icf is different from ld.lld --icf but I haven't looked
>> into it closely.
>>
>>
>> I find that the feature's saving is small given its downside
>> (also increaded link time: the current LLD's implementation is inferior:
>> it performs a quadratic number of comparisons among an equality class):
>>
>> This is the size differences for the 'lld' executable:
>>
>> % size lld.{none,safe,all}
>>     text    data     bss     dec     hex filename
>> 96821040        7210504  550810 104582354       63bccd2 lld.none
>> 95217624        7167656  550810 102936090       622ae1a lld.safe
>> 94038808        7167144  550810 101756762       610af5a lld.all
>> % size gold.{none,safe,all}
>>     text    data     bss     dec     hex filename
>> 96857302        7174792  550825 104582919       63bcf07 gold.none
>> 94469390        7174792  550825 102195007       6175f3f gold.safe
>> 94184430        7174792  550825 101910047       613061f gold.all
>>
>> Note that the --icf=all result caps the potential saving of the proposed
>> annotation.
>>
>> Actually with some large internal targets I get even smaller savings.
>>
>>
>> ld.lld --icf=safe is safer than gold --icf=safe but probably misses some
>> opportunities.
>> It can be that clang codegen/optimizer fail to mark some cases as
>> {,local_}unnamed_addr.
>>
>> I know Chromium and the Windows world can be different:) But I'd still
>> want to
>> get some numbers first.
>>
>>
>> Last, I have seen that Chromium has some code like
>>
>> https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory_new_handler.cpp
>>
>>    void sk_abort_no_print() {
>>        // Linker's ICF feature may merge this function with other
>> functions with
>>        // the same definition (e.g. any function whose sole job is to call
>> abort())
>>        // and it may confuse the crash report processing system.
>>        // http://crbug.com/860850
>>        static int static_variable_to_make_this_function_unique = 0x736b;
>> // "sk"
>>        base::debug::Alias(&static_variable_to_make_this_function_unique);
>>
>>        abort();
>>    }
>>
>> If we want an approach to work with link.exe, I don't know what we can
>> do...
>> If no desire for link.exe compatibility, I can see that having a proper
>> way marking the function
>> can be useful... but in any case if an attribute is used, it probably
>> should affect
>> unnamed_addr directly instead of being called *icf*.
>>
>>
>>
>> >On Mon, Mar 22, 2021 at 6:16 PM Philip Reames via llvm-dev <
>> >llvm-dev at lists.llvm.org> wrote:
>> >
>> >> Can you define ICF please?  And give a bit of context?
>> >>
>> >> Philip
>> >> On 3/22/21 5:27 PM, Zequan Wu via llvm-dev wrote:
>> >>
>> >> Hi all,
>> >>
>> >> Background:
>> >> It's been a longstanding difficulty of debugging with ICF. Programmers
>> >> don't have control over which sections should be folded by ICF, which
>> >> sections shouldn't. The existing address significant table won't have
>> >> effect for code sections during all ICF mode in both ld.lld and
>> lld-link.
>> >> By switching to safe ICF could mark code sections as unique, but at a
>> cost
>> >> of increasing binary size out of control. So, it would be good if
>> >> programmers could selectively disable ICF in source code by annotating
>> >> global functions/variables with an attribute to improve debugging
>> >> experience and have the control on the binary size increase.
>> >>
>> >> My plan is to add a new section table(`.no_icf`) to object files.
>> Sections
>> >> of all symbols inside the table should not be folded by all ICF mode.
>> And
>> >> symbols can only be added into the table by annotating global
>> >> functions/variables with a new attribute(`no_icf`) in source code.
>> >>
>> >> What do you think about this approach?
>> >>
>> >> Thanks,
>> >> Zequan
>> >>
>> >>
>> >> _______________________________________________
>> >> LLVM Developers mailing listllvm-dev at lists.llvm.orghttps://
>> lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >>
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >>
>>
>> >_______________________________________________
>> >LLVM Developers mailing list
>> >llvm-dev at lists.llvm.org
>> >https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>


More information about the llvm-dev mailing list