[llvm-dev] [RFC] Annotating global functions and variables to prevent ICF during linking

Fangrui Song via llvm-dev llvm-dev at lists.llvm.org
Mon Mar 22 22:18:57 PDT 2021


On 2021-03-22, David Blaikie via llvm-dev wrote:
>ICF: Identical Code Folding
>
>The linker deduplicates functions by collapsing any identical functions
>together - with icf=safe, the linker looks at the .llvm_addrsig section in the
>object file, and any functions listed in that section are not treated as
>collapsible (e.g. because they need to meet C++'s "distinct functions have
>distinct addresses" guarantee).

The name originated from MSVC link.exe where icf stands for "identical COMDAT folding".
gold named it "identical code folding" - which makes some sense because gold does not fold readonly data.

In LLD, the name is not accurate for two reasons: (1) the feature can
apply to readonly data as well; (2) the folding is by section, not by function.

We define two sections as identical if they have identical content and their
outgoing relocation sets cannot be distinguished: they need to have the
same number of relocations, at the same relative locations, with
indistinguishable referenced symbols.
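
A minimal sketch of that definition, assuming a simplified model of a section.
The Reloc and Section types and both function names are hypothetical and are
not LLD's data structures. Comparing target names is only a stand-in for
"indistinguishable"; a real linker compares more (for example relocation types
and addends) and treats targets that themselves fold together as equal.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of an input section (not LLD's types).
struct Reloc {
  uint64_t offset;     // location of the relocation within the section
  std::string target;  // stand-in for the referenced symbol
};

struct Section {
  std::string name;
  std::vector<uint8_t> content;
  std::vector<Reloc> relocs;  // assumed sorted by offset
};

// Two relocations match if they sit at the same relative location and
// refer to targets we cannot tell apart (here: simply the same name).
bool relocsMatch(const Reloc &a, const Reloc &b) {
  return a.offset == b.offset && a.target == b.target;
}

// Identical content, same number of relocations, pairwise matching.
bool identicalSections(const Section &a, const Section &b) {
  if (a.content != b.content)
    return false;
  if (a.relocs.size() != b.relocs.size())
    return false;
  for (size_t i = 0; i < a.relocs.size(); ++i)
    if (!relocsMatch(a.relocs[i], b.relocs[i]))
      return false;
  return true;
}
```

Note that the section names do not take part in the comparison: two
differently named functions with the same bytes and the same relocations are
candidates for folding.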

Then, ld.lld --icf={safe,all} works like this:

For a set of identical sections, the linker picks one representative and
drops the rest, then redirects references to the representative.
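
The folding step can be sketched as follows, under strong simplifying
assumptions: each section's "identity" is reduced to a precomputed key
string, and a keepUnique flag models a section that must not be folded (as
with address-significant sections under --icf=safe). The Section type and
foldSections are hypothetical names, not LLD's implementation.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: 'key' summarizes content plus relocations, so equal
// keys mean "identical sections" in the sense defined above.
struct Section {
  std::string name;
  std::string key;
  bool keepUnique;  // true: never fold this section
};

// For each section index, return the index of its representative.
// A section that is its own representative is kept; every other section
// is dropped and references to it are redirected to the representative.
std::vector<size_t> foldSections(const std::vector<Section> &sections) {
  std::map<std::string, size_t> firstWithKey;
  std::vector<size_t> rep(sections.size());
  for (size_t i = 0; i < sections.size(); ++i) {
    rep[i] = i;
    if (sections[i].keepUnique)
      continue;  // neither folded nor used as a representative
    auto res = firstWithKey.emplace(sections[i].key, i);
    if (!res.second)
      rep[i] = res.first->second;  // fold into the first section seen
  }
  return rep;
}
```

After folding, a symbolizer that maps an address back to a name can only
report one of the folded symbols, which is how the confusion mentioned in
the note that follows arises.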

Note: this can confuse debuggers/symbolizers/profilers easily.

lld-link /opt:icf is different from ld.lld --icf but I haven't looked
into it closely.


I find that the feature's savings are small given its downsides. One such
downside is increased link time: LLD's current implementation is inefficient
and performs a quadratic number of comparisons within an equivalence class.

These are the size differences for the 'lld' executable:

% size lld.{none,safe,all}
    text     data     bss        dec      hex filename
96821040  7210504  550810  104582354  63bccd2 lld.none
95217624  7167656  550810  102936090  622ae1a lld.safe
94038808  7167144  550810  101756762  610af5a lld.all
% size gold.{none,safe,all}
    text     data     bss        dec      hex filename
96857302  7174792  550825  104582919  63bcf07 gold.none
94469390  7174792  550825  102195007  6175f3f gold.safe
94184430  7174792  550825  101910047  613061f gold.all

Note that the --icf=all result caps the potential saving of the proposed annotation.
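
To put the text columns in relative terms, here is a small helper that turns
two text sizes into a percentage saving. The function name is made up for
this illustration; the inputs are the text sizes from the size output.

```cpp
#include <cassert>
#include <cstdint>

// Percentage of the baseline text size that a given ICF mode removes.
double textSavingPercent(uint64_t baseline, uint64_t folded) {
  assert(baseline > 0 && folded <= baseline);
  return 100.0 * static_cast<double>(baseline - folded) /
         static_cast<double>(baseline);
}
```

With the numbers above, textSavingPercent(96821040, 95217624) is about 1.66
(lld, safe) and textSavingPercent(96821040, 94038808) is about 2.87 (lld,
all). For gold the corresponding values are about 2.47 (safe) and 2.76 (all).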

Actually with some large internal targets I get even smaller savings.


ld.lld --icf=safe is safer than gold --icf=safe but probably misses some opportunities.
It may be that Clang's codegen/optimizer fails to mark some cases as {,local_}unnamed_addr.

I know Chromium and the Windows world can be different:) But I'd still want to
get some numbers first.


Last, I have seen that Chromium has some code like
https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory_new_handler.cpp

   void sk_abort_no_print() {
       // Linker's ICF feature may merge this function with other functions with
       // the same definition (e.g. any function whose sole job is to call abort())
       // and it may confuse the crash report processing system.
       // http://crbug.com/860850
       static int static_variable_to_make_this_function_unique = 0x736b;  // "sk"
       base::debug::Alias(&static_variable_to_make_this_function_unique);

       abort();
   }

If we want an approach that works with link.exe, I don't know what we can do...
If there is no desire for link.exe compatibility, I can see that having a proper way to mark the function
can be useful... but in any case, if an attribute is used, it should probably affect
unnamed_addr directly instead of being called *icf*.
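
For reference, the property that unnamed_addr gives up is that distinct
functions compare unequal by address. A minimal sketch, with function names
invented for the illustration; it assumes an ordinary build in which no
folding is applied to address-significant functions:

```cpp
// Two distinct functions with identical bodies: candidates for folding.
int returnsFortyTwoA() { return 42; }
int returnsFortyTwoB() { return 42; }

// Taking and comparing the addresses makes them observable to the program,
// which is what C++'s "distinct functions have distinct addresses"
// guarantee is about.
bool sameAddress(int (*a)(), int (*b)()) { return a == b; }
```

In such a build, sameAddress(&returnsFortyTwoA, &returnsFortyTwoB) is false.
If the linker folds the two functions anyway, it becomes true. Code that
never observes the addresses cannot tell the difference, which is why
marking such functions as not address-significant makes folding safe.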



>On Mon, Mar 22, 2021 at 6:16 PM Philip Reames via llvm-dev <
>llvm-dev at lists.llvm.org> wrote:
>
>> Can you define ICF please?  And give a bit of context?
>>
>> Philip
>> On 3/22/21 5:27 PM, Zequan Wu via llvm-dev wrote:
>>
>> Hi all,
>>
>> Background:
>> Debugging with ICF has been a longstanding difficulty. Programmers
>> have no control over which sections are folded by ICF and which are
>> not. The existing address-significance table has no effect on code
>> sections in the all ICF mode of either ld.lld or lld-link.
>> Switching to safe ICF could mark code sections as unique, but at the cost
>> of an uncontrolled increase in binary size. So it would be good if
>> programmers could selectively disable ICF in source code by annotating
>> global functions/variables with an attribute, to improve the debugging
>> experience while keeping control over the binary size increase.
>>
>> My plan is to add a new section table (`.no_icf`) to object files. Sections
>> of all symbols in the table should not be folded in any ICF mode.
>> Symbols can only be added to the table by annotating global
>> functions/variables with a new attribute (`no_icf`) in source code.
>>
>> What do you think about this approach?
>>
>> Thanks,
>> Zequan
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
