[llvm-dev] [RFC] Annotating global functions and variables to prevent ICF during linking

Reid Kleckner via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 25 10:57:10 PDT 2021


I should add, though, that I agree with the final direction here: if the
cost of safe ICF is low, it is much better to just use it instead of adding
a whole new feature requiring documentation, etc. We need new measurements
to show if the feature is justified.

On Thu, Mar 25, 2021 at 10:55 AM Reid Kleckner <rnk at google.com> wrote:

> This is a bit of a hot take, but I'd rather make aggressive ICF (fold any
> identical code) the norm and scrap the C++ language guarantee about
> function pointer identity. I don't participate in C++ standardization, so
> my opinion doesn't carry much weight, but I lean towards giving the
> optimizer a free hand here. There are a number of other ways in which
> function pointer identity breaks down. Consider
> -fvisibility-inlines=hidden, which if we are being real, is the only way to
> compile big C++ projects, even those with explicit visibility annotations.
> Function pointer identity is just fragile and easy to break, and I don't
> encourage people to rely on it. If you need some kind of globally unique
> identifier, it's pretty straightforward to use data instead of code.
>
> On Tue, Mar 23, 2021 at 2:55 PM James Y Knight via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> In general, I don't like the "icf=all" mode. It's not nice to have the
>> linker break semantics of the program, and I think we should not encourage
>> its use. (And, in particular, adding new features just for it seems like a
>> bad idea.) Instead, we should make sure that icf=safe works as well as it
>> possibly can, and encourage its use.
>>
>> It's not possible to conclude anything based only on the size results,
>> but it does indicate that some further investigation may be a good idea, to
>> look into *why* icf=safe is apparently less-effective on the windows
>> binary vs the linux one.
>>
>>
>> On Tue, Mar 23, 2021 at 2:50 PM Fangrui Song via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>> On 2021-03-23, Zequan Wu wrote:
>>> >The size increase of chrome on Linux by switching from all ICF to safe
>>> ICF
>>> >is small.
>>> >All ICF:
>>> > text    data     bss     dec     hex filename
>>> >169314343       8472660 2368965 180155968       abcf640 chrome
>>> >Safe ICF:
>>> > text    data     bss     dec     hex filename
>>> >174521550       8497604 2368965 185388119       b0ccc57 chrome
>>> >
>>> >On Windows, chrome.dll increases size by around 14 MB (12MB increases in
>>> >.text section).
>>> >All ICF:
>>> >Size of out\Default\chrome.dll is 170.715648 MB
>>> >      name:   mem size  ,  disk size
>>> >     .text: 141.701417 MB
>>> >    .rdata: 22.458476 MB
>>> >     .data:  3.093948 MB,  0.523264 MB
>>> >    .pdata:  4.412364 MB
>>> >    .00cfg:  0.000040 MB
>>> >  .gehcont:  0.000132 MB
>>> >  .retplne:  0.000108 MB
>>> >   .rodata:  0.004544 MB
>>> >      .tls:  0.000561 MB
>>> >  CPADinfo:  0.000056 MB
>>> >    _RDATA:  0.000244 MB
>>> >     .rsrc:  0.285232 MB
>>> >    .reloc:  1.324196 MB
>>> >Safe ICF:
>>> >Size of out\icf-safe\chrome.dll is 184.499712 MB
>>> >      name:   mem size  ,  disk size
>>> >     .text: 153.809529 MB
>>> >    .rdata: 23.123628 MB
>>> >     .data:  3.093948 MB,  0.523264 MB
>>> >    .pdata:  5.367396 MB
>>> >    .00cfg:  0.000040 MB
>>> >  .gehcont:  0.000132 MB
>>> >  .retplne:  0.000108 MB
>>> >   .rodata:  0.004544 MB
>>> >      .tls:  0.000561 MB
>>> >  CPADinfo:  0.000056 MB
>>> >    _RDATA:  0.000244 MB
>>> >     .rsrc:  0.285232 MB
>>> >    .reloc:  1.379364 MB
>>> >
>>> >If an attribute is used and it affects unnamed_addr of a symbol, it
>>> >determines whether the symbols should show up in the .addrsig table.
>>> >All-ICF mode in ld.lld and lld-link ignore symbols in the .addrsig
>>> table,
>>> >if they belong to code sections. So, it won't have an effect on
>>> disabling
>>> >ICF.
>>>
>>> Is there something which can be improved on lld-link?  Note also that
>>> the size difference between gold --icf={safe,all} is smaller than the
>>> difference between ld.lld --icf={safe,all}.  Some may be due to gold
>>> performing "unsafe" safe ICF, some may suggest places where clang fails
>>> to add {,local_}unnamed_addr.
>>>
>>>
>>> >On Mon, Mar 22, 2021 at 10:19 PM Fangrui Song <maskray at google.com>
>>> wrote:
>>> >
>>> >>
>>> >> On 2021-03-22, David Blaikie via llvm-dev wrote:
>>> >> >ICF: Identical Code Folding
>>> >> >
>>> >> >Linker deduplicates functions by collapsing any identical functions
>>> >> >together - with icf=safe, the linker looks at a .addressing section
>>> in the
>>> >> >object file and any functions listed in that section are not treated
>>> as
>>> >> >collapsible (eg: because they need to meet C++'s "distinct functions
>>> have
>>> >> >distinct addresses" guarantee)
>>> >>
>>> >> The name originated from MSVC link.exe where icf stands for "identical
>>> >> COMDAT folding".
>>> >> gold named it "identical code folding" - which makes some sense
>>> because
>>> >> gold does not fold readonly data.
>>> >>
>>> >> In LLD, the name is not accurate for two reasons: (1) the feature can
>>> >> apply to readonly data as well; (2) the folding is by section, not by
>>> >> function.
>>> >>
>>> >> We define identical sections as they have identical content and their
>>> >> outgoing relocation sets cannot be distinguished: they need to have
>>> the
>>> >> same number of relocations, with the same relative locations, with the
>>> >> referenced symbols indistinguishable.
>>> >>
>>> >> Then, ld.lld --icf={safe,all} works like this:
>>> >>
>>> >> For a set of identical sections, the linker picks one representative
>>> and
>>> >> drops the rest, then redirects references to the representative.
>>> >>
>>> >> Note: this can confuse debuggers/symbolizers/profilers easily.
>>> >>
>>> >> lld-link /opt:icf is different from ld.lld --icf but I haven't looked
>>> >> into it closely.
>>> >>
>>> >>
>>> >> I find that the feature's saving is small given its downside
>>> >> (also increaded link time: the current LLD's implementation is
>>> inferior:
>>> >> it performs a quadratic number of comparisons among an equality
>>> class):
>>> >>
>>> >> This is the size differences for the 'lld' executable:
>>> >>
>>> >> % size lld.{none,safe,all}
>>> >>     text    data     bss     dec     hex filename
>>> >> 96821040        7210504  550810 104582354       63bccd2 lld.none
>>> >> 95217624        7167656  550810 102936090       622ae1a lld.safe
>>> >> 94038808        7167144  550810 101756762       610af5a lld.all
>>> >> % size gold.{none,safe,all}
>>> >>     text    data     bss     dec     hex filename
>>> >> 96857302        7174792  550825 104582919       63bcf07 gold.none
>>> >> 94469390        7174792  550825 102195007       6175f3f gold.safe
>>> >> 94184430        7174792  550825 101910047       613061f gold.all
>>> >>
>>> >> Note that the --icf=all result caps the potential saving of the
>>> proposed
>>> >> annotation.
>>> >>
>>> >> Actually with some large internal targets I get even smaller savings.
>>> >>
>>> >>
>>> >> ld.lld --icf=safe is safer than gold --icf=safe but probably misses
>>> some
>>> >> opportunities.
>>> >> It can be that clang codegen/optimizer fail to mark some cases as
>>> >> {,local_}unnamed_addr.
>>> >>
>>> >> I know Chromium and the Windows world can be different:) But I'd still
>>> >> want to
>>> >> get some numbers first.
>>> >>
>>> >>
>>> >> Last, I have seen that Chromium has some code like
>>> >>
>>> >>
>>> https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory_new_handler.cpp
>>> >>
>>> >>    void sk_abort_no_print() {
>>> >>        // Linker's ICF feature may merge this function with other
>>> >> functions with
>>> >>        // the same definition (e.g. any function whose sole job is to
>>> call
>>> >> abort())
>>> >>        // and it may confuse the crash report processing system.
>>> >>        // http://crbug.com/860850
>>> >>        static int static_variable_to_make_this_function_unique =
>>> 0x736b;
>>> >> // "sk"
>>> >>
>>> base::debug::Alias(&static_variable_to_make_this_function_unique);
>>> >>
>>> >>        abort();
>>> >>    }
>>> >>
>>> >> If we want an approach to work with link.exe, I don't know what we can
>>> >> do...
>>> >> If no desire for link.exe compatibility, I can see that having a
>>> proper
>>> >> way marking the function
>>> >> can be useful... but in any case if an attribute is used, it probably
>>> >> should affect
>>> >> unnamed_addr directly instead of being called *icf*.
>>> >>
>>> >>
>>> >>
>>> >> >On Mon, Mar 22, 2021 at 6:16 PM Philip Reames via llvm-dev <
>>> >> >llvm-dev at lists.llvm.org> wrote:
>>> >> >
>>> >> >> Can you define ICF please?  And give a bit of context?
>>> >> >>
>>> >> >> Philip
>>> >> >> On 3/22/21 5:27 PM, Zequan Wu via llvm-dev wrote:
>>> >> >>
>>> >> >> Hi all,
>>> >> >>
>>> >> >> Background:
>>> >> >> It's been a longstanding difficulty of debugging with ICF.
>>> Programmers
>>> >> >> don't have control over which sections should be folded by ICF,
>>> which
>>> >> >> sections shouldn't. The existing address significant table won't
>>> have
>>> >> >> effect for code sections during all ICF mode in both ld.lld and
>>> >> lld-link.
>>> >> >> By switching to safe ICF could mark code sections as unique, but
>>> at a
>>> >> cost
>>> >> >> of increasing binary size out of control. So, it would be good if
>>> >> >> programmers could selectively disable ICF in source code by
>>> annotating
>>> >> >> global functions/variables with an attribute to improve debugging
>>> >> >> experience and have the control on the binary size increase.
>>> >> >>
>>> >> >> My plan is to add a new section table(`.no_icf`) to object files.
>>> >> Sections
>>> >> >> of all symbols inside the table should not be folded by all ICF
>>> mode.
>>> >> And
>>> >> >> symbols can only be added into the table by annotating global
>>> >> >> functions/variables with a new attribute(`no_icf`) in source code.
>>> >> >>
>>> >> >> What do you think about this approach?
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Zequan
>>> >> >>
>>> >> >>
>>> >> >> _______________________________________________
>>> >> >> LLVM Developers mailing listllvm-dev at lists.llvm.orghttps://
>>> >> lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> >> >>
>>> >> >> _______________________________________________
>>> >> >> LLVM Developers mailing list
>>> >> >> llvm-dev at lists.llvm.org
>>> >> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> >> >>
>>> >>
>>> >> >_______________________________________________
>>> >> >LLVM Developers mailing list
>>> >> >llvm-dev at lists.llvm.org
>>> >> >https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> >>
>>> >>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210325/59c71eb5/attachment-0001.html>


More information about the llvm-dev mailing list