[llvm-dev] [ELF] String literals don't obey -fdata-sections

Fangrui Song via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 16 10:42:33 PDT 2020


On 2020-09-16, Gaël Jobin wrote:
>On 2020-09-16 00:18, Fangrui Song wrote:
>
>>Usually it is because nobody has noticed the problem or nobody is
>>motivated enough to fix it, not that they intentionally leave
>>a problem open :) I took some time to look at the problem and concluded
>>that clang should do nothing about this. Actually, with the clang behavior,
>>you can discard "Unused" if you use LLD. Read on.
>
>Sorry if I misspoke, I was not suggesting that the bug was known and
>deliberately left unfixed out of laziness ;-). I am sure there is a valid
>reason and wanted to know about it. As you explained, it appears that
>LLVM relies on LLD to do that instead of enforcing it in the middle-end,
>which is a different approach from GCC's.
>
>>In GCC, -O turns on -fmerge-constants. Clang does not implement this
>>option, but implements the level 2 -fmerge-all-constants, which is
>>non-conforming ("Languages like C or C++
>>require each variable, including multiple instances of the same variable
>>in recursive calls, to have distinct locations, so using this option
>>results in non-conforming behavior.").
>
>Non-conforming in the sense of the C/C++ standards? How is it related to the
>-fdata-sections implementation?
>
>>With (-fmerge-constants or -fmerge-all-constants) & -fdata-sections, string literals are placed in .rodata.xxx.str1.1
>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c16
>>This is, however, suboptimal because the cost of a section header
>>(sizeof(Elf64_Shdr)=64) + a section name (".rodata.xxx.str1.1") is quite large.
>>I have replied on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c19 and
>>created a GNU ld feature request
>>(https://sourceware.org/bugzilla/show_bug.cgi?id=26622)
>
>In my example, LLVM/Clang already puts both pointers "test" and "unused"
>in different data sections because of "-fdata-sections", as seen below.

Your example uses global mutable variables "test" and "unused" and that
is why they are in the .data.* sections. They are initialized to
addresses of string literals in .rodata.* . .rodata.* are what we care
about, not .data.* (.data.* can always be correctly garbage collected by
GNU ld/gold/LLD).
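
To make the distinction concrete, here is a minimal sketch of the kind of
source being discussed. Only the names "test" and "unused" and the literal
"Unused" come from this thread; the literal "Test" and the helper
used_string() are assumptions added for illustration.

```c
/* Hypothetical reconstruction of the test case under discussion.
 * Assumed for illustration: the literal "Test" and used_string(). */

/* Mutable pointer objects. With -fdata-sections each one gets its own
 * writable section (.data.test and .data.unused in the dump below). */
const char *test = "Test";
const char *unused = "Unused";

/* The string literals themselves are read-only data. Clang typically
 * emits them into one shared mergeable section such as .rodata.str1.1
 * rather than into a per-variable .rodata.* section. */

/* Gives `test` a reference, so `unused` is the only object left
 * without one. */
const char *used_string(void) { return test; }
```

Compiling this with -fdata-sections -c and listing the sections (for
example with readelf -S) should show the two .data.* sections next to the
string section, though exact section names can vary by target and
compiler version.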

>>; Segment unnamed segment
>>; Range: [0x5c; 0x64[ (8 bytes)
>>; File offset : [144; 152[ (8 bytes)
>>; Permissions:  -
>>
>>; Section .data.test
>>; Range: [0x5c; 0x60[ (4 bytes)
>>; File offset : [144; 148[ (4 bytes)
>>; Flags: 0x3
>>;   SHT_PROGBITS
>>;   SHF_WRITE
>>;   SHF_ALLOC
>>
>>test:
>>
>>0000005c         dd         0x00000063
>>
>>; Section .data.unused
>>; Range: [0x60; 0x64[ (4 bytes)
>>; File offset : [148; 152[ (4 bytes)
>>; Flags: 0x3
>>;   SHT_PROGBITS
>>;   SHF_WRITE
>>;   SHF_ALLOC
>>
>>unused:
>>
>>00000060         dd         0x00000070
>
>So I am not sure I understand the point about sub-optimality here, since
>it is already the case for the .data section, where each variable implies a
>suboptimal cost in terms of section headers. How is C-string-like data
>different? I mean, the concept of -fdata-sections/-ffunction-sections
>("one section for each data/function") should be the same for every
>kind of data, no?
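
For a rough sense of scale, the per-section cost quoted above
(sizeof(Elf64_Shdr)=64 plus the section name) can be tallied as in the
sketch below. The struct is a local mirror of the Elf64_Shdr field layout
so that the example does not depend on <elf.h>, and section_overhead() is
a hypothetical helper name, not part of any toolchain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the Elf64_Shdr field layout (an assumption made to
 * keep the sketch independent of <elf.h>). */
struct shdr64 {
    uint32_t sh_name;      /* offset of the name in the section name table */
    uint32_t sh_type;
    uint64_t sh_flags;
    uint64_t sh_addr;
    uint64_t sh_offset;
    uint64_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint64_t sh_addralign;
    uint64_t sh_entsize;
};

/* Metadata bytes for one extra section: the section header plus the
 * NUL-terminated name in the section name string table. This ignores
 * alignment padding and any suffix sharing between names. */
size_t section_overhead(const char *name) {
    return sizeof(struct shdr64) + strlen(name) + 1;
}
```

Under these assumptions, section_overhead(".rodata.xxx.str1.1") is
64 + 19 = 83 bytes of object-file metadata, compared with 7 bytes of
payload for a literal such as "Unused" including its terminating NUL.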

