[cfe-users] how clang merge strings in .rodata section
Matthew Fernandez via cfe-users
cfe-users at lists.llvm.org
Fri Jul 6 12:47:05 PDT 2018
> On Jul 6, 2018, at 12:00, via cfe-users <cfe-users at lists.llvm.org> wrote:
>
> Message: 1
> Date: Fri, 6 Jul 2018 09:54:05 +0200
> From: Hans Wennborg via cfe-users <cfe-users at lists.llvm.org>
> To: "Jian, Xu" <Xu.Jian at dell.com>
> Cc: "cfe-users at lists.llvm.org" <cfe-users at lists.llvm.org>
> Subject: Re: [cfe-users] how clang merge strings in .rodata section
> Message-ID:
> <CAB8jPhdpFjtLkV-9abV4UtLBkLeXwhNHzz0hG7pFe+9jxm=aEA at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> On Thu, Jul 5, 2018 at 3:18 AM, Jian, Xu via cfe-users
> <cfe-users at lists.llvm.org> wrote:
>> Hi,
>>
>> The following C source code, abc.c:
>>
>> #include <stdio.h>
>>
>> int g_val=10;
>> const char *g_str="abc";
>> const char *g_str1="c";
>>
>> int main(void)
>> {
>>     printf("%s %s: %d\n",g_str,g_str1,g_val);
>>     return 0;
>> }
>>
>> When compiled with “clang abc.c -o abc”, dumping the .rodata section gives:
>>
>> # readelf -p .rodata abc
>>
>> String dump of section '.rodata':
>> [ 0] abc
>> [ 4] %s %s: %d
>>
>> When compiled with “gcc abc.c -o abc”, dumping the .rodata section gives:
>>
>> $ readelf -p .rodata abc
>>
>> String dump of section '.rodata':
>> [ 10] abc
>> [ 14] c
>> [ 16] %s %s: %d^J
>>
>> clang is able to merge a short string (“c”) into the tail of a longer string
>> (“abc”), while gcc does not.
>>
>> Does anybody know how to disable this behavior (to make it match gcc)?
>
> I don't think there is a way to disable it.
>
> Why do you want to disable this behaviour?
>
> - Hans
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 6 Jul 2018 08:22:57 +0000
> From: "Jian, Xu via cfe-users" <cfe-users at lists.llvm.org>
> To: Hans Wennborg <hans at chromium.org>
> Cc: "cfe-users at lists.llvm.org" <cfe-users at lists.llvm.org>
> Subject: Re: [cfe-users] how clang merge strings in .rodata section
> Message-ID:
> <21F00CE5CA12AA41A34A1A361177F7F901761E92 at MX201CL03.corp.emc.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Hans,
> We need to check whether the ELF files from two builds are identical.
> String merging makes that comparison difficult.
>
> For example, take the following lines of code (possibly in different files):
> ---------------------------------------------------------------
> const char *s_array[1]={"s"};
> const char *first_s="this first bigger s";
> const char *second_s="this second bigger s";
> ---------------------------------------------------------------
>
> After clang builds the ELF, s_array[0] sometimes contains the address of the tail of first_s in the .rodata section, and sometimes that of second_s.
> This leads to a diff in the .data section, since s_array lives there.
> The ELF files differ even though nothing has changed functionally.
>
> Thanks.
>
> ------------------------------
>
> Message: 3
> Date: Fri, 6 Jul 2018 11:01:25 +0200
> From: Hans Wennborg via cfe-users <cfe-users at lists.llvm.org>
> To: "Jian, Xu" <Xu.Jian at dell.com>
> Cc: "cfe-users at lists.llvm.org" <cfe-users at lists.llvm.org>
> Subject: Re: [cfe-users] how clang merge strings in .rodata section
> Message-ID:
> <CAB8jPhfmvPUkBRDytgJgqe51NW2w-D9ti+6tMzfE=_cOESqfbg at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> On Fri, Jul 6, 2018 at 10:22 AM, Jian, Xu <Xu.Jian at dell.com> wrote:
>> Hi Hans,
>> We need to check whether the ELF files from two builds are identical.
>> String merging makes that comparison difficult.
>>
>> For example, take the following lines of code (possibly in different files):
>> ---------------------------------------------------------------
>> const char *s_array[1]={"s"};
>> const char *first_s="this first bigger s";
>> const char *second_s="this second bigger s";
>> ---------------------------------------------------------------
>>
>> After clang builds the ELF, s_array[0] sometimes contains the address of the tail of first_s in the .rodata section, and sometimes that of second_s.
>> This leads to a diff in the .data section, since s_array lives there.
>> The ELF files differ even though nothing has changed functionally.
>
> Did the inputs change? If Clang is sometimes using the tail of first_s
> and sometimes second_s, for the same input, that's a bug. The
> compilation should be deterministic.
>
> Can you provide sample input files and command lines that show this problem?
>
> Thanks,
> Hans
>
>
I think -fno-merge-all-constants is the option you are looking for, or failing that, -fno-global-merge. More information about these options is at [0]. The GCC docs for this option [1] also note that constant merging is not standards-compliant. That said, I have never seen a program in the wild where this type of merging causes problems.

[0]: https://clang.llvm.org/docs/ClangCommandLineReference.html
[1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html