[cfe-users] how clang merge strings in .rodata section

Matthew Fernandez via cfe-users cfe-users at lists.llvm.org
Fri Jul 6 12:47:05 PDT 2018



> On Jul 6, 2018, at 12:00, via cfe-users <cfe-users at lists.llvm.org> wrote:
> 
> Message: 1
> Date: Fri, 6 Jul 2018 09:54:05 +0200
> From: Hans Wennborg via cfe-users <cfe-users at lists.llvm.org>
> To: "Jian, Xu" <Xu.Jian at dell.com>
> Cc: "cfe-users at lists.llvm.org" <cfe-users at lists.llvm.org>
> Subject: Re: [cfe-users] how clang merge strings in .rodata section
> Message-ID:
> 	<CAB8jPhdpFjtLkV-9abV4UtLBkLeXwhNHzz0hG7pFe+9jxm=aEA at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> On Thu, Jul 5, 2018 at 3:18 AM, Jian, Xu via cfe-users
> <cfe-users at lists.llvm.org> wrote:
>> Hi,
>> 
>> The following c source code abc.c:
>> 
> >> #include <stdio.h>
> >> 
> >> int g_val=10;
> >> const char *g_str="abc";
> >> const char *g_str1="c";
> >> 
> >> int main(void)
> >> {
> >>    printf("%s %s: %d\n",g_str,g_str1,g_val);
> >>    return 0;
> >> }
>> 
>> 
>> 
>> When compile with “clang abc.c -o abc” then dump .rodata section:
>> 
>> # readelf -p .rodata abc
>> 
>> 
>> 
>> String dump of section '.rodata':
>> 
>>  [     0]  abc
>> 
>> [     4]  %s %s: %d
>> 
>> 
>> 
>> When compile with “gcc abc.c -o abc” then dump .rodata section:
>> 
>> $ readelf -p .rodata abc
>> 
>> 
>> 
>> String dump of section '.rodata':
>> 
>>  [    10]  abc
>> 
>>  [    14]  c
>> 
>>  [    16]  %s %s: %d^J
>> 
>> 
>> 
>> clang is able to merge short string (“c”) into the tail of a long string
>> (“abc”), while gcc will not.
>> 
>> Does anybody know how to disable this behavior (make it similar to gcc) ?
> 
> I don't think there is a way to disable it.
> 
> Why do you want to disable this behaviour?
> 
> - Hans
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Fri, 6 Jul 2018 08:22:57 +0000
> From: "Jian, Xu via cfe-users" <cfe-users at lists.llvm.org>
> To: Hans Wennborg <hans at chromium.org>
> Cc: "cfe-users at lists.llvm.org" <cfe-users at lists.llvm.org>
> Subject: Re: [cfe-users] how clang merge strings in .rodata section
> Message-ID:
> 	<21F00CE5CA12AA41A34A1A361177F7F901761E92 at MX201CL03.corp.emc.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi Hans,
> We need to check whether the ELF files of two builds are identical.
> Because of string merging, the comparison runs into trouble.
> 
> For example, given the following lines of code (which may be in different files):
> ---------------------------------------------------------------
> const char *s_array[1]={"s"};
> const char *first_s="this first bigger s";
> const char *second_s="this second bigger s";
> ---------------------------------------------------------------
> 
> After clang builds the ELF, s_array[0] sometimes contains the position of the tail of first_s in the .rodata section, and sometimes that of second_s.
> This leads to a diff in the .data section, since s_array lives there.
> The ELF files differ, although nothing has changed from a functional point of view.
> 
> Thanks.
> 
> ------------------------------
> 
> Message: 3
> Date: Fri, 6 Jul 2018 11:01:25 +0200
> From: Hans Wennborg via cfe-users <cfe-users at lists.llvm.org>
> To: "Jian, Xu" <Xu.Jian at dell.com>
> Cc: "cfe-users at lists.llvm.org" <cfe-users at lists.llvm.org>
> Subject: Re: [cfe-users] how clang merge strings in .rodata section
> Message-ID:
> 	<CAB8jPhfmvPUkBRDytgJgqe51NW2w-D9ti+6tMzfE=_cOESqfbg at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> Did the inputs change? If Clang is sometimes using the tail of first_s
> and sometimes second_s, for the same input, that's a bug. The
> compilation should be deterministic.
> 
> Can you provide sample input files and command lines that show this problem?
> 
> Thanks,
> Hans
> 
> 

I think -fno-merge-all-constants is the option you are looking for, or failing that, -fno-global-merge. More information about these options is at [0]. The GCC docs for the corresponding -fmerge-all-constants option [1] also note that merging all constants results in non-conforming behavior, because C and C++ require each variable to have a distinct location. However, I have never seen a program in the wild where this type of merging causes problems.

  [0]: https://clang.llvm.org/docs/ClangCommandLineReference.html
  [1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
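
If it helps to confirm from inside a program whether a given build has tail-merged two strings, a small check along these lines might do. This is only a sketch: shares_tail is a hypothetical helper name, not something provided by clang, gcc, or the thread above, and it only compares addresses, so it says nothing about which tool performed the merge.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an illustration, not part of any toolchain):
 * returns 1 if `shorter` points at the final bytes of the storage that
 * holds `longer`, i.e. the two strings share a tail in memory, else 0.
 */
int shares_tail(const char *longer, const char *shorter)
{
    assert(longer != NULL && shorter != NULL);

    size_t long_len = strlen(longer);
    size_t short_len = strlen(shorter);

    /* A string cannot live inside the tail of a shorter one. */
    if (short_len > long_len)
        return 0;

    /* Address where a suffix of that length would start inside `longer`. */
    return longer + (long_len - short_len) == shorter;
}
```

Going by the readelf dumps quoted above, I would expect shares_tail(g_str, g_str1) in abc.c to return 1 for the clang build and 0 for the gcc build, but I have not run it against those binaries.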

