[cfe-dev] libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)
Michael Zolotukhin via cfe-dev
cfe-dev at lists.llvm.org
Sat Oct 10 14:17:38 PDT 2015
Hi,
This is a smaller reproducer of the issue:
$ cat test.ll
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"
@cout = global [24 x i8] zeroinitializer, align 8
define void @foo() {
entry:
store i64 100, i64* bitcast ([24 x i8]* @cout to i64*), align 8
store i64 200, i64* bitcast (i8* getelementptr inbounds ([24 x i8], [24 x i8]* @cout, i64 0, i64 8) to i64*), align 8
ret void
}
$ opt -slp-vectorizer < test.ll -S
; ModuleID = '<stdin>'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"
@cout = global [24 x i8] zeroinitializer, align 8
define void @foo() {
entry:
store <2 x i64> <i64 100, i64 200>, <2 x i64>* bitcast ([24 x i8]* @cout to <2 x i64>*), align 8
ret void
}
$ opt -slp-vectorizer < test.ll | opt -instcombine -S
; ModuleID = '<stdin>'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"
@cout = global [24 x i8] zeroinitializer, align 16
define void @foo() {
entry:
store <2 x i64> <i64 100, i64 200>, <2 x i64>* bitcast ([24 x i8]* @cout to <2 x i64>*), align 16
ret void
}
It looks like instcombine intentionally rewrites the alignment. It happens in getOrEnforceKnownAlignment called by InstCombiner::visitStoreInst. Then compiler performs several checks and decides that it can bump up the alignment, which is presumable illegal in this case.
Reid,
it looks like you relatively recently touched code in enforceKnownAlignment. Could you please comment on whether it’s correct behavior or not?
Thanks,
Michael
> On Oct 10, 2015, at 12:24 PM, Michael Zolotukhin via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> Hi Dimitry,
>
> Thanks for the report! I took a quick look at the test and found the following:
>
> 1) Before SLP we have this:
>
> @cout = global [24 x i8] zeroinitializer, align 8
> …
> define void @_Z7do_initv() #0 {
> entry:
> store i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (...)**), i32 (...)*** bitcast ([24 x i8]* @cout to i32 (...)***), align 8, !tbaa !2
> store i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (...)**), i32 (...)*** bitcast (i8* getelementptr inbounds ([24 x i8], [24 x i8]* @cout, i64 0, i64 8) to i32 (...)***), align 8, !tbaa !2
> ret void
> }
>
> 2) After SLP we get this:
>
> @cout = global [24 x i8] zeroinitializer, align 8
> …
> define void @_Z7do_initv() #2 {
> entry:
> store <2 x i32 (...)**> <i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (...)**), i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (...)**)>, <2 x i32 (...)**>* bitcast ([24 x i8]* @cout to <2 x i32 (...)**>*), align 8, !tbaa !2
> ret void
> }
>
> 3) And then, after instcobmine, we get this:
>
> @cout = global [24 x i8] zeroinitializer, align 16
> …
> ; Function Attrs: nounwind ssp uwtable
> define void @_Z7do_initv() #2 {
> entry:
> store <2 x i32 (...)**> <i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (...)**), i32 (...)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (...)**)>, <2 x i32 (...)**>* bitcast ([24 x i8]* @cout to <2 x i32 (...)**>*), align 16, !tbaa !2
> ret void
> }
>
> I’ll take a look at instombine to understand why it replaces "align 8" with "align 16” in this case. Maybe it’s just a bug, or maybe we shouldn’t vectorize this case for some reason.
>
> Thanks,
> Michael
>
>
>> On Oct 10, 2015, at 11:53 AM, Dimitry Andric <dimitry at andric.com <mailto:dimitry at andric.com>> wrote:
>>
>> On 19 Jun 2015, at 19:40, Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote:
>>> Author: mzolotukhin
>>> Date: Fri Jun 19 12:40:15 2015
>>> New Revision: 240144
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=240144&view=rev <http://llvm.org/viewvc/llvm-project?rev=240144&view=rev>
>>> Log:
>>> [SLP] Vectorize for all-constant entries.
>>>
>>> Differential Revision: http://reviews.llvm.org/D10558 <http://reviews.llvm.org/D10558>
>>
>> Hi,
>>
>> tl;dr: after some research, I found that this commit appears to change the behavior of llvm in such a way as to cause trouble for libc++ users on FreeBSD, due to alignment changes.
>>
>> Recently, I upgraded llvm and clang to 3.7.0 in the FreeBSD base system. Shortly afterwards, I received reports from users that some of their previously compiled C++ applications (dynamically linked to libc++.so.1) were crashing with SIGBUS errors. The stack traces always looked like this:
>>
>> #0 0x0000000800c95cfd in std::__1::basic_ostream<char, std::__1::char_traits<char> >::basic_ostream (this=<optimized out>, __sb=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280
>> #1 std::__1::ios_base::Init::Init (this=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53
>> #2 0x0000000800c96029 in ?? () at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:44 from /usr/obj
>> /usr/src/lib/libc++/libc++.so.1
>> #3 0x0000000800cef352 in ?? () from /usr/obj/usr/src/lib/libc++/libc++.so.1
>> #4 0x0000000800628c00 in ?? ()
>> #5 0x00007fffffffdf18 in ?? ()
>> #6 0x00007fffffffdea0 in ?? ()
>> #7 0x0000000800c925c6 in _init () from /usr/obj/usr/src/lib/libc++/libc++.so.1
>> #8 0x00007fffffffdea0 in ?? ()
>>
>> E.g. it crashed in ios_base::Init::Init(), at this line:
>>
>> ostream* cout_ptr = ::new(cout) ostream(::new(__cout) __stdoutbuf<char>(stdout, &mb_cout));
>>
>> and the relevant disassembly is:
>>
>> 0x0000000800c95ccf <+255>: callq 0x800c96210 <std::__1::__stdoutbuf<char>::__stdoutbuf(__sFILE*, __mbstate_t*)>
>> 0x0000000800c95cd4 <+260>: mov 0x27d565(%rip),%rax # 0x800f13240
>> 0x0000000800c95cdb <+267>: lea 0x40(%rax),%rcx
>> 0x0000000800c95cdf <+271>: movq %rcx,%xmm0
>> 0x0000000800c95ce4 <+276>: lea 0x18(%rax),%rcx
>> 0x0000000800c95ce8 <+280>: movq %rcx,%xmm1
>> 0x0000000800c95ced <+285>: punpcklqdq %xmm0,%xmm1
>> 0x0000000800c95cf1 <+289>: movdqa %xmm1,-0x40(%rbp)
>> 0x0000000800c95cf6 <+294>: mov 0x27d86b(%rip),%rcx # 0x800f13568
>> => 0x0000000800c95cfd <+301>: movdqa %xmm1,(%rcx)
>> 0x0000000800c95d01 <+305>: mov (%rax),%r14
>> 0x0000000800c95d04 <+308>: lea (%rcx,%r14,1),%rbx
>> 0x0000000800c95d08 <+312>: mov %rbx,%rdi
>> 0x0000000800c95d0b <+315>: mov %r15,%rsi
>> 0x0000000800c95d0e <+318>: callq 0x800c92c4c <_ZNSt3__18ios_base4initEPv at plt>
>>
>> In this case, %rcx was the address of std::cout, 0x609068 (or some other address that was aligned to 8 bytes, *not* 16 bytes), which causes movdqa to result in a SIGBUS.
>>
>> These crashing executables were compiled by clang 3.6.1 (or earlier versions), and it turns out that all of them had a global symbol for std::cout, which was aligned to 8 bytes. For example, in case of the original report, one of the executables showed the following in readelf output:
>>
>> Relocation section '.rela.dyn' at offset 0x2070 contains 6 entries:
>> Offset Info Type Symbol's Value Symbol's Name + Addend
>> [...]
>> 0000000000609068 0000003c00000005 R_X86_64_COPY 0000000000609068 _ZNSt3__14coutE + 0
>>
>> and further down:
>>
>> Symbol table '.dynsym' contains 87 entries:
>> Num: Value Size Type Bind Vis Ndx Name
>> [...]
>> 60: 0000000000609068 160 OBJECT GLOBAL DEFAULT 25 _ZNSt3__14coutE
>>
>> (Note: the executable gets a copy of std::cout, because the linker finds a global data object with copy relocations.)
>>
>> The std::cout symbol is explicitly declared with an alignment in iostream.cpp, as follows:
>>
>> _ALIGNAS_TYPE (ostream) _LIBCPP_FUNC_VIS char cout[sizeof(ostream)];
>>
>> The alignment of ostream is 8 bytes, therefore the alignment of cout is also 8 bytes.
>>
>> When libc++ was previously compiled by clang 3.6.1, the assembly generated from ios_base::Init::Init() looked different than above, and std::cout was explicitly aligned to 8 bytes, in the .bss section:
>>
>> #DEBUG_VALUE: Init:this <- RDI
>> .loc 3 669 5 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:5
>> movq $0, 136(%r15,%rbx)
>> .loc 3 670 5 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:5
>> movl $-1, 144(%r15,%rbx)
>> .Ltmp44:
>> .loc 2 53 77 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
>> movq __stdoutp at GOTPCREL(%rip), %r15
>> movq (%r15), %rsi
>> .loc 2 53 45 is_stmt 0 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:45
>> leaq _ZNSt3__1L6__coutE(%rip), %r12
>> leaq _ZNSt3__1L7mb_coutE(%rip), %rdx
>> movq %r12, %rdi
>> .Ltmp45:
>> callq _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
>> .loc 20 280 1 is_stmt 1 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
>> .Ltmp46:
>> movq _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE at GOTPCREL(%rip), %rax
>> leaq 24(%rax), %rcx
>> movq %rcx, -64(%rbp) # 8-byte Spill
>> movq _ZNSt3__14coutE at GOTPCREL(%rip), %rbx
>> movq %rcx, (%rbx)
>> leaq 64(%rax), %rcx
>> movq %rcx, -48(%rbp) # 8-byte Spill
>> movq %rcx, 8(%rbx)
>> .loc 20 281 5 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
>> .Ltmp47:
>> movq (%rax), %r14
>> leaq (%rbx,%r14), %rdi
>> .loc 3 668 15 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:15
>> .Ltmp48:
>> .Ltmp6:
>> movq %r12, %rsi
>> callq _ZNSt3__18ios_base4initEPv at PLT
>> [... skip to .bss ...]
>> .type _ZNSt3__14coutE, at object # @_ZNSt3__14coutE
>> .globl _ZNSt3__14coutE
>> .align 8
>> _ZNSt3__14coutE:
>> .zero 160
>> .size _ZNSt3__14coutE, 160
>>
>> In contrast, when you compile the same file with clang 3.7.0, the assembly becomes similar to the crashing case:
>>
>> #DEBUG_VALUE: Init:this <- RDI
>> .loc 3 669 12 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:12
>> movq $0, 136(%rbx)
>> .loc 3 670 13 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:13
>> movl $-1, 144(%rbx)
>> .Ltmp44:
>> .loc 2 53 77 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
>> movq __stdoutp at GOTPCREL(%rip), %r12
>> movq (%r12), %rsi
>> .loc 2 53 59 is_stmt 0 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:59
>> leaq _ZNSt3__1L6__coutE(%rip), %r15
>> leaq _ZNSt3__1L7mb_coutE(%rip), %rdx
>> movq %r15, %rdi
>> .Ltmp45:
>> callq _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
>> .loc 20 280 1 is_stmt 1 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
>> .Ltmp46:
>> movq _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE at GOTPCREL(%rip), %rax
>> leaq 64(%rax), %rcx
>> movd %rcx, %xmm0
>> leaq 24(%rax), %rcx
>> movd %rcx, %xmm1
>> punpcklqdq %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0]
>> movdqa %xmm1, -64(%rbp) # 16-byte Spill
>> movq _ZNSt3__14coutE at GOTPCREL(%rip), %rcx
>> movdqa %xmm1, (%rcx)
>> .Ltmp47:
>> .loc 20 281 5 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
>> movq (%rax), %r14
>> .Ltmp48:
>> .loc 20 281 5 is_stmt 0 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
>> leaq (%rcx,%r14), %rbx
>> .loc 3 668 5 is_stmt 1 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:5
>> .Ltmp49:
>> .Ltmp6:
>> movq %rbx, %rdi
>> movq %r15, %rsi
>> callq _ZNSt3__18ios_base4initEPv at PLT
>>
>> and the definition of of std::cout is now with 16 bytes alignment instead:
>>
>> .type _ZNSt3__14coutE, at object # @_ZNSt3__14coutE
>> .globl _ZNSt3__14coutE
>> .align 16
>> _ZNSt3__14coutE:
>> .zero 160
>> .size _ZNSt3__14coutE, 160
>>
>> Bisecting has shown that r240144 is the commit where this behavior changed, i.e. before this commit, std::cout is aligned at 8 bytes, and the instructions to access it take this into account. After this commit, it becomes aligned at 16 bytes, and it is then accessed using SSE (movdqa, etc). (Note that it is certainly possible that r240144 is just exposing a deeper issue, instead of being the root cause!)
>>
>> So unfortunately, this new alignment makes a dynamic libc++ library, compiled by clang after this commit, incompatible with previously built applications. This is a big problem for FreeBSD, since we rather value backwards compatibility. :-)
>>
>> I have had several discussions with people who indicate that 16 byte alignment is an x86_64 ABI requirement, at least for "large enough" objects. This may very well be true, but on the other hand, if the program authors explicitly specify that their objects must be aligned to X bytes, where X < 16, then the compiler should obey this, right? And changing existing behavior is incompatible with previously compiled programs.
>>
>> As a temporary workaround, I have now reverted r240144 in FreeBSD, which restores the previous behavior, and aligns std::cout to 8 bytes again. But I would like to bring this to your attention, in the hope that we can find out what the real fix should be.
>>
>> And to finish this way too long mail, I am attaching a minimized test case, which shows the behavior, and which should be appropriate to attach to a PR, if that is needed later on.
>>
>> This test case should be compiled with -O2, to see the effects. Clang 3.6.x will result in the following IR for std::cout:
>>
>> @cout = global [24 x i8] zeroinitializer, align 8
>>
>> while clang 3.7.0 will result in:
>>
>> @cout = global [24 x i8] zeroinitializer, align 16
>>
>> -Dimitry
>> <cout-align.cpp>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20151010/0b353ca9/attachment.html>
More information about the cfe-dev
mailing list