libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Dimitry Andric via llvm-commits llvm-commits at lists.llvm.org
Sat Oct 10 11:53:16 PDT 2015


On 19 Jun 2015, at 19:40, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
> Author: mzolotukhin
> Date: Fri Jun 19 12:40:15 2015
> New Revision: 240144
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=240144&view=rev
> Log:
> [SLP] Vectorize for all-constant entries.
> 
> Differential Revision: http://reviews.llvm.org/D10558

Hi,

tl;dr: after some research, I found that this commit appears to change the behavior of llvm in such a way as to cause trouble for libc++ users on FreeBSD, due to alignment changes.

Recently, I upgraded llvm and clang to 3.7.0 in the FreeBSD base system.  Shortly afterwards, I received reports from users that some of their previously compiled C++ applications (dynamically linked to libc++.so.1) were crashing with SIGBUS errors.  The stack traces always looked like this:

    #0  0x0000000800c95cfd in std::__1::basic_ostream<char, std::__1::char_traits<char> >::basic_ostream (this=<optimized out>, __sb=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280
    #1  std::__1::ios_base::Init::Init (this=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53
    #2  0x0000000800c96029 in ?? () at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:44 from /usr/obj
    /usr/src/lib/libc++/libc++.so.1
    #3  0x0000000800cef352 in ?? () from /usr/obj/usr/src/lib/libc++/libc++.so.1
    #4  0x0000000800628c00 in ?? ()
    #5  0x00007fffffffdf18 in ?? ()
    #6  0x00007fffffffdea0 in ?? ()
    #7  0x0000000800c925c6 in _init () from /usr/obj/usr/src/lib/libc++/libc++.so.1
    #8  0x00007fffffffdea0 in ?? ()

E.g. it crashed in ios_base::Init::Init(), at this line:

    ostream* cout_ptr = ::new(cout) ostream(::new(__cout) __stdoutbuf<char>(stdout, &mb_cout));

and the relevant disassembly is:

    0x0000000800c95ccf <+255>:   callq  0x800c96210 <std::__1::__stdoutbuf<char>::__stdoutbuf(__sFILE*, __mbstate_t*)>
    0x0000000800c95cd4 <+260>:   mov    0x27d565(%rip),%rax        # 0x800f13240
    0x0000000800c95cdb <+267>:   lea    0x40(%rax),%rcx
    0x0000000800c95cdf <+271>:   movq   %rcx,%xmm0
    0x0000000800c95ce4 <+276>:   lea    0x18(%rax),%rcx
    0x0000000800c95ce8 <+280>:   movq   %rcx,%xmm1
    0x0000000800c95ced <+285>:   punpcklqdq %xmm0,%xmm1
    0x0000000800c95cf1 <+289>:   movdqa %xmm1,-0x40(%rbp)
    0x0000000800c95cf6 <+294>:   mov    0x27d86b(%rip),%rcx        # 0x800f13568
 => 0x0000000800c95cfd <+301>:   movdqa %xmm1,(%rcx)
    0x0000000800c95d01 <+305>:   mov    (%rax),%r14
    0x0000000800c95d04 <+308>:   lea    (%rcx,%r14,1),%rbx
    0x0000000800c95d08 <+312>:   mov    %rbx,%rdi
    0x0000000800c95d0b <+315>:   mov    %r15,%rsi
    0x0000000800c95d0e <+318>:   callq  0x800c92c4c <_ZNSt3__18ios_base4initEPv at plt>

In this case, %rcx was the address of std::cout, 0x609068 (or some other address that was aligned to 8 bytes, *not* 16 bytes), which causes movdqa to result in a SIGBUS.

These crashing executables were compiled by clang 3.6.1 (or earlier versions), and it turns out that all of them had a global symbol for std::cout, which was aligned to 8 bytes.  For example, in case of the original report, one of the executables showed the following in readelf output:

    Relocation section '.rela.dyn' at offset 0x2070 contains 6 entries:
        Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
    [...]
    0000000000609068  0000003c00000005 R_X86_64_COPY          0000000000609068 _ZNSt3__14coutE + 0

and further down:

    Symbol table '.dynsym' contains 87 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    [...]
        60: 0000000000609068   160 OBJECT  GLOBAL DEFAULT   25 _ZNSt3__14coutE

(Note: the executable gets a copy of std::cout, because the linker finds a global data object with copy relocations.)

The std::cout symbol is explicitly declared with an alignment in iostream.cpp, as follows:

    _ALIGNAS_TYPE (ostream)  _LIBCPP_FUNC_VIS char cout[sizeof(ostream)];

The alignment of ostream is 8 bytes, therefore the alignment of cout is also 8 bytes.

When libc++ was previously compiled by clang 3.6.1, the assembly generated from ios_base::Init::Init() looked different than above, and std::cout was explicitly aligned to 8 bytes, in the .bss section:

        #DEBUG_VALUE: Init:this <- RDI
        .loc    3 669 5                 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:5
        movq    $0, 136(%r15,%rbx)
        .loc    3 670 5                 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:5
        movl    $-1, 144(%r15,%rbx)
.Ltmp44:
        .loc    2 53 77                 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
        movq    __stdoutp at GOTPCREL(%rip), %r15
        movq    (%r15), %rsi
        .loc    2 53 45 is_stmt 0       # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:45
        leaq    _ZNSt3__1L6__coutE(%rip), %r12
        leaq    _ZNSt3__1L7mb_coutE(%rip), %rdx
        movq    %r12, %rdi
.Ltmp45:
        callq   _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
        .loc    20 280 1 is_stmt 1      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
        movq    _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE at GOTPCREL(%rip), %rax
        leaq    24(%rax), %rcx
        movq    %rcx, -64(%rbp)         # 8-byte Spill
        movq    _ZNSt3__14coutE at GOTPCREL(%rip), %rbx
        movq    %rcx, (%rbx)
        leaq    64(%rax), %rcx
        movq    %rcx, -48(%rbp)         # 8-byte Spill
        movq    %rcx, 8(%rbx)
        .loc    20 281 5                # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
.Ltmp47:
        movq    (%rax), %r14
        leaq    (%rbx,%r14), %rdi
        .loc    3 668 15                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:15
.Ltmp48:
.Ltmp6:
        movq    %r12, %rsi
        callq   _ZNSt3__18ios_base4initEPv at PLT
[... skip to .bss ...]
        .type   _ZNSt3__14coutE, at object # @_ZNSt3__14coutE
        .globl  _ZNSt3__14coutE
        .align  8
_ZNSt3__14coutE:
        .zero   160
        .size   _ZNSt3__14coutE, 160

In contrast, when you compile the same file with clang 3.7.0, the assembly becomes similar to the crashing case:

        #DEBUG_VALUE: Init:this <- RDI
        .loc    3 669 12                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:12
        movq    $0, 136(%rbx)
        .loc    3 670 13                # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:13
        movl    $-1, 144(%rbx)
.Ltmp44:
        .loc    2 53 77                 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
        movq    __stdoutp at GOTPCREL(%rip), %r12
        movq    (%r12), %rsi
        .loc    2 53 59 is_stmt 0       # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:59
        leaq    _ZNSt3__1L6__coutE(%rip), %r15
        leaq    _ZNSt3__1L7mb_coutE(%rip), %rdx
        movq    %r15, %rdi
.Ltmp45:
        callq   _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
        .loc    20 280 1 is_stmt 1      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
        movq    _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE at GOTPCREL(%rip), %rax
        leaq    64(%rax), %rcx
        movd    %rcx, %xmm0
        leaq    24(%rax), %rcx
        movd    %rcx, %xmm1
        punpcklqdq      %xmm0, %xmm1    # xmm1 = xmm1[0],xmm0[0]
        movdqa  %xmm1, -64(%rbp)        # 16-byte Spill
        movq    _ZNSt3__14coutE at GOTPCREL(%rip), %rcx
        movdqa  %xmm1, (%rcx)
.Ltmp47:
        .loc    20 281 5                # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
        movq    (%rax), %r14
.Ltmp48:
        .loc    20 281 5 is_stmt 0      # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
        leaq    (%rcx,%r14), %rbx
        .loc    3 668 5 is_stmt 1       # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:5
.Ltmp49:
.Ltmp6:
        movq    %rbx, %rdi
        movq    %r15, %rsi
        callq   _ZNSt3__18ios_base4initEPv at PLT

and the definition of of std::cout is now with 16 bytes alignment instead:

        .type   _ZNSt3__14coutE, at object # @_ZNSt3__14coutE
        .globl  _ZNSt3__14coutE
        .align  16
_ZNSt3__14coutE:
        .zero   160
        .size   _ZNSt3__14coutE, 160

Bisecting has shown that r240144 is the commit where this behavior changed, i.e. before this commit, std::cout is aligned at 8 bytes, and the instructions to access it take this into account.  After this commit, it becomes aligned at 16 bytes, and it is then accessed using SSE (movdqa, etc).  (Note that it is certainly possible that r240144 is just exposing a deeper issue, instead of being the root cause!)

So unfortunately, this new alignment makes a dynamic libc++ library, compiled by clang after this commit, incompatible with previously built applications.  This is a big problem for FreeBSD, since we rather value backwards compatibility. :-)

I have had several discussions with people who indicate that 16 byte alignment is an x86_64 ABI requirement, at least for "large enough" objects.  This may very well be true, but on the other hand, if the program authors explicitly specify that their objects must be aligned to X bytes, where X < 16, then the compiler should obey this, right?  And changing existing behavior is incompatible with previously compiled programs.

As a temporary workaround, I have now reverted r240144 in FreeBSD, which restores the previous behavior, and aligns std::cout to 8 bytes again.  But I would like to bring this to your attention, in the hope that we can find out what the real fix should be.

And to finish this way too long mail, I am attaching a minimized test case, which shows the behavior, and which should be appropriate to attach to a PR, if that is needed later on.

This test case should be compiled with -O2, to see the effects.  Clang 3.6.x will result in the following IR for std::cout:

    @cout = global [24 x i8] zeroinitializer, align 8

while clang 3.7.0 will result in:

    @cout = global [24 x i8] zeroinitializer, align 16

-Dimitry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cout-align.cpp
Type: application/octet-stream
Size: 402 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151010/a1fed62d/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 194 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151010/a1fed62d/attachment.sig>


More information about the llvm-commits mailing list