[LLVMdev] radr://12777299, "potential pthread/eh bug exposed by libsanitizer"

Fri Nov 30 10:32:51 PST 2012

No, we are not going to use mach_inject. It isn't portable and may
be even harder to set up than mach_override.
The new ASan runtime will use dylib interposition and will in fact
require DYLD_INSERT_LIBRARIES to work. However, ASan already handles this
itself: if the corresponding env var is missing, the app is
just re-execed.
Dylib interposition is supported by Apple and should work on iOS as
well as Mac OS. It will also probably simplify hooking the memory
allocations in ASan, which is currently very tricky.
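
For illustration, the re-exec check could look roughly like the sketch
below. This is not the actual ASan runtime code: the function names, the
exact-path matching rule, and passing the executable path in as a
parameter are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: true if the colon-separated list `list` (the value
 * of DYLD_INSERT_LIBRARIES, or NULL if unset) has an entry equal to
 * `dylib_path`. */
bool insert_list_contains(const char *list, const char *dylib_path) {
    assert(dylib_path != NULL && dylib_path[0] != '\0');
    if (list == NULL) return false;
    size_t want = strlen(dylib_path);
    for (const char *p = list;;) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == want && memcmp(p, dylib_path, want) == 0) return true;
        if (end == NULL) return false;
        p = end + 1;
    }
}

/* Hypothetical helper: if the runtime dylib is not in
 * DYLD_INSERT_LIBRARIES, prepend it and re-exec the program.
 * Returns 0 if no re-exec was needed, -1 on failure. */
int reexec_if_not_inserted(const char *exe_path, char *const argv[],
                           const char *dylib_path) {
    const char *cur = getenv("DYLD_INSERT_LIBRARIES");
    if (insert_list_contains(cur, dylib_path)) return 0;

    bool keep_old = (cur != NULL && cur[0] != '\0');
    size_t n = strlen(dylib_path) + (keep_old ? strlen(cur) + 1 : 0) + 1;
    char *val = malloc(n);
    if (val == NULL) return -1;
    strcpy(val, dylib_path);
    if (keep_old) {
        strcat(val, ":");
        strcat(val, cur);
    }
    int rc = setenv("DYLD_INSERT_LIBRARIES", val, 1);
    free(val);
    if (rc != 0) return -1;

    execv(exe_path, argv);
    return -1; /* execv only returns on failure */
}
```

The second pass through the same code finds the dylib in the list and
returns 0, so the re-exec happens at most once.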

On Fri, Nov 30, 2012 at 6:56 AM, Jack Howarth <howarth at bromo.med.uc.edu> wrote:
> On Fri, Nov 30, 2012 at 01:41:05PM +0400, Kostya Serebryany wrote:
>> Just want to remind everyone that we plan to stop using mach_override in
>> asan in favor of OSX's native function interposition.
>> So, we probably don't want to spend too much effort fixing mach_override.
>>
>> --kcc
>
> Kostya,
>     Is the native function interposition that is being adopted based on...
>
> https://github.com/rentzsch/mach_inject
>
> ? I assume that any method used will be transparent to the user and not require
> manually setting DYLD_INSERT_LIBRARIES, correct?
>           Jack
>
>>
>> On Fri, Nov 30, 2012 at 4:46 AM, Alexander Potapenko <glider at google.com> wrote:
>>
>> > Looks like this happens on x86_64 because the position of __cxa_throw
>> > is too far from the allocated branch island (should be <2G). This can
>> > be solved by allocating the branch islands somewhere near the text
>> > segment (look for kIslandEnd in asan_mac.cc, this is currently
>> > 0x7fffffdf0000) or by patching the function with a longer instruction
>> > sequence that stores the jump target in a register and jumps to that
>> > target (which is a bit more complex to implement).
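
To make the <2G constraint concrete, here is a minimal sketch of the reach
check for a 5-byte JMP rel32. The helper names are made up for this
example, and the island address 0x7fffffd27000 is an assumption (only
kIslandEnd is known); it was picked because truncating the displacement to
32 bits then reproduces the 0xffd27000 target seen in Nick's gdb session
quoted further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { kJmpRel32Size = 5 }; /* opcode E9 + 4-byte displacement */

/* Signed distance from the end of the JMP at `from` to `to`. */
int64_t jmp_displacement(uint64_t from, uint64_t to) {
    return (int64_t)(to - (from + kJmpRel32Size));
}

/* True if a JMP rel32 placed at `from` can reach `to`. */
bool jmp_rel32_reaches(uint64_t from, uint64_t to) {
    int64_t disp = jmp_displacement(from, to);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Where the CPU actually lands if only the low 32 bits of the
 * displacement are stored in the instruction. */
uint64_t jmp_rel32_truncated_target(uint64_t from, uint64_t to) {
    uint32_t low = (uint32_t)(to - (from + kJmpRel32Size));
    /* Sign-extend the stored 32-bit displacement by hand. */
    int64_t disp = (low & 0x80000000u) ? (int64_t)low - (INT64_C(1) << 32)
                                       : (int64_t)low;
    return from + kJmpRel32Size + (uint64_t)disp;
}

void check_thread_example(void) {
    const uint64_t cxa_throw = 0x102244870ULL;  /* from the gdb session */
    const uint64_t island = 0x7fffffd27000ULL;  /* assumed island address */
    assert(!jmp_rel32_reaches(cxa_throw, island));
    assert(jmp_rel32_truncated_target(cxa_throw, island) == 0xffd27000ULL);
    /* An island near the text segment would be reachable. */
    assert(jmp_rel32_reaches(cxa_throw, cxa_throw + 0x100000));
}
```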
>> >
>> > Once this problem is fixed, another one is going to arise. This is what
>> > the first bytes of __cxa_throw look like:
>> >
>> > 0x0020c49ba5d916e0 <__cxa_throw+0>: lea    0xb4f01(%rip),%rax        # 0x20c49ba5e465e8 <_ZN10__cxxabiv120__unexpected_handlerE>
>> > 0x0020c49ba5d916e7 <__cxa_throw+7>: push   %rbx
>> > 0x0020c49ba5d916e8 <__cxa_throw+8>: lea    -0x20(%rdi),%rbx
>> >
>> > If we move the relative LEA instruction somewhere, we must fix the
>> > constant in order to keep it pointing to the same address.
>> > mach_override already does this for relative CALL and JMP
>> > instructions, but not for LEA. This should be fairly simple to fix.
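
A rough sketch of the displacement fix-up for a RIP-relative LEA, using
the addresses from the dump above. This is not mach_override code: the
function names are hypothetical, and the 7-byte length is read off the
dump (the next instruction is at +7).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Absolute address referenced by a RIP-relative operand. The displacement
 * is relative to the end of the instruction. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                             int32_t disp) {
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Compute the disp32 that keeps the operand pointing at the same target
 * once the instruction is moved from old_addr to new_addr. Returns false
 * if the target is out of rel32 range from the new location. */
bool rebase_rip_disp32(uint64_t old_addr, uint64_t new_addr,
                       unsigned insn_len, int32_t old_disp,
                       int32_t *new_disp) {
    uint64_t target = rip_relative_target(old_addr, insn_len, old_disp);
    int64_t disp = (int64_t)(target - (new_addr + insn_len));
    if (disp < INT32_MIN || disp > INT32_MAX) return false;
    *new_disp = (int32_t)disp;
    return true;
}

void check_lea_example(void) {
    const uint64_t lea_addr = 0x0020c49ba5d916e0ULL; /* __cxa_throw+0 */
    const unsigned lea_len = 7;
    int32_t moved = 0;

    /* Matches the address gdb prints in the comment after the LEA. */
    assert(rip_relative_target(lea_addr, lea_len, 0xb4f01) ==
           0x20c49ba5e465e8ULL);

    /* Moving the LEA 0x1000 bytes up shrinks the displacement by 0x1000. */
    assert(rebase_rip_disp32(lea_addr, lea_addr + 0x1000, lea_len, 0xb4f01,
                             &moved));
    assert(moved == 0xb4f01 - 0x1000);

    /* An island just below kIslandEnd is too far away for a disp32. */
    assert(!rebase_rip_disp32(lea_addr, 0x7fffffdf0000ULL - 0x1000, lea_len,
                              0xb4f01, &moved));
}
```

Note that the last case fails for the same reason as the JMP: moving the
LEA only works if the island is within 2G of the data it refers to.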
>> >
>> > Note that the 32-bit variant crashes on another invalid address:
>> >
>> > ASAN:SIGSEGV
>> > =================================================================
>> > ==89768== ERROR: AddressSanitizer: SEGV on unknown address 0xcccccccc (pc 0x00061f8c sp 0xbffa8bd0 bp 0xbffa8cc8 T0)
>> > AddressSanitizer can not provide additional info.
>> >     #0 0x61f8b (/Users/glider/src/gcc-asan/inst/lib/i386/libstdc++.6.dylib+0x3f8b)
>> >     #1 0x91391724 (/usr/lib/system/libdyld.dylib+0x2724)
>> >     #2 0x0
>> > Stats: 0M malloced (0M for red zones) by 3 calls
>> > Stats: 0M realloced by 0 calls
>> > Stats: 0M freed by 0 calls
>> > Stats: 0M really freed by 0 calls
>> > Stats: 1M (256 full pages) mmaped in 2 calls
>> >   mmaps   by size class: 7:4095; 8:2047;
>> >   mallocs by size class: 7:1; 8:2;
>> >   frees   by size class:
>> >   rfrees  by size class:
>> > Stats: malloc large: 0 small slow: 2
>> > ==89768== ABORTING
>> >
>> > My guess is that this is caused by the following code being moved to a
>> > branch island:
>> >
>> > Dump of assembler code for function __cxa_throw:
>> > 0x00008f60 <__cxa_throw+0>: push   %esi
>> > 0x00008f61 <__cxa_throw+1>: push   %ebx
>> > 0x00008f62 <__cxa_throw+2>: call   0x7a60 <__x86.get_pc_thunk.bx>
>> >
>> > Perhaps this makes __x86.get_pc_thunk.bx return an incorrect value.
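
To spell out that guess: the thunk returns the address of the instruction
following the CALL, and PIC code adds a link-time constant to it. The
sketch below only models that arithmetic. The island address and the
offset are hypothetical values chosen for the example, and whether this is
what actually produces the 0xcccccccc access has not been verified.

```c
#include <assert.h>
#include <stdint.h>

/* Value __x86.get_pc_thunk.bx leaves in %ebx: its own return address,
 * i.e. the address right after the 5-byte CALL that invoked it. */
uint32_t pc_thunk_result(uint32_t call_addr) {
    return call_addr + 5;
}

/* Address that PIC code derives from the thunk result by adding an
 * offset fixed at link time. */
uint32_t pic_address(uint32_t call_addr, int32_t link_time_offset) {
    return pc_thunk_result(call_addr) + (uint32_t)link_time_offset;
}

void check_moved_call_example(void) {
    const uint32_t original_call = 0x00008f62u; /* __cxa_throw+2 in the dump */
    const uint32_t island_call = 0x00100002u;   /* hypothetical island copy */
    const int32_t offset = 0x1234;              /* hypothetical offset */

    uint32_t want = pic_address(original_call, offset);
    uint32_t got = pic_address(island_call, offset);

    assert(want == 0x8f67u + 0x1234u);
    /* Every address derived from %ebx is off by the distance the CALL
     * was moved. */
    assert(got - want == island_call - original_call);
}
```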
>> >
>> > Since libstdc++-v3 is built together with gcc, the two issues related
>> > to instructions being moved to another place can be solved by padding
>> > __cxa_throw() with five NOP instructions (enough to hold a JMP). I
>> > believe this should be acceptable, because the performance penalty for
>> > additional NOPs is negligible, and __cxa_throw() isn't a hot spot.
>> >
>> > On Thu, Nov 29, 2012 at 1:01 PM, Nick Kledzik <kledzik at apple.com> wrote:
>> > > I debugged this a bit and it seems the mach_override patching of
>> > > __cxa_throw is bogus.  The start of that function is patched to jump to
>> > > garbage.
>> > >
>> > > Breakpoint 1, 0x0000000100001c19 in main ()
>> > > (gdb) display/i $pc
>> > > 2: x/i $pc  0x100001c19 <main+318>:     callq  0x100016386 <dyld_stub___cxa_throw>
>> > > (gdb) si
>> > > 0x0000000100016386 in dyld_stub___cxa_throw ()
>> > > 2: x/i $pc  0x100016386 <dyld_stub___cxa_throw>:        jmpq   *0xae1c(%rip)        # 0x1000211a8
>> > > (gdb)
>> > > 0x0000000102244870 in __cxa_throw ()
>> > > 2: x/i $pc  0x102244870 <__cxa_throw>:  jmpq   0xffd27000
>> > > (gdb)  # the above is __cxa_throw in gcc's libstdc++.6.dylib.  The
>> > > first instruction has been patched to jump to a garbage address.
>> > >
>> > > (gdb) x/8i 0x102244870-8
>> > > 0x102244868 <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+56>: std
>> > > 0x102244869 <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+57>: (bad)
>> > > 0x10224486a <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+58>: decl   (%rdi)
>> > > 0x10224486c <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+60>: (bad)
>> > > 0x10224486d <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+61>: add    %r8b,(%rax)
>> > > 0x102244870 <__cxa_throw>:      jmpq   0xffd27000
>> > > 0x102244875 <__cxa_throw+5>:    or     (%rax),%eax
>> > > 0x102244877 <__cxa_throw+7>:    push   %rbx
>> > > (gdb)
>> > > (gdb) watch *0x102244870
>> > > Hardware watchpoint 2: *4330899568
>> > > (gdb) r
>> > >
>> > > Old value = -788165304
>> > > New value = -1373139991
>> > > 0x0000000100016203 in __asan_mach_override_ptr_custom ()
>> > > (gdb) bt
>> > > #0  0x0000000100016203 in __asan_mach_override_ptr_custom ()
>> > > #1  0x0000000100015a9e in __interception::OverrideFunction ()
>> > > #2  0x00007fff5fc13378 in ImageLoaderMachO::doModInitFunctions ()
>> > > #3  0x00007fff5fc13762 in ImageLoaderMachO::doInitialization ()
>> > > #4  0x00007fff5fc1006e in ImageLoader::recursiveInitialization ()
>> > > #5  0x00007fff5fc0feba in ImageLoader::runInitializers ()
>> > > #6  0x00007fff5fc01fc0 in dyld::initializeMainExecutable ()
>> > > #7  0x00007fff5fc05b04 in dyld::_main ()
>> > > #8  0x00007fff5fc01397 in dyldbootstrap::start ()
>> > > #9  0x00007fff5fc0105e in _dyld_start ()
>> > > (gdb) x/8i 0x102244870
>> > > 0x102244870 <__cxa_throw>:      jmpq   0xffd27000
>> > > 0x102244875 <__cxa_throw+5>:    or     (%rax),%eax
>> > > 0x102244877 <__cxa_throw+7>:    push   %rbx
>> > > 0x102244878 <__cxa_throw+8>:    lea    -0x20(%rdi),%rbx
>> > > 0x10224487c <__cxa_throw+12>:   mov    %rsi,-0x70(%rdi)
>> > > # Here is where the patching is being done
>> > >
>> > > -Nick
>> > >
>> > > On Nov 29, 2012, at 11:07 AM, Alexander Potapenko wrote:
>> > >>> On Thu, Nov 29, 2012 at 9:55 PM, Jack Howarth <howarth at bromo.med.uc.edu> wrote:
>> > >>>>
>> > >>>> Nick,
>> > >>>>   Can you take a quick look at the asan_eh_bug.tar.bz testcase
>> > >>>> I uploaded into the newly opened radr://12777299, "potential
>> > >>>> pthread/eh bug exposed by libsanitizer". The FSF gcc developers
>> > >>>> have ported llvm.org's asan code into FSF gcc (and are keeping
>> > >>>> it synced to the upstream llvm.org code). I have been helping
>> > >>>> with the darwin build and testing -fsanitize=address against the
>> > >>>> complete FSF gcc testsuite. This seems to have exposed a potential
>> > >>>> bug in pthread or eh on darwin under libasan. Hundreds of test cases
>> > >>>> in the g++ and libstdc++ testsuites fail under -fsanitize=address
>> > >>>> in the following manner...
>> > >>>>
>> > >>>> ASAN:SIGSEGV
>> > >>>> =================================================================
>> > >>>> ==2738== ERROR: AddressSanitizer: SEGV on unknown address 0x0000ffd27000 (pc 0x0000ffd27000 sp 0x7fff55e40828 bp 0x7fff55e408f0 T0)
>> > >>>> AddressSanitizer can not provide additional info.
>> > >>>>    #0 0xffd26fff (/Users/howarth/asan_eh_bug/./cond1_asan.exe+0xf5f67fff)
>> > >>>>    #1 0x7fff8bd827e0 (/usr/lib/system/libdyld.dylib+0x27e0)
>> > >>>>    #2 0x0
>> > >>>> Stats: 0M malloced (0M for red zones) by 3 calls
>> > >>>> Stats: 0M realloced by 0 calls
>> > >>>> Stats: 0M freed by 0 calls
>> > >>>> Stats: 0M really freed by 0 calls
>> > >>>> Stats: 1M (384 full pages) mmaped in 3 calls
>> > >>>>  mmaps   by size class: 7:4095; 8:2047; 9:1023;
>> > >>>>  mallocs by size class: 7:1; 8:1; 9:1;
>> > >>>>  frees   by size class:
>> > >>>>  rfrees  by size class:
>> > >>>> Stats: malloc large: 0 small slow: 3
>> > >>>> ==2738== ABORTING
>> > >>>>
>> > >>>> The failure of...
>> > >>>>
>> > >>>> FAIL: g++.dg/eh/cond1.C -std=c++98 execution test
>> > >>>>
>> > >>>> was used as the test case for the radar report and compiled with...
>> > >>>>
>> > >>>> g++-fsf-4.8 -static-libasan -fsanitize=address -std=c++98 cond1.C -g -O0 -o cond1_asan.exe
>> > >>>>
>> > >>>> to produce the above failure. When compiled without libasan as...
>> > >>>>
>> > >>>> g++-fsf-4.8 -std=c++98 cond1.C -g -O0 -o cond1_no_asan.exe
>> > >>>>
>> > >>>> the resulting executable runs fine. Debugging this in gdb seems to show
>> > >>>> that the failure is occurring in the final call to
>> > >>>> dyld_stub_pthread_once (). The same test case compiles fine with
>> > >>>> -fsanitize=address under llvm 3.2 clang++ and produces no runtime
>> > >>>> errors, but the code execution path is very different in that case
>> > >>>> (because of the different libstdc++).
>> > >>>>    Can you take a quick peek at this and determine if this is a darwin
>> > >>>> pthread or unwinder bug or an issue with libasan that FSF gcc's
>> > >>>> compiler is exposing? Thanks in advance for any help on this.
>> > >>>>         Jack
>> > >>>> _______________________________________________
>> > >>>> LLVM Developers mailing list
>> > >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> > >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> > >>>
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Alexander Potapenko
>> > >> Software Engineer
>> > >> Google Moscow
>> > >
>> >
>> >
>> >
>> >

-- 
Alexander Potapenko
Software Engineer
Google Moscow