[LLVMdev] Proposed Enhancement to AddressSanitizer: Initialization Order

Alexander Potapenko glider at google.com
Tue Jun 26 04:25:16 PDT 2012


On Tue, Jun 26, 2012 at 2:43 PM, Kostya Serebryany <kcc at google.com> wrote:
> +llvmdev, -llvm-dev
>
> On Tue, Jun 26, 2012 at 2:28 PM, Kostya Serebryany <kcc at google.com> wrote:
>>
>> Hi Reid,
>>
>> On Tue, Jun 26, 2012 at 4:30 AM, Reid Watson <reidw at google.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm starting work on a project to detect initialization order problems
>>> in C++ files using AddressSanitizer.
>>> The goal is for AddressSanitizer to detect initializers that read a
>>> not-yet-initialized value from a static or global variable defined in
>>> another TU.
>>> I'm currently working on this as a patch to AddressSanitizer, but I'm
>>> open to suggestions as to what the proper way to implement this
>>> extension would be.
>>>
>>> One of the simplest examples is the program below.
>>> Its output is unspecified because it depends on the order in which the
>>> TUs' initializers run, and the behavior is easy to reproduce.
>>>
>>> When compiled as:
>>> $ clang++ file_1.cpp file_2.cpp main.cpp
>>> $ ./a.out
>>> x: 2
>>> y: 1
>>>
>>> However, when compiled as:
>>> $ clang++ file_2.cpp file_1.cpp main.cpp
>>> $ ./a.out
>>> x: 1
>>> y: 2
>>>
>>> //file_1.cpp
>>> extern int y;
>>> int x = y + 1;
>>>
>>> //file_2.cpp
>>> extern int x;
>>> int y = x + 1;
>>>
>>> //main.cpp
>>> #include <iostream>
>>> extern int x,y;
>>>
>>> int main(){
>>>   std::cout << "x: " << x << std::endl;
>>>   std::cout << "y: " << y << std::endl;
>>> }
>>>
>>> Here's a sketch of the detection algorithm:
>>> For each TU:
>>>     1. Before each TU's initializers run, conditionally poison the
>>> global variable shadow memory
>>>         -Each global variable is poisoned, unless it was defined in that
>>> TU
>>>         -Additional information is added to struct __asan_global to
>>> identify which TU a global was defined in
>>
>>
>> This could be tricky.
>> First, we don't want to poison the linker-initialized globals because they
>> are always initialized regardless of the TU order.
>>
>> Second, consider 3 TUs, t1, t2, and t3, each with a global (g1, g2
>> and g3) that has an initializer.
>> When we are running initializers in t2, we need to poison g1 and g3, but
>> so far we have seen only g1.
>> I don't know of any good, portable way to find g3.
>>
>> One solution is to run the binary twice: once with the default order of TU
>> initializers, and a second time with the reversed order (not sure if that's
>> easy).
>>
As we've discussed offline, it may be easy to wrap and re-implement
__libc_global_ctors (which essentially iterates over __CTOR_LIST__ and
calls the ctors for each module in the linkage order).

We can then shuffle the ctors in any order we want, e.g. explicitly
ask for reverse order. Other means of changing the ctor order may
require relinking the binary.
We'll just need to associate each pointer in the __CTOR_LIST__ with
the corresponding per-module structure that describes the globals,
poison all the globals in a certain module after its ctor has been
called, and unpoison all the globals after __libc_global_ctors is
done.
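
A minimal sketch of the scheme above, written as a self-contained
simulation rather than a real patch. Everything in it is hypothetical:
ModuleInfo stands in for the per-module structure (it is not the real
__asan_global layout), the "poisoned" flag models shadow memory,
checked_load models an instrumented load, and run_ctors plays the role
of a wrapped __libc_global_ctors. The two ctors mirror file_1.cpp and
file_2.cpp from Reid's example.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical per-module descriptor; not the real __asan_global layout.
struct ModuleInfo {
  const char *name;            // TU name, used only for reporting
  void (*ctor)();              // the module's entry in the ctor list
  std::vector<int *> globals;  // globals defined by this module
  bool poisoned;               // models the shadow state of those globals
};

static int x, y;  // zero-initialized before any ctor runs
static std::vector<ModuleInfo> modules;
static int reports = 0;

// Models an instrumented load: report if the address belongs to a module
// whose globals are currently poisoned, then perform the load anyway.
static int checked_load(int *addr) {
  for (const ModuleInfo &m : modules) {
    if (!m.poisoned) continue;
    for (int *g : m.globals) {
      if (g == addr) {
        ++reports;
        std::printf("initializer reads a global defined in %s\n", m.name);
      }
    }
  }
  return *addr;
}

// Stand-ins for the dynamic initializers of file_1.cpp and file_2.cpp.
static void ctor_file_1() { x = checked_load(&y) + 1; }
static void ctor_file_2() { y = checked_load(&x) + 1; }

// Plays the role of a wrapped __libc_global_ctors: call the ctors in the
// requested order, poison each module's globals after its ctor has run,
// and unpoison everything once all ctors are done.
static void run_ctors(bool reverse) {
  const std::size_t n = modules.size();
  for (std::size_t i = 0; i < n; ++i) {
    ModuleInfo &m = modules[reverse ? n - 1 - i : i];
    m.ctor();
    m.poisoned = true;
  }
  for (ModuleInfo &m : modules) m.poisoned = false;
}

// Resets the simulated program and runs its ctors once.
// Returns the number of cross-module reads that were reported.
int run_demo(bool reverse) {
  x = 0;
  y = 0;
  reports = 0;
  modules.clear();
  modules.push_back(ModuleInfo{"file_1.cpp", ctor_file_1, {&x}, false});
  modules.push_back(ModuleInfo{"file_2.cpp", ctor_file_2, {&y}, false});
  run_ctors(reverse);
  return reports;
}
```

Calling run_demo(false) and run_demo(true) covers both orders. In this
toy setup each order reports one cross-module read, because whichever
ctor runs second reads a global that the first module already owns.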



