[llvm-commits] patch: CXAGuardElimination pass.

Tue May 26 09:07:07 PDT 2009

Eli Friedman wrote:
> On Tue, May 26, 2009 at 7:06 AM, Luke Dalessandro
> <luked at cs.rochester.edu> wrote:
>> Nick Lewycky wrote:
>>> Eli Friedman wrote:
>>>> On Mon, May 25, 2009 at 5:17 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>>>>> Eli Friedman wrote:
>>>>>> On Mon, May 25, 2009 at 3:59 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>> [cut]
>>
>>>>>> Ooh, that's right.  Actually, it gets nastier than that: suppose
>>>>>> something like the following:
>>>>>>
>>>>>> static int gval1 = 1;
>>>>>> int gval2 = 1;
>>>>>> struct A { A() { if (gval1) { printf("%d\n", gval2); } };
>>>>>> void a() { static A x(); gval2 = 0;}
>>>>>> void b() { a(); }
>>>>>> void c() { gval1 = 0; a(); }
>>>>>>
>>>>>> Then suppose b() and c() are called at the same time on separate
>>>>>> threads, and we've inlined/optimized everything; if the guard is
>>>>>> removed from c(), we can end up printing out "0", which shouldn't be
>>>>>> possible because setting gval2 to zero only happens after the
>>>>>> constructor for A finishes.
>>>>>>
>>>>> Right. deadGuardElim checks for memory *reads* as well as writes and
>>>>> unwinds in order to prevent this sort of thing from happening.
>>>>>
>>>> If a() gets inlined into c(), the load inside A() of gval1 can be
>>>> eliminated; I'm not sure if we do that particular optimization at the
>>>> moment, though.
>> This example has data races on gval1 and gval2. The LLVM VM doesn't have
>> a memory model as far as I know, which basically means that all sorts of
>> crazy things are allowed to happen to this code. Printing out 0 is
>> probably fine. Even in something like the Java Memory Model printing out
>> 0 is fine, given the raciness of the code.
> 
> There is no race on gval2: both threads only set gval2 after "static A
> x()" is fully constructed, and we don't care about the value at that
> point.  The race of gval1 is irrelevant to the point; I left out
> synchronization for it to simplify the example.

Hi Eli,

I'm sorry that I wasn't clear.

I agree that if the cxa_guard routines are meant to have locking 
semantics, then there is no data race on gval2 (given a Java/C++0X-ish 
style synchronization model).

What I was trying to point out was that, if the cxa guards don't have a 
memory model synchronization component, then the transformation is fine, 
and the unexpected result is fine.

If the cxa guards to have a synchronization component, then it should be 
modeled as part of the guard call, i.e., the acquire call clobbers 
memory and serves as a compiler fence. Whatever the modeled memory 
behavior of gcc's __sync_lock_test_and_set is probably fine. This should 
suppress the optimization that eliminates the load inside of c() when 
the guarded constructor is inlined (or eliminates the load inside c() by 
hoisting the acquire over the store to gval1). I think this prevents 
guarded constructors that have side effects from becoming side effect 
free -- allowing the guard eliminator to run as coded.

I'm still pretty sure that removing guard pairs is likely to require 
leaving behind a compiler+hardware memory fence though, which will 
prevent future optimizations (and the hardware) from doing things they 
shouldn't.

Luke

> 
> -Eli
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits