[llvm-commits] patch: CXAGuardElimination pass.

Mon May 25 17:17:49 PDT 2009

Eli Friedman wrote:
> On Mon, May 25, 2009 at 3:59 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>> Eli Friedman wrote:
>>> On Sun, May 24, 2009 at 11:25 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>>>> Eli Friedman wrote:
>>>>> On Sun, May 24, 2009 at 9:34 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>>>>>> However, it doesn't simplify it down all the way. See llvm.org/PR4261 for an
>>>>>> example of what happens after this optimization is applied on the above
>>>>>> program. We may decide that PR4261 is too hard to fix in general and just
>>>>>> add some extra logic to this pass, but I'd rather have this committed for a
>>>>>> start.
>>>>> Couldn't you just change AcquireRet from a constant 1 to a constant 0?
>>>>>  If it's safe to remove the guard, I don't see how the chosen path
>>>>> could make a difference.
>>>> Release has to run. It has the visible effect of changing the guard
>>>> variable to a 1. Nowhere in this pass do we prove those two calls are
>>>> the only places looking at the guard variable.
>>> Assuming it's actually a guard, nothing besides the guard should care
>>> whether the guard variable initialized or not in the current call. The
>>> pass as written already makes assumptions that agree with that: for
>>> example, it doesn't bother to check for instructions with side-effects
>>> after the call to __cxa_guard_release.
>> I'm going to change my mind on this one. That's not safe. Here's my example.
>>
>> We have a function that calls __cxa_guard_acquire/release on a single
>> guard variable. Then we decide to clone it as part of partial
>> specialization.
>>
>> The initialization being guarded depends on some global variable -- if
>> it's zero it does nothing, else it prints "foo". When cloneA is run we
>> know that the global is always 0 so the initialization does nothing and
>> we eliminate the guards. In cloneB the global is always non-zero, so it
>> always prints "foo" to the screen and we keep the guards.
>>
>> When cloneA runs it needs to mark the guard variable as being "already
>> initialized" or else when cloneB runs it will go ahead and print "foo"
>> and it's not supposed to. You've changed the behaviour of the program.
> 
> Ooh, that's right.  Actually, it gets nastier than that: suppose
> something like the following:
> 
> static int gval1 = 1;
> int gval2 = 1;
> struct A { A() { if (gval1) { printf("%d\n", gval2); } };
> void a() { static A x(); gval2 = 0;}
> void b() { a(); }
> void c() { gval1 = 0; a(); }
> 
> Then suppose b() and c() are called at the same time on separate
> threads, and we've inlined/optimized everything; if the guard is
> removed from c(), we can end up printing out "0", which shouldn't be
> possible because setting gval2 to zero only happens after the
> constructor for A finishes.

Right. deadGuardElim checks for memory *reads* as well as writes and 
unwinds in order to prevent this sort of thing from happening. In the 
example I gave above it only works because the "partial specialization" 
step also eliminated the dead read of that 'some global variable'.

I'm still not sure what to do about the case where someone uses the 
result from the Acquire to do more than just perform initialization 
inside the release. You brought this up in an earlier email, something like:

   if (x = acquire()) {
     release();
     return 1;
   }
   return 0;

would break us because we see that there is nothing between acquire() 
and release() but we can't just eliminate it. We still need to return 1 
the first time and 0 on subsequent calls.

Nick