[llvm-commits] cxa_guard elimination

Sat Jan 28 23:10:28 PST 2012

Nick Lewycky wrote:
> Eli Friedman wrote:
>> On Sat, Jan 7, 2012 at 6:48 PM, Nick Lewycky<nicholas at mxc.ca>   wrote:
>>> This patch implements some really basic elimination of calls to __cxa_guard.
>>> More advanced cases can be added on request. Patch attached, please review!
>>>
>>> There's one thing I need to draw to a reviewer's attention. I make the
>>> assumption that the code in the 'not-yet-initialized' path is indeed
>>> initializing. That is to say that I assume that this pseudo-code:
>>>
>>>    *global = x;
>>>    if (expr) {
>>>      if (cxa_guard_acquire(global_guard)) {
>>>        *global = 0;
>>>        cxa_guard_release(global_guard);
>>>      }
>>>    }
>>>    use(global);
>>>
>>> represents undefined behavior because global is initialized before the call
>>> to __cxa_guard_acquire, and the guard is protecting a normal store, not the
>>> first store.
>>>
>>> The Itanium C++ ABI never states that guard_acquire/release must be used for
>>> *initializing*, so this patch arguably can miscompile. However, it's
>>> generally understood that this is for initialization (indeed, the section
>>> heading is "Once-time initialization API") so I'm hoping we can agree that
>>> this assumption is valid and ask users doing other things with
>>> cxa_guard_acquire/release to pass -fno-builtins.
>>
>> I think I agree with your assumption for the global associated with
>> the guard.  You can't make that assumption for all globals, though.
>> Consider something like the following:
>>
>> static int x;
>> struct C { C() { x = 10; } };
>> int f() { x = 20; static C dummy; return x; }
>>
>> As far as I can tell, with a bit of inlining etc. we end up with code
>> which looks exactly like your bad pattern.
>
> Clever! Thanks for the testcase.
>
> That's unfortunate, but repairable. Given that we already check that all
> accesses come from within the same function, we'll need to show that no
> access is made to the global before the acquire is reached (and don't
> forget recursive calls).
>
> Regrettably, it means that we'll probably need a domtree (or else do
> potentially-expensive CFG analysis ourselves). I'll see whether it still
> fits in -simplify-libcalls when I'm done.

... done!

The good news is that we already have a valid DomTree by the time 
simplify-libcalls runs, so we're good to request one here.

The updated patch is barely more complicated, there's just an additional 
check to see whether there's a single predecessor block that's testing 
the guard variable, and an additional domtree BB test (so, no linear 
time) on each StoreInst. No sweat!

Please review! I'd like to land this patch and then start working on 
some of its deficiencies in follow-up patches.

It doesn't transform a lot of things due to pass ordering problems, and 
even when it does fire, we don't get optimal code because of more pass 
ordering problems. One thing to note is that we put it in 
simplify-libcalls because it's folding away known library calls, but it 
might be better off in globalopt because what it's really doing is 
converting dynamic initialization to static initialization. I haven't 
really decided what the right move is yet, but I'm seeing things like:

   static int x;
   struct C { C() { x = 10; } };
   int f() { static C dummy; return x; }

fail to optimize because 'x' is written to in the two constructor 
functions the ABI requires, and those functions get deleted at the end 
of the optz'n pass anyways. The good news is that we get near-optimal 
code on that testcase if you run opt -O2 twice, or if you wrap struct C 
in an anonymous namespace. Progress!

Nick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cxa-guard-2.patch
Type: text/x-patch
Size: 4008 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120128/0eda24eb/attachment.bin>