[llvm-commits] cxa_guard elimination

Eli Friedman eli.friedman at gmail.com
Wed Feb 1 14:47:50 PST 2012


On Sat, Jan 28, 2012 at 11:50 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
> Nick Lewycky wrote:
>>
>> Nick Lewycky wrote:
>>>
>>> Eli Friedman wrote:
>>>>
>>>> On Sat, Jan 7, 2012 at 6:48 PM, Nick Lewycky<nicholas at mxc.ca> wrote:
>>>>>
>>>>> This patch implements some really basic elimination of calls to
>>>>> __cxa_guard.
>>>>> More advanced cases can be added on request. Patch attached, please
>>>>> review!
>>>>>
>>>>> There's one thing I need to draw to a reviewer's attention. I make the
>>>>> assumption that the code in the 'not-yet-initialized' path is indeed
>>>>> initializing. That is to say that I assume that this pseudo-code:
>>>>>
>>>>> *global = x;
>>>>> if (expr) {
>>>>> if (cxa_guard_acquire(global_guard)) {
>>>>> *global = 0;
>>>>> cxa_guard_release(global_guard);
>>>>> }
>>>>> }
>>>>> use(global);
>>>>>
>>>>> represents undefined behavior because global is initialized before
>>>>> the call
>>>>> to __cxa_guard_acquire, and the guard is protecting a normal store,
>>>>> not the
>>>>> first store.
>>>>>
>>>>> The Itanium C++ ABI never states that guard_acquire/release must be
>>>>> used for
>>>>> *initializing*, so this patch arguably can miscompile. However, it's
>>>>> generally understood that this is for initialization (indeed, the
>>>>> section
>>>>> heading is "Once-time initialization API") so I'm hoping we can
>>>>> agree that
>>>>> this assumption is valid and ask users doing other things with
>>>>> cxa_guard_acquire/release to pass -fno-builtins.
>>>>
>>>>
>>>> I think I agree with your assumption for the global associated with
>>>> the guard. You can't make that assumption for all globals, though.
>>>> Consider something like the following:
>>>>
>>>> static int x;
>>>> struct C { C() { x = 10; } };
>>>> int f() { x = 20; static C dummy; return x; }
>>>>
>>>> As far as I can tell, with a bit of inlining etc. we end up with code
>>>> which looks exactly like your bad pattern.
>>>
>>>
>>> Clever! Thanks for the testcase.
>>>
>>> That's unfortunate, but repairable. Given that we already check that all
>>> accesses come from within the same function, we'll need to show that no
>>> access is made to the global before the acquire is reached (and don't
>>> forget recursive calls).
>>>
>>> Regrettably, it means that we'll probably need a domtree (or else do
>>> potentially-expensive CFG analysis ourselves). I'll see whether it still
>>> fits in -simplify-libcalls when I'm done.
>>
>>
>> ... done!
>>
>> The good news is that we already have a valid DomTree by the time
>> simplify-libcalls runs, so we're good to request one here.
>>
>> The updated patch is barely more complicated, there's just an additional
>> check to see whether there's a single predecessor block that's testing
>> the guard variable, and an additional domtree BB test (so, no linear
>> time) on each StoreInst. No sweat!
>>
>> Please review! I'd like to land this patch and then start working on
>> some of its deficiencies in follow-up patches.
>
>
> Uh... I mailed out the entirely wrong patch.
>
> Correct patch attached!
>
> Nick
>
>>
>> It doesn't transform a lot of things due to pass ordering problems, and
>> even when it does fire, we don't get optimal code because of more pass
>> ordering problems. One thing to note is that we put it in
>> simplify-libcalls because it's folding away known library calls, but it
>> might be better off in globalopt because what it's really doing is
>> converting dynamic initialization to static initialization. I haven't
>> really decided what the right move is yet, but I'm seeing things like:
>>
>> static int x;
>> struct C { C() { x = 10; } };
>> int f() { static C dummy; return x; }
>>
>> fail to optimize because 'x' is written to in the two constructor
>> functions the ABI requires, and those functions get deleted at the end
>> of the optz'n pass anyways. The good news is that we get near-optimal
>> code on that testcase if you run opt -O2 twice, or if you wrap struct C
>> in an anonymous namespace. Progress!

+      if (C) {
+        if (isa<GlobalValue>(C))
+          return true;  // eg., used in initializer of another global.
+        if (BlockAddress *BA = dyn_cast<BlockAddress>(C))
+          return BA->getFunction() == F;

Why are you handling BlockAddress specially here? And what is this
handling supposed to even do?  (I'm having a bit of trouble reasoning
about it because I don't think it's possible to hit this case with
your patch.)

GlobalOpt has some similar simulation code; it would be nice if you
could somehow leverage that (but it's not a hard requirement).

Otherwise, this is looking good.

-Eli



More information about the llvm-commits mailing list