[llvm-commits] cxa_guard elimination

Sun Feb 12 01:38:43 PST 2012

Chris Lattner wrote:
>
> On Feb 9, 2012, at 10:21 AM, Nick Lewycky wrote:
>>> Of course GlobalOpt is interprocedural, since it already has to not be
>>> fouled up by constructor calls.
>>>
>>> Updated patch attached, including more bugfixes than ever before! Please
>>> review!
>>
>> Updated patch again, now without memory corruption. Also slightly more aggressive by ignoring AllocaTmps when scanning through ReadMemory and MutatedMemory.
>>
>> It turns out that llvm-test is not a good test suite for this particular change.
>
>
> Ok.  Some thoughts:
>
>       virtual void getAnalysisUsage(AnalysisUsage&AU) const {
> +      AU.addRequired<DominatorTree>();
>
> Is this *really really* required?  This will be the first module pass that requires a function pass in mainline IIRC.

We really just need to show that the load/store to the global can't 
precede the top of the guard. Replacing it with a unique-pred walk will 
work for cases where it isn't in or preceded by a loop. That's nice, but 
it isn't good enough.

What I don't need out of DomTree is building out the tree for the 
entirety of the function, particularly the parts dominated by the 
guard-top. I'd be happy with a subset. SSAUpdater does this nifty trick 
where it creates a subset of the CFG then builds a DomTree on that 
alone. I think that idea would work here, but that's too complicated for 
me to implement. Refactor-o?

> Does the passmanager only compute DominatorTree for functions lazily when asking for it?

Yep! This actually works!

> Are you sure that GlobalOpt isn't changing CFGs in a way that would invalidate DomTree?

I've done more than a cursory glance to see that it doesn't play with 
TerminatorInsts, but I haven't taken the effort to seriously prove it to 
myself.

I was assuming that calling getAnalysis<DominatorTree>(F) twice would 
actually run the domtree analysis on F twice, especially since we don't 
claim to preserve DominatorTree. I haven't checked though.

> @@ -273,6 +283,25 @@
>           assert(MSI->getArgOperand(0) == V&&  "Memset only takes one pointer!");
>           if (MSI->isVolatile()) return true;
>           GS.StoredType = GlobalStatus::isStored;
> +      } else if (const CallInst *CI = dyn_cast<CallInst>(I)) {
> +        Function *Callee = CI->getCalledFunction();
> +        if (!Callee)
> +          return true;
>
> Please add a comment here along the lines of: "Check to see if this global is being used as an initializer guard by seeing if it is passed into __cxa_guard_acquire/release".
>
> I don't have a strong opinion on this, but does this need to use TargetLibraryInfo to disable the transformation with -fno-builtin?

Done.

> It's not your bug, but that loop really really should use 'continue' to make the control flow more obvious.

I'm not sure about this. Are you asking for the tests to be flipped 
around and call continue in the case everything is okay, then have an 
unconditional "return true" as the fall-through? I'm not sure that would 
be better?

> +static Instruction *UseToInst(Value *V) {
>
> Perhaps this is prevented by how it is called, but you can have an infinite constant loop through a global variable and its initializer.

static char *x = (char*)&x; ?!  :-(

Good catch, I haven't fixed this yet locally but I'll get to it. I think 
I want to do away with UseToInst entirely.

> The EvaluateBlock/EvaluateFunction interfaces really need to be split out into methods on a helper class that maintain the state, instead of passing the state through a dozen arguments into each recursive call.  Can you split this out into its own patch?
>
>
> There are various blocks of logic in OptimizeGuardedInitialization that really need some block comments explaining what is going on :)
>
>
>        InstResult = ConstantExpr::get(BO->getOpcode(),
> -                                     getVal(Values, BO->getOperand(0)),
> -                                     getVal(Values, BO->getOperand(1)));
> +      Constant *Op0 = getVal(Values, BO->getOperand(0));
> +      Constant *Op1 = getVal(Values, BO->getOperand(1));
> +      if (!Op0 || !Op1) return false;
>
> It's hard for me to understand what is going on here.  Why did getVal suddenly start being able to fail?  Can you split this out into its own patch?

This is the ugly part; the constant evaluator for guard elimination 
shouldn't be the same as the constant evaluator for running global 
constructors.

When running the global constructors, you start at the beginning of 
every function, and you run every instruction. You know all function 
arguments, because the outer-most function call takes void, and 
everything else is either read from a global or it's an Instruction that 
dominates it. Furthermore, because this is before-main, you're allowed 
to assume that any initializer in a global is the value it has at this 
point, even if the global isn't constant.

The way that the cxa_guard elimination works is by finding the first 
block that only runs on the first execution of the guard, and then 
simulating from there. Because we haven't simulated the blocks leading 
up to that point, none of the local variables are present. Furthermore, 
we can't really rely on ComputeLoadResult because it will look inside 
non-Constant global variable and trust that the initializer is correct, 
but we're simulating code past-main so that isn't true any more. The 
attached patch handles this by making ComputeLoadResult record what GVs 
it looked at when computing its result, then verifying that there were 
all constant or set inside the once-initialized code first.

Digression. I'd recently noticed that we have a number of different 
little IR interpreters all over, and started making a list. There's 
globalopt, and there interpreter executionengine, then there's another 
in jump-threading and SCEV has one (getSCEV) or two (EvaluateExpression) 
depending on how you count. I suspect there's more, but I'm not actively 
looking for them, just writing them down as I see them.

Would it help to refactor them? We could do so, where the user provides 
an InstVisitor implementation and we provide the ability to ask for the 
result of a given instruction (lazily computing the things it depends 
on) or forwards computation, or a mix of both. I think it would help. I 
just didn't want to do all this before committing cxa_guard elimination.

Back from the digression. An alternative way to do cxa_guard elimination 
using the evaluator in globalopt as it exists today would be to start 
the function and make decisions that would lead you towards the 
cxa_guard_acquire. This would let us support cases like Bullet returning 
an identity matrix by alloca'ing the elements, then in the once storing 
to them and running the copy-constructor. Currently that fails because 
we haven't executed the allocas. How you actually pick the right values 
is tricky (especially when both paths in the CFG work), but still possible.

Anywho... yes I can split this into its own patch, assuming that's still 
what you want me to do instead of not abusing globalopt like this. :)

> Overall, I'm really glad you've tackled this.  It would help a lot if you could do some of the other patches to make it more clear and straight-forward.  It's hard to understand the actual transformation as is :)

Thanks! One of the other cleanups is that I'd like is to make this code 
use ConstantFoldLoadFromConstPtr somehow. Reusing the constant folder is 
nice because it knows lots of tricks to loading constants that 
ComputeLoadResult doesn't, but the gotcha is that bails on globals that 
aren't marked constant. GlobalOpt wants to use the definitive 
initializers, even if they aren't constant.

I could go through the ConstantExpr I'm given and temporarily mark all 
globals constant, call into constant folding, then mark them non-const 
afterwards? Eww?

Nick