[llvm-commits] cxa_guard elimination
Nick Lewycky
nicholas at mxc.ca
Sun Feb 12 01:38:43 PST 2012
Chris Lattner wrote:
>
> On Feb 9, 2012, at 10:21 AM, Nick Lewycky wrote:
>>> Of course GlobalOpt is interprocedural, since it already has to not be
>>> fouled up by constructor calls.
>>>
>>> Updated patch attached, including more bugfixes than ever before! Please
>>> review!
>>
>> Updated patch again, now without memory corruption. Also slightly more aggressive by ignoring AllocaTmps when scanning through ReadMemory and MutatedMemory.
>>
>> It turns out that llvm-test is not a good test suite for this particular change.
>
>
> Ok. Some thoughts:
>
> virtual void getAnalysisUsage(AnalysisUsage&AU) const {
> + AU.addRequired<DominatorTree>();
>
> Is this *really really* required? This will be the first module pass that requires a function pass in mainline IIRC.
We really just need to show that the load/store to the global can't
precede the top of the guard. Replacing it with a unique-pred walk will
work for cases where it isn't in or preceded by a loop. That's nice, but
it isn't good enough.
What I don't need out of DomTree is building out the tree for the
entirety of the function, particularly the parts dominated by the
guard-top. I'd be happy with a subset. SSAUpdater does this nifty trick
where it creates a subset of the CFG then builds a DomTree on that
alone. I think that idea would work here, but that's too complicated for
me to implement. Refactor-o?
> Does the passmanager only compute DominatorTree for functions lazily when asking for it?
Yep! This actually works!
> Are you sure that GlobalOpt isn't changing CFGs in a way that would invalidate DomTree?
I've done more than a cursory glance to see that it doesn't play with
TerminatorInsts, but I haven't taken the effort to seriously prove it to
myself.
I was assuming that calling getAnalysis<DominatorTree>(F) twice would
actually run the domtree analysis on F twice, especially since we don't
claim to preserve DominatorTree. I haven't checked though.
> @@ -273,6 +283,25 @@
> assert(MSI->getArgOperand(0) == V&& "Memset only takes one pointer!");
> if (MSI->isVolatile()) return true;
> GS.StoredType = GlobalStatus::isStored;
> + } else if (const CallInst *CI = dyn_cast<CallInst>(I)) {
> + Function *Callee = CI->getCalledFunction();
> + if (!Callee)
> + return true;
>
> Please add a comment here along the lines of: "Check to see if this global is being used as an initializer guard by seeing if it is passed into __cxa_guard_acquire/release".
>
> I don't have a strong opinion on this, but does this need to use TargetLibraryInfo to disable the transformation with -fno-builtin?
Done.
> It's not your bug, but that loop really really should use 'continue' to make the control flow more obvious.
I'm not sure about this. Are you asking for the tests to be flipped
around and call continue in the case everything is okay, then have an
unconditional "return true" as the fall-through? I'm not sure that would
be better?
> +static Instruction *UseToInst(Value *V) {
>
> Perhaps this is prevented by how it is called, but you can have an infinite constant loop through a global variable and its initializer.
static char *x = (char*)&x; ?! :-(
Good catch, I haven't fixed this yet locally but I'll get to it. I think
I want to do away with UseToInst entirely.
> The EvaluateBlock/EvaluateFunction interfaces really need to be split out into methods on a helper class that maintain the state, instead of passing the state through a dozen arguments into each recursive call. Can you split this out into its own patch?
>
>
> There are various blocks of logic in OptimizeGuardedInitialization that really need some block comments explaining what is going on :)
>
>
> InstResult = ConstantExpr::get(BO->getOpcode(),
> - getVal(Values, BO->getOperand(0)),
> - getVal(Values, BO->getOperand(1)));
> + Constant *Op0 = getVal(Values, BO->getOperand(0));
> + Constant *Op1 = getVal(Values, BO->getOperand(1));
> + if (!Op0 || !Op1) return false;
>
> It's hard for me to understand what is going on here. Why did getVal suddenly start being able to fail? Can you split this out into its own patch?
This is the ugly part; the constant evaluator for guard elimination
shouldn't be the same as the constant evaluator for running global
constructors.
When running the global constructors, you start at the beginning of
every function, and you run every instruction. You know all function
arguments, because the outer-most function call takes void, and
everything else is either read from a global or it's an Instruction that
dominates it. Furthermore, because this is before-main, you're allowed
to assume that any initializer in a global is the value it has at this
point, even if the global isn't constant.
The way that the cxa_guard elimination works is by finding the first
block that only runs on the first execution of the guard, and then
simulating from there. Because we haven't simulated the blocks leading
up to that point, none of the local variables are present. Furthermore,
we can't really rely on ComputeLoadResult because it will look inside
non-Constant global variable and trust that the initializer is correct,
but we're simulating code past-main so that isn't true any more. The
attached patch handles this by making ComputeLoadResult record what GVs
it looked at when computing its result, then verifying that there were
all constant or set inside the once-initialized code first.
Digression. I'd recently noticed that we have a number of different
little IR interpreters all over, and started making a list. There's
globalopt, and there interpreter executionengine, then there's another
in jump-threading and SCEV has one (getSCEV) or two (EvaluateExpression)
depending on how you count. I suspect there's more, but I'm not actively
looking for them, just writing them down as I see them.
Would it help to refactor them? We could do so, where the user provides
an InstVisitor implementation and we provide the ability to ask for the
result of a given instruction (lazily computing the things it depends
on) or forwards computation, or a mix of both. I think it would help. I
just didn't want to do all this before committing cxa_guard elimination.
Back from the digression. An alternative way to do cxa_guard elimination
using the evaluator in globalopt as it exists today would be to start
the function and make decisions that would lead you towards the
cxa_guard_acquire. This would let us support cases like Bullet returning
an identity matrix by alloca'ing the elements, then in the once storing
to them and running the copy-constructor. Currently that fails because
we haven't executed the allocas. How you actually pick the right values
is tricky (especially when both paths in the CFG work), but still possible.
Anywho... yes I can split this into its own patch, assuming that's still
what you want me to do instead of not abusing globalopt like this. :)
> Overall, I'm really glad you've tackled this. It would help a lot if you could do some of the other patches to make it more clear and straight-forward. It's hard to understand the actual transformation as is :)
Thanks! One of the other cleanups is that I'd like is to make this code
use ConstantFoldLoadFromConstPtr somehow. Reusing the constant folder is
nice because it knows lots of tricks to loading constants that
ComputeLoadResult doesn't, but the gotcha is that bails on globals that
aren't marked constant. GlobalOpt wants to use the definitive
initializers, even if they aren't constant.
I could go through the ConstantExpr I'm given and temporarily mark all
globals constant, call into constant folding, then mark them non-const
afterwards? Eww?
Nick
More information about the llvm-commits
mailing list