[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Duncan P. N. Exon Smith dexonsmith at apple.com
Thu Jun 4 09:51:01 PDT 2015


> On 2015-Jun-04, at 07:10, Teresa Johnson <tejohnson at google.com> wrote:
> 
> On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>> 
>>> On 2015-Jun-03, at 09:56, Teresa Johnson <tejohnson at google.com> wrote:
>>> 
>>> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>> 
>>>> You might well be right - I know next to nothing about these things. Having
>>>> someone a double-check/review from someone more familiar would be nice.
>>>> 
>>>> I'll leave the rest of the review to any such someone who might be
>>>> sufficiently informed about the pass pipeline to decide whether this is the
>>>> right call. (and/or perf numbers that demonstrate it)
>>>> 
>>>> Hopefully my naive review has at least covered the basics.
>>> 
>>> I just did some SPEC cpu2006 performance testing with this change, at
>>> -O2 both with and without LTO. There were no measurable performance
>>> differences. In fact, the final binaries were largely identical
>>> (always the same size before and after, generated code looks
>>> identical, slight differences in label naming are the only differences
>>> in the nm and objdump output for a couple benchmarks - most are
>>> totally identical).
>>> 
>>> Can someone take a look at the patch and approve if possible? While
>>> currently this doesn't affect large numbers of functions, with ThinLTO
>>> we want to avoid keeping around the larger numbers of
>>> available-externally imported objects unnecessarily. This change
>>> avoids the need of special casing here for ThinLTO imported
>>> functions/variables.
>> 
>> I'd be interested in performance of a clang bootstrap or of Chromium (or
>> some other large C++ program built with -flto).
>> 
>> Moreover, I suspect this will cause a problem for full LTO, where a lot
>> of inlining takes place at link time.  There have even been proposals
>> (but no one has collected any data to share yet) to run just the
>> always-inliner at compile time, deferring almost all inlining until
>> later.
> 
> Note that one of the sets of SPEC cpu2006 data I collected was with
> full LTO. I'm trying to understand where this would be an issue for
> full LTO, since GlobalDCE is running after inlining in the link time
> LTO pipeline.
> 
> Just to be clear, are you concerned about the following:
> 
> 1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an
> unreferenced available externally function
> 2) LTO link step would have inlined that available externally function
> somewhere (must be a different module since it was unreferenced in
> step 1) but now can't
> 
> I'm not sure when 2) can happen since it had to be unreferenced in its
> module to be removed with my changes.

That's not what it looks like to me.  Here's what I think the relevant
part of your patch is:

On 2015-May-18, at 21:09, Teresa Johnson <tejohnson at google.com> wrote:
> 
> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
>     for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
>       for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
>         for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
> -          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
> -            GlobalIsNeeded(GV);
> +          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
> +            if (!GV->hasAvailableExternallyLinkage())
> +              GlobalIsNeeded(GV);
> +          }
>           else if (Constant *C = dyn_cast<Constant>(*U))
>             MarkUsedGlobalsAsNeeded(C);
>   }


IIUC, this changes the logic from "if it's referenced, keep it" to
"if it's referenced and not available_externally, keep it".

In particular, I'm worried about GlobalDCE removing a *referenced*
available_externally function, particularly if/when we stop inlining at
the compile step when -flto.

> For an available externally
> function, any reference from a different module would have needed its
> own copy to start with (at least that is the case with C inline
> functions). It looks like available externally C inline functions are
> already removed when unreferenced by the compile step (e.g. when
> inlined into that module's references during the "-flto -O2 -c"
> compile).
> 
>> 
>> But if we can have different pass pipelines for "normal" vs. "LTO"
>> compiles, then "ThinLTO" can have its own pass pipeline as well.  Seems
>> like it already needs one.  Why not add a late pass there to drop
>> available_externally functions?
>> 
> 
> Sure that's possible, but I am trying to understand what circumstance
> there is where a full LTO would need to keep around available
> externally functions after GlobalDCE where ThinLTO wouldn't.
> Presumably if they are useful for optimization after GlobalDCE then we
> would want to keep them around longer for ThinLTO as well. I just
> haven't identified any need to do so.
> 
> Thanks,
> Teresa
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413





More information about the llvm-dev mailing list