[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)
Duncan P. N. Exon Smith
dexonsmith at apple.com
Thu Jun 4 09:51:01 PDT 2015
> On 2015-Jun-04, at 07:10, Teresa Johnson <tejohnson at google.com> wrote:
>
> On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>>
>>> On 2015-Jun-03, at 09:56, Teresa Johnson <tejohnson at google.com> wrote:
>>>
>>> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>> You might well be right - I know next to nothing about these things. Having
>>>> someone a double-check/review from someone more familiar would be nice.
>>>>
>>>> I'll leave the rest of the review to any such someone who might be
>>>> sufficiently informed about the pass pipeline to decide whether this is the
>>>> right call. (and/or perf numbers that demonstrate it)
>>>>
>>>> Hopefully my naive review has at least covered the basics.
>>>
>>> I just did some SPEC cpu2006 performance testing with this change, at
>>> -O2 both with and without LTO. There were no measurable performance
>>> differences. In fact, the final binaries were largely identical
>>> (always the same size before and after, generated code looks
>>> identical, slight differences in label naming are the only differences
>>> in the nm and objdump output for a couple benchmarks - most are
>>> totally identical).
>>>
>>> Can someone take a look at the patch and approve if possible? While
>>> currently this doesn't affect large numbers of functions, with ThinLTO
>>> we want to avoid keeping around the larger numbers of
>>> available-externally imported objects unnecessarily. This change
>>> avoids the need of special casing here for ThinLTO imported
>>> functions/variables.
>>
>> I'd be interested in performance of a clang bootstrap or of Chromium (or
>> some other large C++ program built with -flto).
>>
>> Moreover, I suspect this will cause a problem for full LTO, where a lot
>> of inlining takes place at link time. There have even been proposals
>> (but no one has collected any data to share yet) to run just the
>> always-inliner at compile time, deferring almost all inlining until
>> later.
>
> Note that one of the sets of SPEC cpu2006 data I collected was with
> full LTO. I'm trying to understand where this would be an issue for
> full LTO, since GlobalDCE is running after inlining in the link time
> LTO pipeline.
>
> Just to be clear, are you concerned about the following:
>
> 1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an
> unreferenced available externally function
> 2) LTO link step would have inlined that available externally function
> somewhere (must be a different module since it was unreferenced in
> step 1) but now can't
>
> I'm not sure when 2) can happen since it had to be unreferenced in its
> module to be removed with my changes.
That's not what it looks like to me. Here's what I think the relevant
part of your patch is:
On 2015-May-18, at 21:09, Teresa Johnson <tejohnson at google.com> wrote:
>
> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
> for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
> for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
> for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
> - if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
> - GlobalIsNeeded(GV);
> + if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
> + if (!GV->hasAvailableExternallyLinkage())
> + GlobalIsNeeded(GV);
> + }
> else if (Constant *C = dyn_cast<Constant>(*U))
> MarkUsedGlobalsAsNeeded(C);
> }
IIUC, this changes the logic from "if it's referenced, keep it" to
"if it's referenced and not available_externally, keep it".
In particular, I'm worried about GlobalDCE removing a *referenced*
available_externally function, particularly if/when we stop inlining at
the compile step when -flto.
> For an available externally
> function, any reference from a different module would have needed its
> own copy to start with (at least that is the case with C inline
> functions). It looks like available externally C inline functions are
> already removed when unreferenced by the compile step (e.g. when
> inlined into that module's references during the "-flto -O2 -c"
> compile).
>
>>
>> But if we can have different pass pipelines for "normal" vs. "LTO"
>> compiles, then "ThinLTO" can have its own pass pipeline as well. Seems
>> like it already needs one. Why not add a late pass there to drop
>> available_externally functions?
>>
>
> Sure that's possible, but I am trying to understand what circumstance
> there is where a full LTO would need to keep around available
> externally functions after GlobalDCE where ThinLTO wouldn't.
> Presumably if they are useful for optimization after GlobalDCE then we
> would want to keep them around longer for ThinLTO as well. I just
> haven't identified any need to do so.
>
> Thanks,
> Teresa
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
More information about the llvm-dev
mailing list