[LLVMdev] Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Duncan P. N. Exon Smith dexonsmith at apple.com
Thu Jun 4 11:27:53 PDT 2015


> On 2015-Jun-04, at 10:13, Teresa Johnson <tejohnson at google.com> wrote:
> 
> On Thu, Jun 4, 2015 at 9:51 AM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>> 
>>> On 2015-Jun-04, at 07:10, Teresa Johnson <tejohnson at google.com> wrote:
>>> 
>>> On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith
>>> <dexonsmith at apple.com> wrote:
>>>> 
>>>>> On 2015-Jun-03, at 09:56, Teresa Johnson <tejohnson at google.com> wrote:
>>>>> 
>>>>> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>> 
>>>>>> You might well be right - I know next to nothing about these things. Having
>>>>>> someone a double-check/review from someone more familiar would be nice.
>>>>>> 
>>>>>> I'll leave the rest of the review to any such someone who might be
>>>>>> sufficiently informed about the pass pipeline to decide whether this is the
>>>>>> right call. (and/or perf numbers that demonstrate it)
>>>>>> 
>>>>>> Hopefully my naive review has at least covered the basics.
>>>>> 
>>>>> I just did some SPEC cpu2006 performance testing with this change, at
>>>>> -O2 both with and without LTO. There were no measurable performance
>>>>> differences. In fact, the final binaries were largely identical
>>>>> (always the same size before and after, generated code looks
>>>>> identical, slight differences in label naming are the only differences
>>>>> in the nm and objdump output for a couple benchmarks - most are
>>>>> totally identical).
>>>>> 
>>>>> Can someone take a look at the patch and approve if possible? While
>>>>> currently this doesn't affect large numbers of functions, with ThinLTO
>>>>> we want to avoid keeping around the larger numbers of
>>>>> available-externally imported objects unnecessarily. This change
>>>>> avoids the need of special casing here for ThinLTO imported
>>>>> functions/variables.
>>>> 
>>>> I'd be interested in performance of a clang bootstrap or of Chromium (or
>>>> some other large C++ program built with -flto).
>>>> 
>>>> Moreover, I suspect this will cause a problem for full LTO, where a lot
>>>> of inlining takes place at link time.  There have even been proposals
>>>> (but no one has collected any data to share yet) to run just the
>>>> always-inliner at compile time, deferring almost all inlining until
>>>> later.
>>> 
>>> Note that one of the sets of SPEC cpu2006 data I collected was with
>>> full LTO. I'm trying to understand where this would be an issue for
>>> full LTO, since GlobalDCE is running after inlining in the link time
>>> LTO pipeline.
>>> 
>>> Just to be clear, are you concerned about the following:
>>> 
>>> 1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an
>>> unreferenced available externally function
>>> 2) LTO link step would have inlined that available externally function
>>> somewhere (must be a different module since it was unreferenced in
>>> step 1) but now can't
>>> 
>>> I'm not sure when 2) can happen since it had to be unreferenced in its
>>> module to be removed with my changes.
>> 
>> That's not what it looks like to me.  Here's what I think the relevant
>> part of your patch is:
>> 
>> On 2015-May-18, at 21:09, Teresa Johnson <tejohnson at google.com> wrote:
>>> 
>>> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
>>>    for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
>>>      for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
>>>        for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
>>> -          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
>>> -            GlobalIsNeeded(GV);
>>> +          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
>>> +            if (!GV->hasAvailableExternallyLinkage())
>>> +              GlobalIsNeeded(GV);
>>> +          }
>>>          else if (Constant *C = dyn_cast<Constant>(*U))
>>>            MarkUsedGlobalsAsNeeded(C);
>>>  }
>> 
>> 
>> IIUC, this changes the logic from "if it's referenced, keep it" to
>> "if it's referenced and not available_externally, keep it".
> 
> Sorry, you're right, I conflated a couple of different things when I
> looked at this again this morning. For ThinLTO I don't in fact need
> this change for anything that is fully inlined after importing.
> 
>> 
>> In particular, I'm worried about GlobalDCE removing a *referenced*
>> available_externally function, particularly if/when we stop inlining at
>> the compile step when -flto.
> 
> Ok, that makes sense. In that case, I agree that doing this for
> ThinLTO imported functions is the right way to go. My initial approach
> was to mark the functions that were ThinLTO-imported and check that
> here. I think we will likely want to mark the imported functions as
> such regardless, since we may want to apply different optimization
> thresholds for imported functions, or at least use for debugging and
> tracing, so we could consider using that approach here.
> 
> Another possibility as you mentioned was to drop in a new pass in the
> ThinLTO backend optimization pipeline just for this. Or perhaps the
> GlobalDCE pass invocation could take a flag indicating that it should
> remove the avail extern functions, that we can set in the ThinLTO
> backend case. It seems like most of the code would be shared between
> the current GlobalDCE and any new ThinLTO-specific version that
> setting GlobalDCE up to do this optionally (either per-imported
> function or with a configuration flag on the pass) made sense.

Since the compiler is always free to delete available_externally
functions, I think you could just add a pass to the -flto=thin pipeline
that deletes all of them (referenced or not) -- it's just a single loop
through all the functions deleting the bodies of those with the right
linkage.  I imagine there are other pass pipelines that might want to do
something similar.  I don't really like having GlobalDCE delete them
(even behind a flag) because they aren't actually dead, and I think a
separate pass makes it easier to test and all that.  (I haven't actually
worked much with pass pipelines, though, so there's a chance I'm on my
own here?)

You make an interesting point about applying different thresholds to
imported functions, but is there any reason not to change all
available_externally functions in the some way?  Moreover, it feels like
thin-LTO's imported functions are a new source of functions with
available_externally linkage, but otherwise they aren't interestingly
different.





More information about the llvm-dev mailing list