[PATCH] D23488: ThinLTO: add early "dead-stripping" on the Index

Mon Dec 19 17:07:10 PST 2016

pcc added a comment.

In https://reviews.llvm.org/D23488#627220, @tejohnson wrote:

> In https://reviews.llvm.org/D23488#627205, @pcc wrote:
>
> > In https://reviews.llvm.org/D23488#627164, @tejohnson wrote:
> >
> > > In https://reviews.llvm.org/D23488#627154, @pcc wrote:
> > >
> > > > This information will probably need to be encoded in the summary somehow once we're at the point where we no longer need to load the module during symbol resolution.
> > >
> > >
> > > At that point we will have a separately loadable symbol table in the bitcode file, but there again I guess we will not know anything about these llvm.* symbols in the table, right?
> >
> >
> > I was envisioning that the live bits in the global summary wouldn't be applied to the llvm.* symbols themselves but to the globals referenced by them.
>
>
> Either one works for the index-based dead stripping. But my point was we need something to flag this in the summary because these symbols won't be in the bitcode symbol table (essentially the situation we're running into here).

Sure, that makes sense. I was thinking about whether we'd also need a separate "used" bit in the bitcode symbol table, so that we can apply the right semantics to those symbols in both regular and thin LTO. In that case we'd actually want to mark the globals referenced by llvm.used rather than just marking llvm.used itself (because the refs wouldn't be available in the BC symtab). But I think that's probably best dealt with orthogonally.

>>>> I think at this point we just need to collect a set of GC roots in add{Regular,Thin}LTO in the same way that we collect the set of used variables with collectUsedGlobalVariables.
>>> 
>>> Presumably using a new flag added to the summary, right? Or did you have a different approach in mind for now?
>> 
>> Yes, it could be done in the summary. I was also thinking you could have a function called something like collectGCRoots that does something like this:
>> 
>>   void collectGCRoots(Module *M, SmallSet<GlobalValue *> &GCRoots) {
>>     collectUsedGlobalVariables(M, GCRoots, true);
>>     collectUsedGlobalVariables(M, GCRoots, false);
>>     // similarly for llvm.global_ctors and llvm.global_dtors
>>   }
>> 
>> 
>> then use GCRoots to decide whether to set the bit in GlobalResolution. But now that I think about it, I think I prefer the approach of setting live bits in the summary, for example it would seem to make it much simpler to pull the live lists of type identifiers out of the combined summary for CFI/devirtualization.
> 
> We don't want to parse the module and invoke collectUsedGlobalVariables when we do the thin link, which is where we need this info. Oh - I see that we are currently doing this. ISTR that this is only temporary (I hope - we don't want to have to parse the IR in the ThinLink!).

Right, this is the thing we want to avoid doing with the bitcode symbol table.

> So it would have to be done during the compile step (where we could do what you suggest above). And then we put it in the summary. Although like I mentioned above, we can just flag the llvm.* variables as the live roots, that's simpler than walking all of their references and flagging them.

Works for me.

https://reviews.llvm.org/D23488