[PATCH] D23488: ThinLTO: add early "dead-stripping" on the Index

Thu Dec 29 10:29:47 PST 2016

tejohnson added a comment.

Here are some stats for ThinLTO linking clang:

  644 function-import              - Number of dead stripped symbols in index

155102 function-import              - Number of live symbols in index

(the patch I will upload in a minute contains these new stats)

The number of functions imported dropped only a very small amount (from 62889 to 62815), and the number of modules imported from went from 24470 to 24449. The number of internalizations was unchanged.

I measured the full ThinLink+BE+link time for clang and there was no consistent change across 3 runs each (stripping enabled/disabled).

So at least for clang, this didn't find much opportunity. I can try for larger internal apps later when it is integrated.

================
Comment at: include/llvm/LTO/LTO.h:389
+    /// Keep track if the symbol was referenced by the regular LTO partition.
+    bool VisibleToRegularLTOPartition = false;
+
----------------
mehdi_amini wrote:
> tejohnson wrote:
> > mehdi_amini wrote:
> > > tejohnson wrote:
> > > > mehdi_amini wrote:
> > > > > tejohnson wrote:
> > > > > > mehdi_amini wrote:
> > > > > > > tejohnson wrote:
> > > > > > > > pcc wrote:
> > > > > > > > > mehdi_amini wrote:
> > > > > > > > > > pcc wrote:
> > > > > > > > > > > Don't you just need one flag: VisibleOutsideThinLTO (or something).
> > > > > > > > > > I think one flag should be enough, but shouldn't we have the same flag for LTO internalization?
> > > > > > > > > > 
> > > > > > > > > Wouldn't that be redundant with Partition == 0?
> > > > > > > > You're right, we only need one flag (forgot that the VisibleToRegularObj flag was added to GlobalResolutions for this patch). I will combine them as suggested.
> > > > > > > You're right! 
> > > > > > > So do we need to record every ThinLTO module as an individual partition? When do we use this?
> > > > > > > We could have `enum partition { Unkown, Global, LTO, ThinLTO }`?
> > > > > > > So do we need to record every ThinLTO module as an individual partition? When do we use this?
> > > > > > 
> > > > > > To detect when symbols are used by multiple ThinLTO modules (i.e. exported). E.g. it transitions from Unknown -> partition1  when first called with Partition=partition1, then from partition1 -> External if invoked again for the same symbol with a different partition2.
> > > > > > To detect when symbols are used by multiple ThinLTO modules (i.e. exported). E.g. it transitions from Unknown -> partition1 when first called with Partition=partition1, then from partition1 -> External if invoked again for the same symbol with a different partition2.
> > > > > 
> > > > > The way partitions are handled and the flag VisibleOutsideThinLTO is still not clear to me, we know when symbols are exported from the index itself why do we need partitions for that?
> > > > > 
> > > > > 
> > > > > The way partitions are handled and the flag VisibleOutsideThinLTO is still not clear to me, we know when symbols are exported from the index itself why do we need partitions for that?
> > > > 
> > > > At some point we have to collate that information to be able to look up which guid are exported from their modules (for creating the ExportedGUIDs set for internalization). We could detect cross module references among ThinLTO modules by walking the index, but since we have the information here already, it seems more efficient to just keep track of it. Otherwise we need to walk the whole index, comparing module paths, etc.
> > > > At some point we have to collate that information to be able to look up which guid are exported from their modules (for creating the ExportedGUIDs set for internalization). We could detect cross module references among ThinLTO modules by walking the index, but since we have the information here already, it seems more efficient to just keep track of it. Otherwise we need to walk the whole index, comparing module paths, etc.
> > > 
> > > Do we have an example that would cover the following case:
> > > 
> > > ```
> > > // main.cpp
> > > void bar1();
> > > void bar2();
> > > void baz();
> > > void foo1() {
> > >   bar1();
> > > }
> > > void foo2() {
> > >   bar2();
> > > }
> > > int main() {
> > >   baz()
> > > }
> > > ````
> > > 
> > > And 
> > > 
> > > ````
> > > // bar.cpp
> > > void bar1() {
> > > }
> > > void bar2() {
> > >   foo2(); // The cycle shouldn't prevent dead-stripping.
> > > }
> > > void baz() {
> > >    // Adding a call to bar() here should not prevent internalization.
> > > }
> > > ````
> > > 
> > > With only main exported by the linker, I expect us to be able to:
> > > 
> > > 1) dead-strip from the index foo() and bar()
> > > 2) "internalize" (drop) foo() and bar() eagerly from the IR
> > > 
> > > I think the current logic should be OK. But I still find the partition thing and the `ExportedGUIDs` confusing.
> > > Do we have an example that would cover the following case:
> > >
> > > ```
> > > // main.cpp
> > > void bar1();
> > > void bar2();
> > > void baz();
> > > void foo1() {
> > >   bar1();
> > > }
> > > void foo2() {
> > >   bar2();
> > > }
> > > int main() {
> > >   baz()
> > > }
> > > ````
> > >
> > > And
> > >
> > > ````
> > > // bar.cpp
> > > void bar1() {
> > > }
> > > void bar2() {
> > >   foo2(); // The cycle shouldn't prevent dead-stripping.
> > > }
> > > void baz() {
> > >    // Adding a call to bar() here should not prevent internalization.
> > 
> > bar1 or bar2? Adding a call to bar2 would prevent its dead stripping and prevent internalization. Adding a call to bar1 should not prevent its internalization.
> > 
> > > }
> > > ````
> > >
> > > With only main exported by the linker, I expect us to be able to:
> > >
> > > 1) dead-strip from the index foo() and bar()
> > 
> > I don't have a cycle case like foo2/bar2, will add. The existing test case covers the foo1/bar1 case. Although note if calls to either bar2 or bar1 are added to baz as you suggest above, it will prevent their dead stripping.
> > 
> > > 2) "internalize" (drop) foo() and bar() eagerly from the IR
> > 
> > Yes the existing test case ensures we internalize where appropriate, even when there were cross-module references (but where both caller/callee were dead per new index-based analysis), and that we remove them as appropriate. Other than the cycle case, which I'll add, the above cases are covered by the new test case in the patch.
> > 
> > >
> > > I think the current logic should be OK. But I still find the partition thing and the `ExportedGUIDs` confusing.
> > bar1 or bar2? Adding a call to bar2 would prevent its dead stripping and prevent internalization. Adding a call to bar1 should not prevent its internalization.
> 
> Sorry, bar1() (I wrote the comment before adding `bar2()`, and yes we should only internalize `bar1()` when called from `baz()`.
Added the circular dependence case to the new test.

================
Comment at: lib/LTO/LTO.cpp:879
           Res.second.Partition == GlobalResolution::External)
         ExportedGUIDs.insert(GlobalValue::getGUID(Res.second.IRName));
     }
----------------
tejohnson wrote:
> mehdi_amini wrote:
> > Could we *not* add DeadSymbols to `ExportedGUIDs` here and remove the `DeadSymbols.count(GUID)` from the test below?
> Good idea.
Done.

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:727
             ExportList->second.count(GUID)) ||
-           GUIDPreservedSymbols.count(GUID);
+           (!DeadSymbols.count(GUID) && GUIDPreservedSymbols.count(GUID));
   };
----------------
mehdi_amini wrote:
> tejohnson wrote:
> > mehdi_amini wrote:
> > > tejohnson wrote:
> > > > Actually, in ThinLTOCodeGenerator checks here and elsewhere it looks like this test can just be !DeadSymbols.count(GUID) rather than (!DeadSymbols.count(GUID) && GUIDPreservedSymbols.count(GUID)). Here the same GUIDPreservedSymbols set was passed computeDeadSymbols, so they would already be excluded from the DeadSymbol set. ThinLTOCodeGenerator does not seem to distinguish between references outside of ThinLTO vs cross-references between ThinLTO modules, which in fact means that the dead analysis will be conservative here...
> > > I think the expectation is that GUIDPreservedSymbols is only supposed to contain symbols visible outside of the ThinLTO partition.
> > > 
> > > From the header:
> > > 
> > > ```
> > >   /// Set of symbols that need to be preserved outside of the set of bitcode
> > >   /// files.
> > >   StringSet<> PreservedSymbols;
> > > ```
> > > 
> > > Even though in the current impl. file:
> > > 
> > > ```
> > > void ThinLTOCodeGenerator::crossReferenceSymbol(StringRef Name) {
> > >   // FIXME: At the moment, we don't take advantage of this extra information,
> > >   // we're conservatively considering cross-references as preserved.
> > >   //  CrossReferencedSymbols.insert(Name);
> > >   PreservedSymbols.insert(Name);
> > > }
> > > ```
> > > Even though in the current impl. file:
> > 
> > Right, that's what I was referring to. Any reason crossReferenceSymbol() doesn't add to a different set (the commented out CrossReferencedSymbols?) - although the isExported checks here and elsewhere in ThinLTOCodeGenerator would need to change if that happened. Maybe better to just switch this to the new LTO API?
> Switching to the new LTO API is one reason I haven't fixed this yet.
> 
> Actually, in ThinLTOCodeGenerator checks here and elsewhere it looks like this test can just be !DeadSymbols.count(GUID) rather than (!DeadSymbols.count(GUID) && GUIDPreservedSymbols.count(GUID)). Here the same GUIDPreservedSymbols set was passed computeDeadSymbols, so they would already be excluded from the DeadSymbol set. ThinLTOCodeGenerator does not seem to distinguish between references outside of ThinLTO vs cross-references between ThinLTO modules, which in fact means that the dead analysis will be conservative here...

I had this backwards: because DeadSymbols was computed from GUIDPreservedSymbols, we only need to check GUIDPreservedSymbols here (anything in GUIDPreservedSymbols will necessarily not be in DeadSymbols). I've made that change here and elsewhere in this file (which basically puts us back to your original version of the patch for these checks...now I know why the isExported lambda checks didn't change here originally...).

https://reviews.llvm.org/D23488