[PATCH] D20268: Resolution-based LTO API.

Fri Jul 15 18:21:09 PDT 2016

pcc added inline comments.

================
Comment at: include/llvm/LTO/LTO.h:94
@@ +93,3 @@
+class InputFile {
+  // FIXME: Remove LTO class friendship once we have bitcode symbol tables.
+  friend LTO;
----------------
mehdi_amini wrote:
> > I agree with you, but until we have a symbol table, there needs to be some way for LTO to access IR entities in the input file.
> 
> It is not clear to me why LTO needs to access any IR entity? If you need access an *IR entity* now, how could we don't need access to them tomorrow (<- read: with a symbol table)?
> 
> It seems to me that anywhere you're accessing an IR entity directly through `Obj` right now is a place that should have an API on the InputFile or the Symbol class (like the one that exists: `InputFile::getSourceFileName()`).
> 
> Assuming "Symbol" is abstracting the entries in the symbol table, for instance:
> 
>  -  `Input->Obj->getMemoryBufferRef().getBufferIdentifier();` ->  `Input->getIdentifier()`
>  -  `GlobalValue *GV = Obj->getSymbolGV(Sym.I->getRawDataRefImpl());` -> `Symbol *S = Obj->getSymbol(...)`
>  - `GV->hasGlobalUnnamedAddr();` -> needs an API on class Symbol
>  - `GV->getName();` -> needs an API on class Symbol
>  - `if (Res.VisibleToRegularObj || (GV && Used.count(GV)) ||`  -> Used contains a `GlobalVariable *`, how do we go to a symbol table?
>  - `Module &M = Input->Obj->getModule();` -> used for: 
> 
> ```
>   for (GlobalVariable &GV : M.globals())
>     if (GV.hasAppendingLinkage())
>       Keep.push_back(&GV);
> ```
> which should iterate on `Symbol`s on the `InputFile`.
> 
> - `LTO::addRegularLto` does some symbol resolution using `Obj`. At this point you probably want to operate on real IR construct anyway, and that's the point where you leave the "InputFile" to go to the IR. I think `InputFile` should just expose something like a factory `std::unique_ptr<Module> parseModule(/* options */);`. If it was just for full LTO, that would solve many of the above items as `addSymbolToGlobalRes()` could take a `Module &` instead of operating on the private `IRObjectFile` inside `InputFile`. However it won't help for ThinLTO.
> 
> ```
>     GlobalValue *GV = Input->Obj->getSymbolGV(Sym.I->getRawDataRefImpl());
>     if (Res.Prevailing && GV)
>       ThinLto.PrevailingModuleForGUID[GV->getGUID()] =
>           MBRef.getBufferIdentifier();
> ```
> 
> - `getGUID()` should be implemented on the Symbol directly.
> If you need access an *IR entity* now, how could we don't need access to them tomorrow (<- read: with a symbol table)?

The way I see this evolving is that any uses of IR constructs here will be replaced with uses of APIs for accessing the bitcode symbol table (except in places such as `LTO::addRegularLto` which actually need a `Module`). The functionality exposed via InputFile and Symbol will remain the primary way by which clients will access the object (i.e. they wouldn't access the symbol table directly).

I would prefer not to expose the properties you propose to expose via InputFile/Symbol because they aren't strictly needed by clients for symbol resolution.

================
Comment at: include/llvm/LTO/LTO.h:103
@@ +102,3 @@
+  /// needed to supply an owning LLVMContext for the IRObjectFile.
+  static ErrorOr<std::unique_ptr<InputFile>> create(MemoryBufferRef Object,
+                                                    LTO &Lto);
----------------
mehdi_amini wrote:
> mehdi_amini wrote:
> > I'm not sure, it seems easy to turn it around: "you are the one proposing an API that is targeted for performance reason with a specific use case (and client) in mind, so the burden should be on you to justify it if you want this to get it in.", otherwise write a cleaner API.
> Running ld64 on the 701 bitcode files that make an LTO build of llc (X86 only), and exiting right before the LTO optimizer starts, i.e. after the LTO merge happens (best of 10 times, on a 4-cores laptop):
> 
> - Parallel resolution (dedicated context, double parsing): 7.362s
> - Sequential resolution (shared context, single parsing): 8.834s
> 
> 
Thanks. To me it isn't clear how much of that is due to parsing as opposed to parallel resolution, but I can accept that this would simplify clients. So I suppose I can try to make the change you suggested.

================
Comment at: lib/LTO/LTO.cpp:255
@@ +254,3 @@
+
+    GlobalValue *GV = Input->Obj->getSymbolGV(Sym.I->getRawDataRefImpl());
+    if (Res.Prevailing && GV) {
----------------
mehdi_amini wrote:
> Is is possible for GV to be null here?
Yes, if the symbol is defined by inline asm.

================
Comment at: lib/LTO/LTO.cpp:280
@@ +279,3 @@
+Error LTO::addThinLto(std::unique_ptr<InputFile> Input,
+                      SmallPtrSet<GlobalValue *, 8> &Used,
+                      ArrayRef<SymbolResolution> Res) {
----------------
mehdi_amini wrote:
> > Seems much simpler to always have a module with index 0.
> 
> "Simpler" from the gold point of view, my point of view is that when adding 1000 ThinLTO modules we'll copy 1000 DataLayout around for no reason.
> Which reminds me that I don't expect any object being emitted for this module for a "pure" ThinLTO link.
I suppose we could avoid creating an empty regular LTO object unless a hook is set.

================
Comment at: lib/LTO/LTO.cpp:305
@@ +304,3 @@
+
+    GlobalValue *GV = Input->Obj->getSymbolGV(Sym.I->getRawDataRefImpl());
+    if (Res.Prevailing && GV)
----------------
mehdi_amini wrote:
> (Same question here, can GV be null?)
Ditto

https://reviews.llvm.org/D20268