[PATCH] D27313: LTO: Add support for multi-module bitcode files.

Tue Dec 13 17:37:37 PST 2016

mehdi_amini added inline comments.

================
Comment at: llvm/lib/LTO/LTO.cpp:456

-  ThinLTO.ModuleMap[MBRef.getBufferIdentifier()] = MBRef;
+  ThinLTO.ModuleMap.insert({BM.getModuleIdentifier(), BM});
   return Error::success();
----------------
pcc wrote:
> mehdi_amini wrote:
> > pcc wrote:
> > > mehdi_amini wrote:
> > > > We'll arrive here from ` for (InputFile::InputModule &IM : Input->Mods)` in `LTO::add()`, how are we distinguishing `BM.getModuleIdentifier()` when there are multiple modules per file?
> > > > I assume we don't expect multiple modules with a summary in the same input file, but in this case we should check here that the module has been inserted and error otherwise.
> > > `BM.getModuleIdentifier()` will be the same for each `BitcodeModule` in the same input file by construction, so we can just test the return value of `insert` here.
> > > 
> > > (I feel obliged to point out that this is yet another scenario where (path, byte slice) pairs would work better as a representation of ThinLTO modules.)
> > What is the other scenario? I don't remember what was specific to ThinLTO for this?
> > 
> > Also a pair `path/byte slice` doesn't fit the API in general, does it? We don't necessarily have a "file".
> > That said, instead of path the identifier could be a `<BufferID, offset>`, with `BufferID` intended to be a unique identifier provided by the client of the API per-buffer. I'm not sure how deep we'd have to thread this though. I think that as long as when we create a llvm::Module the identifier is consistent with whatever is used here, everything should be OK.
> > 
> > What is the other scenario? I don't remember what was specific to ThinLTO for this?
> 
> The main other scenario is (non-thin) archive files, which is already a problem as we don't handle that correctly in the distributed case, and we have workarounds in both the gold plugin and lld to give `-save-temps` temporaries appropriate names. This is ThinLTO-specific because we don't create temporaries for each input file under regular LTO.
> 
> > Also a pair path/byte slice doesn't fit the API in general, does it? We don't necessarily have a "file".
> 
> True, but it's also the case that we don't necessarily have a "file" now and we're already passing paths around. I imagine that clients which don't use files could just make something up for the offsets as they would make something up for file paths.
> 
> > That said, instead of path the identifier could be a <BufferID, offset>, with BufferID intended to be a unique identifier provided by the client of the API per-buffer.
> I think that could work as well.
In the C API (and thus in ld64), the expectation is that every input buffer has to be provided with a unique id. It happens that this ID is usually the path on file (with `libFoo.a(member.o)` when a static archive is involved). 
Because the linker is supplying one member at a time, the unique ID in the API is enough. 

Having multiple modules per file is new though and would break this "unique ID" provided by the linker, we could suffix it when loading each individual bitcode/module though.

The --save-temps is always using integer right now I believe (files are 1.thin.o, 2.thin.o, etc.).

> True, but it's also the case that we don't necessarily have a "file" now and we're already passing paths around.

That's just us being inconsistent, I tried to express "ModuleID" instead of path as much as possible.

Repository:
  rL LLVM

https://reviews.llvm.org/D27313