[PATCH] D27313: LTO: Add support for multi-module bitcode files.

Tue Dec 13 18:16:38 PST 2016

pcc added inline comments.

================
Comment at: llvm/lib/LTO/LTO.cpp:456

-  ThinLTO.ModuleMap[MBRef.getBufferIdentifier()] = MBRef;
+  ThinLTO.ModuleMap.insert({BM.getModuleIdentifier(), BM});
   return Error::success();
----------------
mehdi_amini wrote:
> pcc wrote:
> > mehdi_amini wrote:
> > > pcc wrote:
> > > > mehdi_amini wrote:
> > > > > pcc wrote:
> > > > > > mehdi_amini wrote:
> > > > > > > We'll arrive here from ` for (InputFile::InputModule &IM : Input->Mods)` in `LTO::add()`, how are we distinguishing `BM.getModuleIdentifier()` when there are multiple modules per file?
> > > > > > > I assume we don't expect multiple modules with a summary in the same input file, but in this case we should check here that the module has been inserted and error otherwise.
> > > > > > `BM.getModuleIdentifier()` will be the same for each `BitcodeModule` in the same input file by construction, so we can just test the return value of `insert` here.
> > > > > > 
> > > > > > (I feel obliged to point out that this is yet another scenario where (path, byte slice) pairs would work better as a representation of ThinLTO modules.)
> > > > > What is the other scenario? I don't remember what was specific to ThinLTO for this?
> > > > > 
> > > > > Also a pair `path/byte slice` doesn't fit the API in general, does it? We don't necessarily have a "file".
> > > > > That said, instead of path the identifier could be a `<BufferID, offset>`, with `BufferID` intended to be a unique identifier provided by the client of the API per-buffer. I'm not sure how deep we'd have to thread this though. I think that as long as when we create a llvm::Module the identifier is consistent with whatever is used here, everything should be OK.
> > > > > 
> > > > > What is the other scenario? I don't remember what was specific to ThinLTO for this?
> > > > 
> > > > The main other scenario is (non-thin) archive files, which is already a problem as we don't handle that correctly in the distributed case, and we have workarounds in both the gold plugin and lld to give `-save-temps` temporaries appropriate names. This is ThinLTO-specific because we don't create temporaries for each input file under regular LTO.
> > > > 
> > > > > Also a pair path/byte slice doesn't fit the API in general, does it? We don't necessarily have a "file".
> > > > 
> > > > True, but it's also the case that we don't necessarily have a "file" now and we're already passing paths around. I imagine that clients which don't use files could just make something up for the offsets as they would make something up for file paths.
> > > > 
> > > > > That said, instead of path the identifier could be a <BufferID, offset>, with BufferID intended to be a unique identifier provided by the client of the API per-buffer.
> > > > I think that could work as well.
> > > In the C API (and thus in ld64), the expectation is that every input buffer has to be provided with a unique id. It happens that this ID is usually the path on file (with `libFoo.a(member.o)` when a static archive is involved). 
> > > Because the linker is supplying one member at a time, the unique ID in the API is enough. 
> > > 
> > > Having multiple modules per file is new though and would break this "unique ID" provided by the linker, we could suffix it when loading each individual bitcode/module though.
> > > 
> > > The --save-temps is always using integer right now I believe (files are 1.thin.o, 2.thin.o, etc.).
> > > 
> > > > True, but it's also the case that we don't necessarily have a "file" now and we're already passing paths around.
> > > 
> > > That's just us being inconsistent, I tried to express "ModuleID" instead of path as much as possible.
> > > Because the linker is supplying one member at a time, the unique ID in the API is enough.
> > 
> > I think @davide had a case where an archive had two members with the same name, and they were only distinguishable by offset.
> > 
> > > The --save-temps is always using integer right now I believe (files are 1.thin.o, 2.thin.o, etc.).
> > 
> > Not always: http://llvm-cs.pcc.me.uk/lib/LTO/LTOBackend.cpp#77
> > gold and lld both pass `UseInputModulePath == true` here.
> > 
> > We probably want to overhaul how the filenames look for `save-temps`, but I think in the end they should contain the module ID in some form.
> > I think @davide had a case where an archive had two members with the same name, and they were only distinguishable by offset
> 
> I remember this, I wouldn't be too much concerned *here* with this and push to the client the responsibility to provide unique IDs. (Nothing prevent the linker to build an id that always include the offset for instance)
Well, in the distributed case some component needs to understand the module IDs in order to distribute the work properly. Part of that may involve translating the module IDs into "file paths" (which may be actual file paths or conceptual ones). If we can arrange to use the same ("file path", offset) scheme in all linkers, that component can be shared between linkers.

But to a certain extent this is all hypothetical, I think I'd want to prototype before being sure that this is the right design.

Repository:
  rL LLVM

https://reviews.llvm.org/D27313