[PATCH] D27313: LTO: Add support for multi-module bitcode files.

Peter Collingbourne via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 13 17:52:00 PST 2016


pcc added a subscriber: davide.
pcc added inline comments.


================
Comment at: llvm/lib/LTO/LTO.cpp:456
 
-  ThinLTO.ModuleMap[MBRef.getBufferIdentifier()] = MBRef;
+  ThinLTO.ModuleMap.insert({BM.getModuleIdentifier(), BM});
   return Error::success();
----------------
mehdi_amini wrote:
> pcc wrote:
> > mehdi_amini wrote:
> > > pcc wrote:
> > > > mehdi_amini wrote:
> > > > > We'll arrive here from ` for (InputFile::InputModule &IM : Input->Mods)` in `LTO::add()`, how are we distinguishing `BM.getModuleIdentifier()` when there are multiple modules per file?
> > > > > I assume we don't expect multiple modules with a summary in the same input file, but in this case we should check here that the module has been inserted and error otherwise.
> > > > `BM.getModuleIdentifier()` will be the same for each `BitcodeModule` in the same input file by construction, so we can just test the return value of `insert` here.
> > > > 
> > > > (I feel obliged to point out that this is yet another scenario where (path, byte slice) pairs would work better as a representation of ThinLTO modules.)
> > > What is the other scenario? I don't remember what was specific to ThinLTO for this?
> > > 
> > > Also a pair `path/byte slice` doesn't fit the API in general, does it? We don't necessarily have a "file".
> > > That said, instead of path the identifier could be a `<BufferID, offset>`, with `BufferID` intended to be a unique identifier provided by the client of the API per-buffer. I'm not sure how deep we'd have to thread this though. I think that as long as when we create a llvm::Module the identifier is consistent with whatever is used here, everything should be OK.
> > > 
> > > What is the other scenario? I don't remember what was specific to ThinLTO for this?
> > 
> > The main other scenario is (non-thin) archive files, which is already a problem as we don't handle that correctly in the distributed case, and we have workarounds in both the gold plugin and lld to give `-save-temps` temporaries appropriate names. This is ThinLTO-specific because we don't create temporaries for each input file under regular LTO.
> > 
> > > Also a pair path/byte slice doesn't fit the API in general, does it? We don't necessarily have a "file".
> > 
> > True, but it's also the case that we don't necessarily have a "file" now and we're already passing paths around. I imagine that clients which don't use files could just make something up for the offsets as they would make something up for file paths.
> > 
> > > That said, instead of path the identifier could be a <BufferID, offset>, with BufferID intended to be a unique identifier provided by the client of the API per-buffer.
> > I think that could work as well.
> In the C API (and thus in ld64), the expectation is that every input buffer has to be provided with a unique id. It happens that this ID is usually the path on file (with `libFoo.a(member.o)` when a static archive is involved). 
> Because the linker is supplying one member at a time, the unique ID in the API is enough. 
> 
> Having multiple modules per file is new though and would break this "unique ID" provided by the linker, we could suffix it when loading each individual bitcode/module though.
> 
> The --save-temps is always using integer right now I believe (files are 1.thin.o, 2.thin.o, etc.).
> 
> > True, but it's also the case that we don't necessarily have a "file" now and we're already passing paths around.
> 
> That's just us being inconsistent, I tried to express "ModuleID" instead of path as much as possible.
> Because the linker is supplying one member at a time, the unique ID in the API is enough.

I think @davide had a case where an archive had two members with the same name, and they were only distinguishable by offset.

> The --save-temps is always using integer right now I believe (files are 1.thin.o, 2.thin.o, etc.).

Not always: http://llvm-cs.pcc.me.uk/lib/LTO/LTOBackend.cpp#77
gold and lld both pass `UseInputModulePath == true` here.

We probably want to overhaul how the filenames look for `save-temps`, but I think in the end they should contain the module ID in some form.


Repository:
  rL LLVM

https://reviews.llvm.org/D27313





More information about the llvm-commits mailing list