[PATCH] D27313: LTO: Add support for multi-module bitcode files.

Tue Dec 13 17:16:06 PST 2016

pcc added inline comments.

================
Comment at: clang/lib/CodeGen/BackendUtil.cpp:783
+      if (HasSummary && *HasSummary) {
+        ModuleMap.insert({I.first(), BM});
+        FoundModule = true;
----------------
mehdi_amini wrote:
> What if `I.first` was already present in the map? It seems the same case as below for ThinLTO where we should check that the returned value of `insert().second` .
That doesn't seem like it can happen. The range for loop starting on line 758 enumerates elements of a map, so each `I.first()` will be unique. We also break out of this loop as soon as we see a module with a summary.

================
Comment at: llvm/lib/LTO/LTO.cpp:456

-  ThinLTO.ModuleMap[MBRef.getBufferIdentifier()] = MBRef;
+  ThinLTO.ModuleMap.insert({BM.getModuleIdentifier(), BM});
   return Error::success();
----------------
mehdi_amini wrote:
> pcc wrote:
> > mehdi_amini wrote:
> > > We'll arrive here from ` for (InputFile::InputModule &IM : Input->Mods)` in `LTO::add()`, how are we distinguishing `BM.getModuleIdentifier()` when there are multiple modules per file?
> > > I assume we don't expect multiple modules with a summary in the same input file, but in this case we should check here that the module has been inserted and error otherwise.
> > `BM.getModuleIdentifier()` will be the same for each `BitcodeModule` in the same input file by construction, so we can just test the return value of `insert` here.
> > 
> > (I feel obliged to point out that this is yet another scenario where (path, byte slice) pairs would work better as a representation of ThinLTO modules.)
> What is the other scenario? I don't remember what was specific to ThinLTO for this?
> 
> Also a pair `path/byte slice` doesn't fit the API in general, does it? We don't necessarily have a "file".
> That said, instead of path the identifier could be a `<BufferID, offset>`, with `BufferID` intended to be a unique identifier provided by the client of the API per-buffer. I'm not sure how deep we'd have to thread this though. I think that as long as when we create a llvm::Module the identifier is consistent with whatever is used here, everything should be OK.
> 
> What is the other scenario? I don't remember what was specific to ThinLTO for this?

The main other scenario is (non-thin) archive files, which is already a problem as we don't handle that correctly in the distributed case, and we have workarounds in both the gold plugin and lld to give `-save-temps` temporaries appropriate names. This is ThinLTO-specific because we don't create temporaries for each input file under regular LTO.

> Also a pair path/byte slice doesn't fit the API in general, does it? We don't necessarily have a "file".

True, but it's also the case that we don't necessarily have a "file" now and we're already passing paths around. I imagine that clients which don't use files could just make something up for the offsets as they would make something up for file paths.

> That said, instead of path the identifier could be a <BufferID, offset>, with BufferID intended to be a unique identifier provided by the client of the API per-buffer.
I think that could work as well.

https://reviews.llvm.org/D27313