[PATCH] LTO: introduce object file-based on-disk module format.

Thu Jul 3 18:31:35 PDT 2014

On Thu, Jul 03, 2014 at 10:41:07AM -0400, Rafael Espíndola wrote:
> +  /// Returns 'true' if the bitcode bc is for the specified target triple.
> +  static bool isTargetMatch(StringRef bc, const char *triplePrefix);
> 
> Using a StringRef is a nice independent improvement in here
> (isTargetMatch), please commit it first. Nit: take the opportunity to
> rename the argument names to match the llvm style.

Done, r212305.

>  LTOModule *LTOModule::createFromBuffer(const void *mem, size_t length,
>                                         TargetOptions options,
>                                         std::string &errMsg, StringRef path) {
>    std::unique_ptr<MemoryBuffer> buffer(makeBuffer(mem, length, path));
>    if (!buffer)
>      return nullptr;
> -  return makeLTOModule(buffer.release(), options, errMsg);
> +  return makeLTOModule(StringRef((const char *)mem, length), options, errMsg);
> 
> The buffer is not need in here anymore, is it?

No, I'll remove it.

> -  return makeLTOModule(buffer.release(), options, errMsg);
> +  return makeLTOModule(buffer->getBuffer(), options, errMsg);
> 
> The MemoryBuffer is now destroyed immediately instead of being
> transferred into the produced LTOModule. This would work with the
> current setup, except you also do
> 
> -  m->materializeAllPermanently();
> 
> Which means the IRReader will be pointed to unmapped memory, no?

The code now uses parseBitcodeFile instead of getLazyBitcodeModule, which
as far as I can tell causes the reference to the memory buffer in the
BitcodeReader to be released.

> BTW, the materializeAllPermanently is a very unfortunate but necessary
> thing for the C api. The reason it is there is the way ld64 implements
> an optimization:
> 
> * We want to avoid putting linkonce_odr symbols in the symbol table
> unless their address is significant.
> * The way ld64 implements this is being told for every symbol if it
> can be dropped from the symbol table.
> * To tell ld64 about that we need to look at every use of a symbol.
> * To look at every use of a symbol we have to materialize all function
> bodies since they may contain an use.

I see. I wonder if it would be possible to improve on this by storing def-use
information, maybe in the bitcode or in another section of the object file,
and only materializing the entities that are needed.

> Two higher level questions about this approach:
> 
> * It is effectively a way of doing fat object files. GCC does
> something similar. One small but recurring annoyance I had when
> working with GCC LTO is that it can end up falling back silently to
> non-LTO since tho outer container is ELF, with IR in a section. With
> LLVM right now we get an error, since without our help the linker is
> clueless as to what LLVM IR is.

I can imagine that this could be a problem if all symbol definitions are always
duplicated in the object file, as GCC does (?). But if the symbol definitions
in the object file are normally a strict subset of those that would be emitted
without LTO (trivially true in the case where the object file contains no
allocatable sections, or less trivially if we try to choose functions to emit
directly into the object file which we think would not benefit from LTO),
we should normally detect this with undefined symbol errors at link time.

> Can't you invert this? Put the ELF in
> the IR (or even represent .go_export directly in some metadata) and
> extent the gold plugin API to handle this? 

I decided not to use IR as the outer layer because I think that it would be
useful for external tools to be able to understand the contents of metadata
without needing to understand the ever changing bitcode format.

In the specific case of Go, there exist libraries and tools such as [1]
which understand ELF and can read the metadata in the object files, and I'd
prefer not to have to teach those tools about bitcode.

> Another advantage is that
> it should be easier to support fat COFF and MachO objects too.

The patch is already testing that COFF and Mach-O work. Or did you mean
something else?

> * If we do need to go with IR in ELF instead of ELF in IR, we also
> need to update lib/Object to handle it. We still want to get an
> IRObjectFile (by default at least) when given one of those files so
> that llvm-nm/llvm-ar do the right thing.

Maybe all that is needed is a function that "interprets" SymbolicFiles,
i.e. that takes a SymbolicFile and returns back another SymbolicFile (which
would be an IRObjectFile in the IR-in-ELF case). This looks like it would be
sufficient for llvm-nm/llvm-ar. Later I guess we could extend this function
to return multiple SymbolicFiles if we want to pursue the IR-in-object ==
IR+object idea.

Thanks,
-- 
Peter

[1] http://godoc.org/code.google.com/p/go.tools/cmd/godex