[PATCH] D12205: LTO: Simplify merged module ownership.

Peter Collingbourne via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 20 22:04:24 PDT 2015


On Thu, Aug 20, 2015 at 05:25:32PM -0700, Duncan P. N. Exon Smith wrote:
> > --- tools/llvm-lto/llvm-lto.cpp
> > +++ tools/llvm-lto/llvm-lto.cpp
> > @@ -207,26 +207,24 @@
> >        return 1;
> >      }
> >  
> > -    LTOModule *LTOMod = Module.get();
> > -
> > -    // We use the first input module as the destination module when
> > -    // SetMergedModule is true.
> > -    if (SetMergedModule && i == BaseArg) {
> > -      // Transfer ownership to the code generator.
> > -      CodeGen.setModule(Module.release());
> > -    } else if (!CodeGen.addModule(Module.get()))
> > -      return 1;
> > -
> > -    unsigned NumSyms = LTOMod->getSymbolCount();
> > +    unsigned NumSyms = Module->getSymbolCount();
> >      for (unsigned I = 0; I < NumSyms; ++I) {
> > -      StringRef Name = LTOMod->getSymbolName(I);
> > +      StringRef Name = Module->getSymbolName(I);
> >        if (!DSOSymbolsSet.count(Name))
> >          continue;
> > -      lto_symbol_attributes Attrs = LTOMod->getSymbolAttributes(I);
> > +      lto_symbol_attributes Attrs = Module->getSymbolAttributes(I);
> >        unsigned Scope = Attrs & LTO_SYMBOL_SCOPE_MASK;
> >        if (Scope != LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN)
> >          KeptDSOSyms.push_back(Name);
> >      }
> > +
> > +    // We use the first input module as the destination module when
> > +    // SetMergedModule is true.
> > +    if (SetMergedModule && i == BaseArg) {
> > +      // Transfer ownership to the code generator.
> > +      CodeGen.setModule(std::move(Module));
> > +    } else if (!CodeGen.addModule(Module.get()))
> > +      return 1;
> 
> Why have you moved this code?

This was needed because of the ownership change -- see below.

> >  LTOCodeGenerator::~LTOCodeGenerator() {
> > -  destroyMergedModule();
> > -
> 
> Could the ordering between tearing down the `LTOModule` and the
> `TargetMach` matter?

>From a quick scan I couldn't see anything inside any derived class of
TargetMachine that preserves a reference to anything inside the Module,
so I don't think it should matter.

>  I suggest moving `TargetMach` to a
> `std::unique_ptr<>` first in a separate commit (looks like it would be
> easy?), and placing it such that the ordering won't change.

r245670. (Note that this wouldn't change the order anyway, as the members
would be destroyed after the destructor is called.)

> Seems like you could through the CodegenOptions into a unique_ptr as
> well (using a custom deleter that calls `free()`).  Might be nice to
> clean that up as well (also in a separate commit).

Or I could use std::vector<std::string> and build the char * vector
on demand, which seems like the simplest approach. r245671.

> > +
> > +  // Finally destroy the module as we no longer need it
> > +  OwnedModule.reset();
> 
> Yes we do!  API users can call `lto_codegen_write_merged_modules()`
> after this.

So they can. Hrm.

So the problem I have is that in order to implement parallel LTO code
generation efficiently, I would like all code generation in this class (and
the gold plugin) to go through a function whose signature is something like

void codegen(std::unique_ptr<Module> M, std::vector<ostream> outputs, x foo, y bar, z baz, ...);

so that the codegen function is free to mangle M arbitrarily then
destroy it.  (This would be useful if, say, the work is partitioned
among multiple modules; M could be reused by stripping it of all but the
contents of a single partition.) This would prevent clients from calling
lto_codegen_write_merged_modules() and getting what they want even if we
did change the function to not take ownership of the module.

I suppose that I could make this function's signature look like this:

std::unique_ptr<Module> codegen(std::unique_ptr<Module> M, std::vector<ostream> outputs, x foo, y bar, z baz, ...);

where the result is equal to M if the parallelism level is 1, or
std::unique_ptr<Module>() otherwise. The code in LTOCodeGenerator
would then look like this:

OwnedModule = codegen(std::move(OwnedModule), ...);

This would mean that clients would need to change to doing something like
this if they want to save IR before parallel CG:

lto_codegen_set_module();
lto_codegen_add_module();
lto_codegen_add_module();
lto_codegen_optimize();
lto_codegen_write_merged_modules();
lto_codegen_compile_optimized();

but the semantics for non-parallel CG are preserved.

> 
> >  
> >    return true;
> >  }
> > Index: include/llvm/Linker/Linker.h
> > ===================================================================
> > --- include/llvm/Linker/Linker.h
> > +++ include/llvm/Linker/Linker.h
> > @@ -62,7 +62,6 @@
> >  
> >    Linker(Module *M, DiagnosticHandlerFunction DiagnosticHandler);
> >    Linker(Module *M);
> > -  ~Linker();
> 
> This seems unrelated?

r245672.

> > --- include/llvm-c/lto.h
> > +++ include/llvm-c/lto.h
> > @@ -40,7 +40,7 @@
> >   * @{
> >   */
> >  
> > -#define LTO_API_VERSION 17
> > +#define LTO_API_VERSION 18
> 
> There's no new API, so the version shouldn't be changing.
> 
> >  
> >  /**
> >   * \since prior to LTO_API_VERSION=3
> > @@ -374,8 +374,7 @@
> >  lto_codegen_add_module(lto_code_gen_t cg, lto_module_t mod);
> >  
> >  /**
> > - * Sets the object module for code generation. This will transfer the ownship of
> > - * the module to code generator.
> > + * Sets the object module for code generation. This will destroy the module.
> 
> Was this poorly documented, or is your patch intended to change the
> semantics of this call?  The latter would be a problem.

The latter. How important is it that we preserve these semantics? It looks
like this function was introduced by Manman in r230290, so presumably this
is being used by some version of ld64, but I cannot see any uses of this
function in ld64-242.

How difficult would it be to change ld64 to not rely on the module staying
around? There would be no real ownership changes as such, so I think all
you would need to do is make sure that the linker does not touch the module
after calling lto_codegen_set_module.

The only other linker I am aware of that uses libLTO is Sony's. Yunzhong,
would this change affect you significantly?

There are other things we can do if we absolutely have to preseve these
semantics, but it would add complexity to LTOCodeGenerator.

Thanks,
-- 
Peter


More information about the llvm-commits mailing list