[PATCH] D12205: LTO: Simplify merged module ownership.

Mehdi Amini via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 21 11:17:09 PDT 2015


> On Aug 21, 2015, at 10:43 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
> 
>> On 2015-Aug-21, at 10:30, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> 
>> 
>>> On Aug 20, 2015, at 5:25 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>> 
>>> 
>>>> On 2015-Aug-20, at 11:59, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>>> 
>>>> pcc created this revision.
>>>> pcc added reviewers: joker.eph, dexonsmith.
>>>> pcc added a subscriber: llvm-commits.
>>>> 
>>>> This change moves LTOCodeGenerator's ownership of the merged module to a
>>>> field of type std::unique_ptr<Module>. This helps simplify parts of the code
>>>> and clears the way for the module to be consumed by LLVM CodeGen (see D12132
>>>> review comments).
>>>> 
>>>> This changes the C API semantics (lto_codegen_set_module now destroys its
>>>> argument) so I have incremented the API version.
>>>> 
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D12205&d=BQIFaQ&c=eEvniauFctOgLOKGJOplqw&r=vftFLnHiqThJHdL0qZWd_Vo12qdMVOZDFnNVBhP9GKA&m=_zVM4BoXx0GRxMplFnNx5FofEf2wEsAaofSQfEkSslM&s=ojW34xkzXBCPaubA1Wka6wzn-R8zXhkqa_uiKK89e4k&e= 
>>>> 
>>>> Files:
>>>> include/llvm-c/lto.h
>>>> include/llvm/LTO/LTOCodeGenerator.h
>>>> include/llvm/LTO/LTOModule.h
>>>> include/llvm/Linker/Linker.h
>>>> lib/LTO/LTOCodeGenerator.cpp
>>>> lib/Linker/LinkModules.cpp
>>>> tools/llvm-lto/llvm-lto.cpp
>>>> tools/lto/lto.cpp
>>>> 
>>>> <D12205.32721.patch>
>>> 
>>> This is a great idea, thanks.  A few problems/inconsistencies documented
>>> inline.
>>> 
>>>> Index: tools/lto/lto.cpp
>>>> ===================================================================
>>>> --- tools/lto/lto.cpp
>>>> +++ tools/lto/lto.cpp
>>>> @@ -260,7 +260,7 @@
>>>> }
>>>> 
>>>> void lto_codegen_set_module(lto_code_gen_t cg, lto_module_t mod) {
>>>> -  unwrap(cg)->setModule(unwrap(mod));
>>>> +  unwrap(cg)->setModule(std::unique_ptr<LTOModule>(unwrap(mod)));
>>>> }
>>>> 
>>>> bool lto_codegen_set_debug_model(lto_code_gen_t cg, lto_debug_model debug) {
>>>> Index: tools/llvm-lto/llvm-lto.cpp
>>>> ===================================================================
>>>> --- tools/llvm-lto/llvm-lto.cpp
>>>> +++ tools/llvm-lto/llvm-lto.cpp
>>>> @@ -207,26 +207,24 @@
>>>>     return 1;
>>>>   }
>>>> 
>>>> -    LTOModule *LTOMod = Module.get();
>>>> -
>>>> -    // We use the first input module as the destination module when
>>>> -    // SetMergedModule is true.
>>>> -    if (SetMergedModule && i == BaseArg) {
>>>> -      // Transfer ownership to the code generator.
>>>> -      CodeGen.setModule(Module.release());
>>>> -    } else if (!CodeGen.addModule(Module.get()))
>>>> -      return 1;
>>>> -
>>>> -    unsigned NumSyms = LTOMod->getSymbolCount();
>>>> +    unsigned NumSyms = Module->getSymbolCount();
>>>>   for (unsigned I = 0; I < NumSyms; ++I) {
>>>> -      StringRef Name = LTOMod->getSymbolName(I);
>>>> +      StringRef Name = Module->getSymbolName(I);
>>>>     if (!DSOSymbolsSet.count(Name))
>>>>       continue;
>>>> -      lto_symbol_attributes Attrs = LTOMod->getSymbolAttributes(I);
>>>> +      lto_symbol_attributes Attrs = Module->getSymbolAttributes(I);
>>>>     unsigned Scope = Attrs & LTO_SYMBOL_SCOPE_MASK;
>>>>     if (Scope != LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN)
>>>>       KeptDSOSyms.push_back(Name);
>>>>   }
>>>> +
>>>> +    // We use the first input module as the destination module when
>>>> +    // SetMergedModule is true.
>>>> +    if (SetMergedModule && i == BaseArg) {
>>>> +      // Transfer ownership to the code generator.
>>>> +      CodeGen.setModule(std::move(Module));
>>>> +    } else if (!CodeGen.addModule(Module.get()))
>>>> +      return 1;
>>> 
>>> Why have you moved this code?
>>> 
>>>> }
>>>> 
>>>> // Add all the exported symbols to the table of symbols to preserve.
>>>> Index: lib/Linker/LinkModules.cpp
>>>> ===================================================================
>>>> --- lib/Linker/LinkModules.cpp
>>>> +++ lib/Linker/LinkModules.cpp
>>>> @@ -1754,9 +1754,6 @@
>>>> });
>>>> }
>>>> 
>>>> -Linker::~Linker() {
>>>> -}
>>>> -
>>> 
>>> This seems independent.  You should do this separately.
>>> 
>>>> void Linker::deleteModule() {
>>>> delete Composite;
>>>> Composite = nullptr;
>>>> Index: lib/LTO/LTOCodeGenerator.cpp
>>>> ===================================================================
>>>> --- lib/LTO/LTOCodeGenerator.cpp
>>>> +++ lib/LTO/LTOCodeGenerator.cpp
>>>> @@ -64,29 +64,20 @@
>>>> }
>>>> 
>>>> LTOCodeGenerator::LTOCodeGenerator()
>>>> -    : Context(getGlobalContext()), IRLinker(new Module("ld-temp.o", Context)) {
>>>> +    : Context(getGlobalContext()),
>>>> +      OwnedModule(new Module("ld-temp.o", Context)),
>>>> +      IRLinker(OwnedModule.get()) {
>>>> initializeLTOPasses();
>>>> }
>>>> 
>>>> LTOCodeGenerator::LTOCodeGenerator(std::unique_ptr<LLVMContext> Context)
>>>>   : OwnedContext(std::move(Context)), Context(*OwnedContext),
>>>> -      IRLinker(new Module("ld-temp.o", *OwnedContext)) {
>>>> +      OwnedModule(new Module("ld-temp.o", *OwnedContext)),
>>>> +      IRLinker(OwnedModule.get()) {
>>>> initializeLTOPasses();
>>>> }
>>>> 
>>>> -void LTOCodeGenerator::destroyMergedModule() {
>>>> -  if (OwnedModule) {
>>>> -    assert(IRLinker.getModule() == &OwnedModule->getModule() &&
>>>> -           "The linker's module should be the same as the owned module");
>>>> -    delete OwnedModule;
>>>> -    OwnedModule = nullptr;
>>>> -  } else if (IRLinker.getModule())
>>>> -    IRLinker.deleteModule();
>>>> -}
>>>> -
>>>> LTOCodeGenerator::~LTOCodeGenerator() {
>>>> -  destroyMergedModule();
>>>> -
>>> 
>>> Could the ordering between tearing down the `LTOModule` and the
>>> `TargetMach` matter?  I suggest moving `TargetMach` to a
>>> `std::unique_ptr<>` first in a separate commit (looks like it would be
>>> easy?), and placing it such that the ordering won't change.
>>> 
>>> Seems like you could through the CodegenOptions into a unique_ptr as
>>> well (using a custom deleter that calls `free()`).  Might be nice to
>>> clean that up as well (also in a separate commit).
>>> 
>>>> delete TargetMach;
>>>> TargetMach = nullptr;
>>>> 
>>>> @@ -139,16 +130,14 @@
>>>> return !ret;
>>>> }
>>>> 
>>>> -void LTOCodeGenerator::setModule(LTOModule *Mod) {
>>>> +void LTOCodeGenerator::setModule(std::unique_ptr<LTOModule> Mod) {
>>>> assert(&Mod->getModule().getContext() == &Context &&
>>>>        "Expected module in same context");
>>>> 
>>>> -  // Delete the old merged module.
>>>> -  destroyMergedModule();
>>>> AsmUndefinedRefs.clear();
>>>> 
>>>> -  OwnedModule = Mod;
>>>> -  IRLinker.setModule(&Mod->getModule());
>>>> +  OwnedModule = Mod->takeModule();
>>>> +  IRLinker.setModule(OwnedModule.get());
>>>> 
>>>> const std::vector<const char*> &Undefs = Mod->getAsmUndefinedRefs();
>>>> for (int I = 0, E = Undefs.size(); I != E; ++I)
>>>> @@ -551,8 +540,6 @@
>>>> if (!this->determineTarget(errMsg))
>>>>   return false;
>>>> 
>>>> -  Module *mergedModule = IRLinker.getModule();
>>>> -
>>>> legacy::PassManager codeGenPasses;
>>>> 
>>>> // If the bitcode files contain ARC code and were compiled with optimization,
>>>> @@ -566,7 +553,10 @@
>>>> }
>>>> 
>>>> // Run the code generator, and write assembly file
>>>> -  codeGenPasses.run(*mergedModule);
>>>> +  codeGenPasses.run(*OwnedModule.get());
>>> 
>>> You don't need `.get()` here.
>>> 
>>>> +
>>>> +  // Finally destroy the module as we no longer need it
>>>> +  OwnedModule.reset();
>>> 
>>> Yes we do!  API users can call `lto_codegen_write_merged_modules()`
>>> after this.
>>> 
>>>> 
>>>> return true;
>>>> }
>>>> Index: include/llvm/Linker/Linker.h
>>>> ===================================================================
>>>> --- include/llvm/Linker/Linker.h
>>>> +++ include/llvm/Linker/Linker.h
>>>> @@ -62,7 +62,6 @@
>>>> 
>>>> Linker(Module *M, DiagnosticHandlerFunction DiagnosticHandler);
>>>> Linker(Module *M);
>>>> -  ~Linker();
>>> 
>>> This seems unrelated?
>>> 
>>>> 
>>>> Module *getModule() const { return Composite; }
>>>> void deleteModule();
>>>> Index: include/llvm/LTO/LTOModule.h
>>>> ===================================================================
>>>> --- include/llvm/LTO/LTOModule.h
>>>> +++ include/llvm/LTO/LTOModule.h
>>>> @@ -113,6 +113,10 @@
>>>>   return IRFile->getModule();
>>>> }
>>>> 
>>>> +  std::unique_ptr<Module> takeModule() {
>>>> +    return IRFile->takeModule();
>>>> +  }
>>>> +
>>> 
>>> I think `clang-format` would prefer this on a single line.
>>> 
>>>> /// Return the Module's target triple.
>>>> const std::string &getTargetTriple() {
>>>>   return getModule().getTargetTriple();
>>>> Index: include/llvm/LTO/LTOCodeGenerator.h
>>>> ===================================================================
>>>> --- include/llvm/LTO/LTOCodeGenerator.h
>>>> +++ include/llvm/LTO/LTOCodeGenerator.h
>>>> @@ -69,7 +69,7 @@
>>>> bool addModule(struct LTOModule *);
>>> 
>>> It's a shame we can't make this take ownership as well.  Alas.
>>> 
>>>> 
>>>> // Set the destination module.
>>>> -  void setModule(struct LTOModule *);
>>>> +  void setModule(std::unique_ptr<LTOModule> M);
>>>> 
>>>> void setTargetOptions(TargetOptions options);
>>>> void setDebugInfo(lto_debug_model);
>>>> @@ -155,9 +155,9 @@
>>>> 
>>>> typedef StringMap<uint8_t> StringSet;
>>>> 
>>>> -  void destroyMergedModule();
>>>> std::unique_ptr<LLVMContext> OwnedContext;
>>>> LLVMContext &Context;
>>>> +  std::unique_ptr<Module> OwnedModule;
>>>> Linker IRLinker;
>>>> TargetMachine *TargetMach = nullptr;
>>>> bool EmitDwarfDebugInfo = false;
>>>> @@ -173,7 +173,6 @@
>>>> unsigned OptLevel = 2;
>>>> lto_diagnostic_handler_t DiagHandler = nullptr;
>>>> void *DiagContext = nullptr;
>>>> -  LTOModule *OwnedModule = nullptr;
>>>> bool ShouldInternalize = true;
>>>> bool ShouldEmbedUselists = false;
>>>> };
>>>> Index: include/llvm-c/lto.h
>>>> ===================================================================
>>>> --- include/llvm-c/lto.h
>>>> +++ include/llvm-c/lto.h
>>>> @@ -40,7 +40,7 @@
>>>> * @{
>>>> */
>>>> 
>>>> -#define LTO_API_VERSION 17
>>>> +#define LTO_API_VERSION 18
>>> 
>>> There's no new API, so the version shouldn't be changing.
>>> 
>>>> 
>>>> /**
>>>> * \since prior to LTO_API_VERSION=3
>>>> @@ -374,8 +374,7 @@
>>>> lto_codegen_add_module(lto_code_gen_t cg, lto_module_t mod);
>>>> 
>>>> /**
>>>> - * Sets the object module for code generation. This will transfer the ownship of
>>>> - * the module to code generator.
>>>> + * Sets the object module for code generation. This will destroy the module.
>>> 
>>> Was this poorly documented, or is your patch intended to change the
>>> semantics of this call?  The latter would be a problem.
>> 
>> It seems to me that if you transfer ownership of an object to another entity, you can’t assume anything about a prior reference you had to the object.
>> If an API client assumes an implementation behavior that is not documented at the API level, it can’t really complain when it breaks. Just like your code that invokes UB and worked for year will break with the next compiler update.
> 
> Regardless of how it gets documented (or whether it was wrong to implement
> the API this way), ld64 relies on lto_dispose_module() not crashing after
> lto_codegen_set_module().

I’m not saying it is wrong to implement the API this way.
I’m saying that it sounds to me like ld64 relies on (kind of) UB and that you consider a problem changing the behavior of a UB.

Looking at the API, LTOCodeGenerator destroy the Owned LTOModule in its destructor. How isn't there a double-free when calling lto_dispose_module()?

Looking at the ld64 source code, it clearly says that it won’t dispose the Module with API 13 and above because the ownership is transferred.

— 
Mehdi



More information about the llvm-commits mailing list