[LLVMdev] llvm::Linker incorrectly fails to link in all aspects of the source module

DeadMG wolfeinstein at gmail.com
Sun Jul 19 03:48:35 PDT 2015


I've got some code using the LLVM linker. When I link one module into
another, the linker fails to correctly represent all the aspects of the
source module. Specifically, I've observed that types which are structurally
equivalent get merged together, even though they're explicitly named types
and not unnamed structural types.

Here's my reproducing case. I have the source and the output IR.

#include <memory>
#include <string>
#include <vector>

#pragma warning(push, 0)
#include <llvm/ExecutionEngine/GenericValue.h>
#include <llvm/ExecutionEngine/MCJIT.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
#include <llvm/Support/Program.h>
#include <llvm/Support/FileSystem.h>
#include <llvm/Support/DynamicLibrary.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IR/Verifier.h>
#include <llvm/IR/Type.h>
#include <llvm/IR/DerivedTypes.h>
#include <llvm/IR/IRBuilder.h>
#include <llvm/Transforms/Utils/Cloning.h>
#include <llvm/Linker/Linker.h>
#include <llvm/Support/raw_ostream.h>
#pragma warning(pop)

std::string printModule(llvm::Module& module) {
    std::string mod_ir;
    llvm::raw_string_ostream stream(mod_ir);
    module.print(stream, nullptr);
    stream.flush();
    return mod_ir;
}
int main() {
    llvm::LLVMContext con;
    llvm::Module src("in", con);
    llvm::Module dest("out", con);

    // Two distinct named struct types with the same body, { i8* }.
    auto srcb1 = llvm::StructType::create(
        con, std::vector<llvm::Type*>{ llvm::PointerType::getInt8PtrTy(con) }, "srcb1");
    auto srcb2 = llvm::StructType::create(
        con, std::vector<llvm::Type*>{ llvm::PointerType::getInt8PtrTy(con) }, "srcb2");
    // A wrapper type that uses both of them.
    auto srcty = llvm::StructType::create(
        con, std::vector<llvm::Type*>{ srcb1, srcb2 }, "srcty");

    // srcfunc takes no arguments and returns a %srcty.
    auto func = llvm::Function::Create(
        llvm::FunctionType::get(srcty, {}, false),
        llvm::GlobalValue::LinkageTypes::ExternalLinkage, "srcfunc", &src);
    llvm::BasicBlock* entries =
        llvm::BasicBlock::Create(func->getParent()->getContext(), "entry", func);
    llvm::IRBuilder<> allocabuilder(entries);
    // Both operands are constants, so this folds to zeroinitializer (see the IR below).
    auto insert = allocabuilder.CreateInsertValue(
        llvm::ConstantAggregateZero::get(srcty),
        llvm::ConstantAggregateZero::get(srcb1), { 0 });
    allocabuilder.CreateRet(insert);

    // Print the source, link a clone of it into the empty destination, print the result.
    auto before = printModule(src);
    auto clone = std::unique_ptr<llvm::Module>(llvm::CloneModule(&src));
    llvm::Linker::LinkModules(&dest, clone.get());
    auto after = printModule(dest);
    // Break into the debugger (MSVC intrinsic) if linking changed the IR.
    if (before != after)
        __debugbreak();
}

// Before:

; ModuleID = 'in'

%srcty = type { %srcb1, %srcb2 }
%srcb1 = type { i8* }
%srcb2 = type { i8* }

define %srcty @srcfunc() {
entry:
  ret %srcty zeroinitializer
}

// After:

; ModuleID = 'out'

%srcty = type { %srcb1, %srcb1 }
%srcb1 = type { i8* }

define %srcty @srcfunc() {
entry:
  ret %srcty zeroinitializer
}

You can see in before and after that the two structurally equivalent but
distinct named types, srcb1 and srcb2, were merged. After a bit of
discussion on #llvm, it was suggested that this is intended behaviour. If
so, this is terribly broken.

For one thing, my code depends on looking up types from the module by name.
So far I happen not to have any test cases that look up structurally
equivalent types by name after linking, but it certainly could occur for
some user inputs to my compiler.
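
To make the by-name lookup problem concrete, here is a minimal sketch in
plain C++ with no LLVM dependency. It is a toy model, not LLVM's linker:
ToyModule, mapType and linkStructurally are hypothetical names invented for
this illustration, and the model assumes non-recursive types and no name
clashes in the destination. It only mimics the observed behaviour, where a
source type whose body matches a type already in the destination is folded
into that type instead of keeping its own name.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical, not LLVM types): a "module" maps a struct
// name to the names of its element types.
using ToyBody = std::vector<std::string>;
using ToyModule = std::map<std::string, ToyBody>;
using ToyRemap = std::map<std::string, std::string>;

// Map one source type into dest and return the name it ends up with there.
// Assumes types are not recursive.
std::string mapType(const std::string& name, const ToyModule& src,
                    ToyModule& dest, ToyRemap& remap) {
    auto def = src.find(name);
    if (def == src.end()) return name;             // primitive such as "i8*"
    auto done = remap.find(name);
    if (done != remap.end()) return done->second;  // already mapped

    // Map the element types first, so the body is expressed in dest names.
    ToyBody body;
    for (const auto& elem : def->second)
        body.push_back(mapType(elem, src, dest, remap));

    // Structural merge: reuse any dest type that has an identical body.
    for (const auto& existing : dest) {
        if (existing.second == body) {
            remap[name] = existing.first;
            return existing.first;
        }
    }

    // No structural match: keep the type under its own name.
    dest.emplace(name, body);
    remap[name] = name;
    return name;
}

// "Link" every named type of src into dest; returns source name -> dest name.
ToyRemap linkStructurally(const ToyModule& src, ToyModule& dest) {
    ToyRemap remap;
    for (const auto& entry : src) mapType(entry.first, src, dest, remap);
    return remap;
}
```

Feeding this the three types from the repro (srcb1 and srcb2 as { i8* },
srcty as { srcb1, srcb2 }) with an empty destination leaves the destination
holding only srcb1 and srcty, with srcty's body rewritten to
{ srcb1, srcb1 }. A lookup of "srcb2" by name in the destination then finds
nothing, which is exactly the failure I'm worried about.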

Secondly, it's much more difficult for me to determine what is going on in
this IR. My compiler strictly generates exactly one LLVM type for each of
the various types in the source code. If the compiler is broken for any
reason and I look at the IR output, I expect to see that correspondence. If
I don't see it, I conclude that the compiler is broken. I just spent three
days trying to figure out why on earth my compiler was not generating the
types correctly, when it was doing so all along. And it's much more
difficult to interpret the output when the IR no longer distinguishes
between two logically completely distinct types that just happen to have
the same IR representation.

Fundamentally, LLVM should never mutate the contents of the module unless
it's explicitly requested, because the programmer depends on properties of
the IR that are more than just binary equivalence. Moving the contents of
one module into another module is no exception.