[LLVMdev] llvm::Linker incorrectly fails to link in all aspects of the source module
Jason Koenig
jrkoenig at google.com
Mon Jul 20 10:25:39 PDT 2015
From http://blog.llvm.org/2011/11/llvm-30-type-system-rewrite.html:
> As in LLVM 2.9, type names are not really designed to be used as semantic
> information in IR: we expect everything to continue working if the -strip
> pass is used to remove all extraneous names from the IR. However, for
> research and other purposes, it can sometimes be a convenient hack to
> propagate information from a front-end into LLVM IR by using type names.
> This will work reliably in LLVM 3.0 (so long as you don't run the strip
> pass or something equivalent) because identified types aren't uniqued.
> However, be aware that the suffix can be added and write your code to
> tolerate it.
> A more robust way to be able to identify a specific type in the optimizer
> (or some other point after the frontend has run) is to use a named metadata
> node to find the type. For example, if you want to find the %foo type, you
> could generate IR that looks like this:
> %foo = type { ... }
> ...
> !magic.types = !{ %foo zeroinitializer }
> Then to find the "foo" type, you'd just look up the "magic.types" named
> metadata, and get the type of the first element. Even if type names are
> stripped or types get auto-renamed, the type of the first element will
> always be correct and stable.
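
To illustrate the "tolerate the suffix" advice from the quote above, here is a
minimal sketch in plain C++ (no LLVM headers needed). It assumes the
auto-rename suffix has the form ".<digits>" at the end of the name; the helper
names stripAutoRenameSuffix and sameBaseTypeName are hypothetical and not part
of any LLVM API. Note that this only copes with renamed types. It does nothing
for types the linker has merged, which is why the metadata approach is the
more robust one.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: drop a trailing ".<digits>" suffix, on the assumption
// that this is the form LLVM uses when it auto-renames a colliding type name.
std::string stripAutoRenameSuffix(const std::string& name) {
    const std::size_t dot = name.rfind('.');
    // No dot, a leading dot, or a trailing dot: nothing to strip.
    if (dot == std::string::npos || dot == 0 || dot + 1 == name.size())
        return name;
    // Everything after the last dot must be a decimal digit.
    for (std::size_t i = dot + 1; i < name.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(name[i])))
            return name;
    }
    return name.substr(0, dot);
}

// Hypothetical helper: compare two type names while ignoring such a suffix.
bool sameBaseTypeName(const std::string& a, const std::string& b) {
    return stripAutoRenameSuffix(a) == stripAutoRenameSuffix(b);
}
```

With this, a lookup for "srcb1" would also accept "srcb1.0", while a name such
as "struct.foo" is left untouched. A front-end that itself emits names ending
in ".<digits>" would need a different convention.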
-Jason Koenig
On Sun, Jul 19, 2015 at 3:48 AM, DeadMG <wolfeinstein at gmail.com> wrote:
> I've got some code using the LLVM linker. When I link one module into
> another, the linker fails to correctly represent all the aspects of the
> source module. Specifically, I've observed that types which are structurally
> equivalent get merged together, even though they're explicitly named types
> and not unnamed structural types.
>
> Here's my reproducing case. I have the source and the output IR.
>
> #include <memory>
> #include <string>
> #include <vector>
>
> #pragma warning(push, 0)
> #include <llvm/ExecutionEngine/GenericValue.h>
> #include <llvm/ExecutionEngine/MCJIT.h>
> #include <llvm/ExecutionEngine/ExecutionEngine.h>
> #include <llvm/Support/Program.h>
> #include <llvm/Support/FileSystem.h>
> #include <llvm/Support/DynamicLibrary.h>
> #include <llvm/IR/LLVMContext.h>
> #include <llvm/IR/Module.h>
> #include <llvm/IR/Verifier.h>
> #include <llvm/IR/Type.h>
> #include <llvm/IR/DerivedTypes.h>
> #include <llvm/IR/IRBuilder.h>
> #include <llvm/Transforms/Utils/Cloning.h>
> #include <llvm/Linker/Linker.h>
> #include <llvm/Support/raw_ostream.h>
> #pragma warning(pop)
>
> // Print a module's textual IR into a string.
> std::string printModule(llvm::Module& module) {
>     std::string mod_ir;
>     llvm::raw_string_ostream stream(mod_ir);
>     module.print(stream, nullptr);
>     stream.flush();
>     return mod_ir;
> }
>
> int main() {
>     llvm::LLVMContext con;
>     llvm::Module src("in", con);
>     llvm::Module dest("out", con);
>
>     // Two distinct named types with the same body, plus a type holding both.
>     auto srcb1 = llvm::StructType::create(con,
>         std::vector<llvm::Type*>{ llvm::PointerType::getInt8PtrTy(con) },
>         "srcb1");
>     auto srcb2 = llvm::StructType::create(con,
>         std::vector<llvm::Type*>{ llvm::PointerType::getInt8PtrTy(con) },
>         "srcb2");
>     auto srcty = llvm::StructType::create(con,
>         std::vector<llvm::Type*>{ srcb1, srcb2 }, "srcty");
>
>     // A function in the source module that returns a zeroed srcty.
>     auto func = llvm::Function::Create(
>         llvm::FunctionType::get(srcty, {}, false),
>         llvm::GlobalValue::LinkageTypes::ExternalLinkage, "srcfunc", &src);
>     llvm::BasicBlock* entries = llvm::BasicBlock::Create(
>         func->getParent()->getContext(), "entry", func);
>     llvm::IRBuilder<> allocabuilder(entries);
>     auto insert = allocabuilder.CreateInsertValue(
>         llvm::ConstantAggregateZero::get(srcty),
>         llvm::ConstantAggregateZero::get(srcb1), { 0 });
>     allocabuilder.CreateRet(insert);
>
>     // Link a clone of src into the empty dest and compare the printed IR.
>     auto before = printModule(src);
>     auto clone = std::unique_ptr<llvm::Module>(llvm::CloneModule(&src));
>     llvm::Linker::LinkModules(&dest, clone.get());
>     auto after = printModule(dest);
>     if (before != after)
>         __debugbreak();
> }
>
> // Before:
>
> ; ModuleID = 'in'
>
> %srcty = type { %srcb1, %srcb2 }
> %srcb1 = type { i8* }
> %srcb2 = type { i8* }
>
> define %srcty @srcfunc() {
> entry:
> ret %srcty zeroinitializer
> }
>
> // After:
>
> ; ModuleID = 'out'
>
> %srcty = type { %srcb1, %srcb1 }
> %srcb1 = type { i8* }
>
> define %srcty @srcfunc() {
> entry:
> ret %srcty zeroinitializer
> }
>
> You can see in before and after that the two structurally equivalent but
> distinct named types, srcb1 and srcb2, were merged. After a bit of
> discussion on #llvm, it was suggested that this is intended behaviour. If
> so, this is terribly broken.
>
> For one thing, my code depends on looking up types from the module by
> name. So far it just so happens that I don't have any test cases that look
> up structurally equivalent types after linking by name, but it certainly
> could occur for some user inputs for my compiler.
>
> Secondly, it's much more difficult for me to determine what is going on in
> this IR. In my compiler, I strictly generate one LLVM type for various
> types in the source code. If the compiler is broken for any reason, and I
> look at the IR output, then I expect to see this. If I don't see this, then
> I think the compiler is broken. I just spent three days trying to figure
> out why on earth my compiler was not generating the types correctly, when
> it was all along. And it's much more difficult to interpret the outcome
> when the IR no longer distinguishes between the two logically completely
> distinct types that just happen to have the same IR representation.
>
> Fundamentally, LLVM should never mutate the contents of the module unless
> it's explicitly requested, because the programmer depends on properties of
> the IR that are more than just binary equivalence. Moving the contents of
> one module into another module is no exception.