[LLVMdev] New Type System Questions

Mon Jul 25 21:43:56 PDT 2011

On Jul 24, 2011, at 10:59 PM, Talin wrote:
> So far I'm really liking the new type system -- I've been able to simplify my code generator in a number of areas. And the IR is now vastly more readable, both in the debugger (using dump()) and when printing modules via llvm-dis. It's a tremendous improvement.

Great!  It was long overdue.  Someone should have done it right back in 2002. ;-)

> I do have a few comments / questions:
> 
> -- I think I may be misunderstanding how named structs are supposed to be combined in the linker. Say we have a type that is defined in two modules with the same name; however, in one of the modules the type is abstract and in the other module it has a body. The behavior I would expect is that it would merge the two definitions, so that now you have one type with a body. However, instead what I am getting is a lot of renamed types - %tart.reflect.NameTable.3562 and so on. This is puzzling, as I shouldn't have any renamed types in my modules at all.

As I responded on the other thread, name preservation is best-effort but not guaranteed.

Consider if you linked these two modules:

%a = type { i32 }
%G1 = internal global %a ...
...and...
%a = type { float }
%G2 = internal global %a ...

G1 and G2 are just "static" globals with no relation and no linkage to each other. When the linker produces a result file, it needs both versions of "%a", so one *must* be renamed.  There are also issues when modules have conflicting definitions and there *is* linkage.

Beyond this inherent issue, type uniquing happens at the LLVMContext level.  The context holds "the one true i32", and thus "the one true i32*", etc.  Because this is where uniquing happens, it is now also where named structs are uniqued.  This means that you can't have two different types with the same name in the same context.
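To make the consequence concrete, here is a minimal Python model of a context that uniques named structs by name.  Everything in it (`Context`, `create_struct`, and the ".N" suffix scheme) is a hypothetical illustration, loosely assumed to resemble numeric-suffix names like %tart.reflect.NameTable.3562; it is not LLVM's API.

```python
class Context:
    """Hypothetical stand-in for an LLVMContext that owns every named struct."""

    def __init__(self):
        self.named_structs = {}  # name -> body (tuple of element types, None if opaque)
        self._next_suffix = 0    # counter used to build unique names

    def create_struct(self, name, body=None):
        """Register a named struct, renaming it if the name is already taken.

        Returns the name the struct actually received.
        """
        unique = name
        while unique in self.named_structs:
            # Name clash: append a numeric suffix until the name is free.
            unique = f"{name}.{self._next_suffix}"
            self._next_suffix += 1
        self.named_structs[unique] = body
        return unique


ctx = Context()
print(ctx.create_struct("a", ("i32",)))    # a
print(ctx.create_struct("a", ("float",)))  # a.0
```

Loading the second module's %a = type { float } into a context that already has %a = type { i32 } corresponds to the second call: the context cannot hand out the name "a" twice, so the newcomer is renamed before the linker ever sees it.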

Linking bitcode necessarily requires loading multiple modules into the same context, so when the second module is loaded (but before linking happens) any conflicts in the second module are auto-renamed.  The linker then tries, on a best-effort basis, to rewrite the second module's types in terms of the first module's types where possible.
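One way to picture that best-effort step is the hypothetical Python sketch below.  It assumes a policy that is *not* taken from the actual linker code: strip a trailing ".N" rename suffix, resolve an opaque destination type with the source body (the abstract-plus-body case from the question above), reuse the destination type when the bodies match or the source is opaque, and otherwise keep a distinct renamed type, as the two internal %a globals require.  Bodies are compared literally by element type names, so nested struct references are not matched recursively.

```python
import re

# Hypothetical model: a type table maps a struct name to its body, where a
# body is a tuple of element type names, or None if the struct is opaque.
RENAME_SUFFIX = re.compile(r"^(?P<base>.+)\.\d+$")


def base_name(name):
    """Strip an auto-rename suffix such as '.3562', if present."""
    m = RENAME_SUFFIX.match(name)
    return m.group("base") if m else name


def map_types(dest, src):
    """Best-effort mapping of src struct names onto dest struct names.

    Returns (mapping, merged): mapping[src_name] is the name used in the
    linked result, and merged is the resulting type table.  dest is not mutated.
    """
    merged = dict(dest)
    mapping = {}
    for name, body in src.items():
        target = base_name(name)
        if target not in merged:
            # No clash in the destination: restore the original name.
            merged[target] = body
            mapping[name] = target
        elif merged[target] is None and body is not None:
            # Opaque in dest, defined in src: resolve to the source body.
            merged[target] = body
            mapping[name] = target
        elif body is None or merged[target] == body:
            # Opaque in src, or identical bodies: reuse the dest type.
            mapping[name] = target
        else:
            # Genuine conflict: both types must survive, so keep a renamed one.
            fresh, n = name, 0
            while fresh in merged:
                fresh = f"{target}.{n}"
                n += 1
            merged[fresh] = body
            mapping[name] = fresh
    return mapping, merged


dest = {"a": ("i32",), "NameTable": None}
src = {"a.0": ("float",), "NameTable.1": ("i32", "i8*")}
mapping, merged = map_types(dest, src)
print(mapping)  # {'a.0': 'a.0', 'NameTable.1': 'NameTable'}
```

Under these assumptions the opaque NameTable is merged with its definition, while the conflicting %a bodies necessarily stay as two types, so a renamed name survives into the output.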

> -- I notice that BitReader now catches some errors that are missed by the module verifier. (I submitted a bug report on this). Basically, you can create an abstract type and have a GEP instruction that uses that type - and it will pass through the module verifier and the bitcode writer, but the bitcode reader will assert when it tries to load it in. Yes, I can now create modules that cause llvm-dis to abort :)

Cool, I'll take a look at the PR when I get some cycles.  Thanks for the advice on building strings!

-Chris