[LLVMdev] New Type System Questions

Talin viridia at gmail.com
Sun Jul 24 22:59:06 PDT 2011


So far I'm really liking the new type system -- I've been able to simplify
my code generator in a number of areas. And the IR is now vastly more
readable, both in the debugger (using dump()) and when printing modules via
llvm-dis. It's a tremendous improvement.

I do have a few comments / questions:

-- I think I may be misunderstanding how named structs are supposed to be
combined in the linker. Say we have a type that is defined in two modules
with the same name, however in one of the modules the type is abstract and
in the other module it has a body. The behavior I would expect is that it
would merge the two definitions, so that now you have one type with a body.
However, what I am getting instead is a lot of renamed types
(%tart.reflect.NameTable.3562 and so on). This is puzzling, as I shouldn't
have any renamed types in my modules at all.

-- I notice that BitReader now catches some errors that are missed by the
module verifier. (I submitted a bug report on this). Basically, you can
create an abstract type and have a GEP instruction that uses that type - and
it will pass through the module verifier and the bitcode writer, but the
bitcode reader will assert when it tries to load it in. Yes, I can now
create modules that cause llvm-dis to abort :)

-- Self-referential vs. anonymous types. This is more of a comment than a
question: in my language, string literals are implemented as anonymous types
because the string data follows the header struct in memory. So basically
there's a named type with the format:

   tart.core.String = { %ObjectHeader, %tart.core.String*, int32, [0 x char] }

And then for a string literal of length N there's an anonymous type:

   { %ObjectHeader, %tart.core.String*, int32, [N x char] }

It's anonymous because it doesn't make sense to generate a new named type
for each different length of string. Now, the reason for the
%tart.core.String* field in the middle there is to support substring
references: substrings point to the original string, whereas non-substrings
point to themselves. (I've left out a few fields for purposes of this
example.)

So to make the string literal we need to create a Constant whose type
is an anonymous struct, which has a pointer to itself embedded within it.
Turns out that you can do this with UndefValue, as long as when you refine
the undef, you pointer-cast the anon struct to the named struct.
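
To make the layout idea concrete, here is a rough analogy in plain C++
rather than in the LLVM API. It is only a sketch: the names ObjectHeader,
StringLiteral and kHello, and the single typeInfo field, are made up for
illustration, and data[1] stands in for [0 x char] because standard C++ has
no zero-length arrays. It shows a per-length literal whose middle field
holds its own address, cast to the named type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for %ObjectHeader; the real fields are not
// given in this post.
struct ObjectHeader {
  const void* typeInfo;
};

// Plays the role of the named type tart.core.String.
struct String {
  ObjectHeader header;
  const String* source;  // self for whole strings, original for substrings
  std::int32_t length;
  char data[1];          // stands in for [0 x char]
};

// Plays the role of the anonymous struct used for a literal of length N.
template <std::size_t N>
struct StringLiteral {
  ObjectHeader header;
  const String* source;
  std::int32_t length;
  char data[N];
};

// The cast below is only meaningful if the leading fields line up.
static_assert(offsetof(StringLiteral<5>, data) == offsetof(String, data),
              "literal and named type must share the same prefix layout");

// A self-referential literal: the initializer takes the object's own
// address and casts it to a pointer to the named type.
static const StringLiteral<5> kHello = {
  {nullptr},
  reinterpret_cast<const String*>(&kHello),
  5,
  {'h', 'e', 'l', 'l', 'o'}
};
```

The reinterpret_cast plays the same role as the pointer-cast from the anon
struct to the named struct described above. In C++ the analogy stops at the
layout: reading the literal through a String* would run into the language's
aliasing rules, which is not a concern at the IR level.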

I only mention this because it took me a while to figure out, and it's the
kind of recipe that you might want to consider mentioning in the Programmer's
Manual, along with the recipe for creating self-referential named structs.

-- 
-- Talin