[LLVMdev] Disjoint types after reading several modules

Tue Jan 31 06:16:13 PST 2012

Dear community,

We are currently facing a problem related to the new type system in LLVM 
3.0.
Our setting is the following: We have two or more modules, all in the 
same LLVMContext. They share some types; for example, functions in 
different modules reference the same (that is, pointer-identical) 
type.
Now we write the modules to disk and read them back in another 
program (again into a single LLVMContext).
The problem now is that named structs get duplicated for the different 
modules, meaning that when the second module is read, a new named struct 
is created in the context, and its name gets suffixed by a number.
This is because each module contains its own type table with all the 
types used in that module. When reading in the corresponding bitcode, 
the BitcodeReader explicitly calls StructType::create, without looking 
up in the context whether an equivalent type (even with the same name) 
already exists.
So I think that LLVM is behaving correctly here, according to the new 
type system. But for us, the problem is that types that were identical 
before serialization are no longer identical after deserialization, 
which leads to problems when copying code between the modules.
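
To make the effect concrete, here is a minimal toy model of what is described above. It does not use the LLVM API: ToyContext, ToyStructType and readModuleType are hypothetical names invented for this sketch, and the ".N" suffix format is an assumption (the real suffix may look different). It only shows that a reader which always creates a new named struct ends up with two distinct types for the same name.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a named struct type; identity is the object's address.
struct ToyStructType {
    std::string name;
};

// Toy stand-in for a context that owns all named struct types.
class ToyContext {
public:
    // Always creates a new type and never reuses an existing one.
    // If the requested name is taken, a numeric suffix is appended.
    ToyStructType *createNamedStruct(const std::string &requested) {
        std::string name = requested;
        unsigned suffix = 0;
        while (byName_.count(name) != 0) {
            name = requested + "." + std::to_string(suffix++);
        }
        assert(byName_.count(name) == 0 && "name must be unique here");
        owned_.emplace_back(new ToyStructType{name});
        ToyStructType *type = owned_.back().get();
        byName_[name] = type;
        return type;
    }

private:
    std::vector<std::unique_ptr<ToyStructType>> owned_;
    std::map<std::string, ToyStructType *> byName_;
};

// Simulates reading one module whose type table mentions `typeName`.
// Like the reader described above, it does not look for an existing type.
ToyStructType *readModuleType(ToyContext &ctx, const std::string &typeName) {
    return ctx.createNamedStruct(typeName);
}
```

Calling readModuleType twice with the same name on one ToyContext returns two different pointers, and the second type carries a suffixed name. That is the situation we see after reading the second module.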

Has anyone already run into this problem and solved it? Or is there a 
known solution to it?

Our idea for solving this is to add a named metadata node to each module 
before serializing it to bitcode, in order to identify previously 
identical types after deserialization. The metadata consists of a list 
of constants arranged in pairs: the first entry of each pair is a 
ConstantAggregateZero of a named struct, and the entry following it is 
a constant integer uniquely identifying that type. We plan to simply 
use the Type* cast to i64.
Then, after reading in all modules, we could find the named metadata, 
iterate over its elements, and unify all types that have the same 
number assigned. This would involve recreating and replacing global 
values if their type changed.
Does this approach sound reasonable to you?
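
The unification step could be sketched roughly as follows. Again this is a toy model and not LLVM code: ToyType, TypeTag and buildReplacementMap are hypothetical names, and each module's metadata is assumed to have already been decoded into a list of (type, id) pairs. The sketch picks the first type seen for each ID as the representative and records which other types would have to be replaced by it. Rewriting the global values that use a replaced type is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a named struct type; identity is the object's address.
struct ToyType {
    std::string name;
};

// One (type, id) pair, as it would be recovered from a module's metadata.
using TypeTag = std::pair<ToyType *, std::uint64_t>;

// For every type that shares its ID with an earlier type, records the
// earlier type as its replacement. Types that are already the
// representative for their ID do not appear in the result.
std::map<ToyType *, ToyType *>
buildReplacementMap(const std::vector<std::vector<TypeTag>> &modules) {
    std::map<std::uint64_t, ToyType *> representativeById;
    std::map<ToyType *, ToyType *> replacement;
    for (const auto &tags : modules) {
        for (const auto &tag : tags) {
            assert(tag.first != nullptr && "metadata entry without a type");
            // emplace keeps the existing entry if the ID was seen before.
            auto result = representativeById.emplace(tag.second, tag.first);
            ToyType *representative = result.first->second;
            if (representative != tag.first) {
                replacement[tag.first] = representative;
            }
        }
    }
    return replacement;
}
```

With two modules that both tag their own copy of a struct with the same ID, the map contains exactly one entry, pointing from the second copy to the first. A type whose ID occurs only once is left alone.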

Another option would be to merge all modules together in a new module 
before serialization, prefixing all global values. After deserialization 
of this single module, the types would still be correct, and the module 
could be split up again. But this would require some rearrangements in 
our code since the modules would have to be written out at one single 
point. That's why we discarded that idea for now.

If anything is unclear, I can provide examples.

Thanks for any comments,
Clemens
