[LLVMdev] Disjoint types after reading several modules

Wed Feb 1 16:15:19 PST 2012

On Jan 31, 2012, at 6:16 AM, Clemens Hammacher wrote:
> This is because each module contains its own type table with all the types used in that module. When reading in the corresponding bitcode, the BitcodeReader explicitly calls StructType::create, without looking up in the context whether an equivalent type (even with the same name) already exists.
> So I think that llvm is behaving correctly here, according to the new type system. But for us, the problem is that previously identical types are not identical any more after deserialization, which leads to problems when copying code between the modules.
> 
> So has anyone already stumbled across this problem and solved it? Or is there a known solution to it?

I'm familiar with the scenario, but haven't heard of anyone trying to do something quite like this.  The linker has to solve the exact same problem (read multiple .bc files and unify types across them).  This is the impetus behind TypeMapTy in lib/Linker/LinkModules.cpp.  You'll probably need to do something like that.
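
As a rough illustration of the kind of mapping involved (this is not the TypeMapTy code, and it does not use the LLVM API), here is a toy C++ model. ToyStruct and ToyTypeMap are hypothetical names invented for this sketch. It assumes two struct types are "the same" when both the name and the field list match, which is a simplification: a real implementation would have to compare structure and cope with opaque and recursive types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a named struct type; NOT llvm::StructType.
struct ToyStruct {
    std::string name;                 // e.g. "struct.Point"
    std::vector<std::string> fields;  // field types spelled as strings, e.g. "i32"
};

// Hypothetical type map: hands out one canonical ToyStruct per
// (name, field list) pair, no matter which module it came from.
class ToyTypeMap {
public:
    // Returns the canonical type equivalent to `src`, registering `src`
    // as the canonical copy if no equivalent type has been seen yet.
    const ToyStruct *get(const ToyStruct &src) {
        Key key(src.name, src.fields);
        // emplace leaves the map unchanged if the key already exists;
        // std::map never moves its elements, so the pointer stays valid.
        return &canonical_.emplace(key, src).first->second;
    }

    // Number of distinct canonical types seen so far.
    std::size_t size() const { return canonical_.size(); }

private:
    typedef std::pair<std::string, std::vector<std::string> > Key;
    std::map<Key, ToyStruct> canonical_;
};
```

Calling get() on two separately created but identical ToyStruct values returns the same pointer, which is the property you want back after reading several modules.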

> Our idea for solving this is to add a named metadata node to each module before serializing it to bitcode, in order to identify previously identical types after deserialization. The metadata consists of a list of constants, where each even entry is a ConstantAggregateZero of a named struct, and the following entry is a constant integer uniquely identifying that type. We plan to just use the Type* cast to i64.
> Then after reading in all modules, we could find the named metadata, iterate over its elements and unify all Types which have the same number assigned. This would involve recreating and replacing global values, if their type changed.
> Does this approach sound reasonable to you?
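
The unification step of that proposal can be sketched as a toy C++ model. It deliberately avoids the LLVM API: ToyType, TaggedTypes and unifyById are hypothetical names, and each module's metadata is assumed to have already been decoded into (type, id) pairs. The sketch only computes which type should replace which; recreating and replacing global values is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a deserialized type; NOT llvm::Type.
struct ToyType {
    std::string name;
};

// One module's decoded metadata: (type, id assigned before serialization).
typedef std::vector<std::pair<ToyType *, std::uint64_t> > TaggedTypes;

// For every tagged type, picks the first type seen with the same id as
// its representative. Returns a map from each type to its replacement;
// a representative maps to itself.
std::map<ToyType *, ToyType *>
unifyById(const std::vector<TaggedTypes> &modules) {
    std::map<std::uint64_t, ToyType *> representative;
    std::map<ToyType *, ToyType *> replacement;
    for (std::size_t m = 0; m < modules.size(); ++m) {
        for (std::size_t i = 0; i < modules[m].size(); ++i) {
            ToyType *type = modules[m][i].first;
            std::uint64_t id = modules[m][i].second;
            // emplace keeps the existing representative if the id is known.
            ToyType *rep = representative.emplace(id, type).first->second;
            replacement[type] = rep;
        }
    }
    return replacement;
}
```

One assumption worth stating: a pointer value is only a unique id among objects alive in the same process, so this relies on all modules having been written out from a single context in a single run.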

I have to ask: why are you writing these modules out as separate bc files?  A more typical approach would be to write out one big .bc file, and then lazily read in functions as you need them.  This avoids problems like you're seeing, and has the advantage of sharing types and constants as well.

-Chris