[LLVMdev] IR type safety

Tue Sep 21 09:40:36 PDT 2010

Renato Golin wrote:
> Hi folks,
>
> I have a few questions I was saving for later and never got around to
> asking, so I'll send a few emails to the list, one with each
> question, to ease any further discussion that may come from them...
>
> The first question is:
>
> According to the language reference, LLVM IR is type safe. It means,
> for instance, that you won't be able to perform ADD operations on two
> different types or call functions with the wrong arguments, etc.
>   

First, this is only partially correct.  LLVM IR is typed, and most 
operations are type-safe.  However, LLVM can represent type-unsafe code 
through at least the following:

1) LLVM has cast instructions (and cast constant expressions) that can
cast one type to another.  It's possible to take a float, bitcast it to
an i32, and add it to another i32.

2) LLVM does not require garbage collection or region-based memory
management.  If you dereference a dangling pointer, you can get implicit
casting of values, because the memory may have been reallocated to hold
a value of a different type.

3) LLVM does not prevent a function from returning a pointer to
stack-allocated memory.  Dangling pointers to stack-allocated objects
are possible.
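
As a rough illustration of point 1, here is a minimal C sketch (not LLVM
IR) of what bitcasting a float to an integer and then adding it to
another integer amounts to.  The helper names float_bits_as_int and
add_float_bits are made up for this example, and the code assumes that
float is a 32-bit IEEE-754 value, which holds on most current targets
but is not guaranteed by C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a float as a 32-bit
   unsigned integer, roughly what `bitcast float %f to i32` expresses.
   Assumes sizeof(float) == 4 (IEEE-754 single precision). */
static uint32_t float_bits_as_int(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the raw bytes, no conversion */
    return bits;
}

/* Hypothetical helper: add the reinterpreted float to another integer.
   Every operation is well typed, yet the sum has no meaningful numeric
   interpretation as either a float or an integer. */
static uint32_t add_float_bits(float f, uint32_t n) {
    return float_bits_as_int(f) + n;
}
```

Under the stated assumption, add_float_bits(1.0f, 1) yields 0x3F800001,
because 1.0f is encoded as 0x3F800000.  Each step passes a type check,
but the value being added was never an integer in any useful sense.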

That said, you can generate type-safe LLVM IR, and if you force your 
front-end to generate IR with certain restrictions, you can probably 
prove that it is type-safe.

> But, when declaring two types that happen to (supposedly) have the
> same layout, LLVM ignores the second type and uses the first's name
> instead.
>
> In one module, it doesn't matter, but once you join different modules
> with, possibly, different data layouts, the data types are not the
> same any more.
>
> Is this a declaration that you will never be able (with an error
> message, assert or whatever) to join two IRs with different data
> layouts? Or was it never thought that you could mix them?
>   

I think linking two LLVM bitcode files with different data layouts would
be hard (especially given different endianness); I think LLVM 2.7 prints
a warning when the data layouts don't match.  However, I'll let people
more knowledgeable about LLVM data layout answer this part of your
question.

-- John T.

> In my view, that is the precise reason why we have the data layout.
> Unions can't rely on them (why we don't have unions any more) and
> compiler data (RTTI, VT, VTT, etc) are all statically created with the
> correct size.
>
>
>