[llvm-dev] [RFC] Introducing a byte type to LLVM

Ralf Jung via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 15 12:15:52 PDT 2021


> The semantics you seem to want are that LLVM’s integer types cannot carry 
> information from pointers. But I can cast a pointer to an integer in C and 
> vice-versa, and compilers have de facto defined the behavior of subsequent 
> operations like breaking the integer up (and then putting it back together), 
> adding numbers to it, and so on. So no, as a C compiler writer, I do not have a 
> choice; I will have to use a type that can validly carry pointer information for 
> integers in C.

Integers demonstrably do not carry provenance; see 
<https://www.ralfj.de/blog/2020/12/14/provenance.html> for a detailed 
explanation of why.
As a consequence of this, ptr-int-ptr roundtrips are lossy: some of the original 
provenance information is lost. This means that optimizing away such roundtrips 
is incorrect, and indeed doing so leads to miscompilations.
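
For concreteness, a ptr-int-ptr roundtrip at the C source level might look
like the minimal sketch below. The function name read_via_roundtrip is made up
for illustration and does not come from this thread. The claim above is that a
compiler may not, in general, simplify q to p here, because the cast back from
the integer need not recover p's original provenance.

```c
#include <stdint.h>

/* Hypothetical helper: reads *p, but only after sending the pointer
 * through an integer and back. */
int read_via_roundtrip(int *p) {
    uintptr_t addr = (uintptr_t)p;  /* ptr -> int: only the address survives */
    int *q = (int *)addr;           /* int -> ptr: provenance is re-derived, not copied from p */
    return *q;
}
```

Called with the address of a live int, this returns the stored value on
mainstream implementations; the interesting question is only which rewrites
an optimizer is allowed to apply to it.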

The key difference between int and byte is that ptr-byte-ptr roundtrips are 
*lossless*: all the provenance is preserved. This enables some extra optimizations 
(such as removing these roundtrips -- which implicitly happens when a 
redundant-store-after-load is removed), but also costs some optimizations (most 
notably, "x == y" does not mean x and y are equal in all respects; their 
provenance might still differ, so it is incorrect for GVN to replace one by the 
other).

It's a classic tradeoff: we can *either* have lossless roundtrips *or* "x == y" 
implies full equality of the abstract values. Having both together leads to 
contradictions, which manifest as miscompilations. "byte" and "int" represent 
the two possible choices here; therefore, by adding "byte", LLVM would close a 
gap in the expressive power of its IR.
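
To make the tradeoff concrete, here is a deliberately simplified toy model in
C. It is not LLVM's actual semantics: every type and function name below
(ToyPtr, ToyInt, ToyByte, and so on) is hypothetical, provenance is reduced to
a small allocation id, the "byte" is pointer-sized for simplicity, and an
int-to-ptr cast is modelled as yielding "unknown" provenance (id 0), which is
only one of several possible choices.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy abstract values. prov is an allocation id; 0 means "none/unknown". */
typedef struct { uintptr_t addr;  int prov; } ToyPtr;
typedef struct { uintptr_t value;           } ToyInt;   /* no provenance */
typedef struct { uintptr_t value; int prov; } ToyByte;  /* carries provenance */

/* Casts through int drop provenance; casts through byte keep it. */
ToyInt  ptr_to_int(ToyPtr p)   { return (ToyInt){ p.addr }; }
ToyPtr  int_to_ptr(ToyInt i)   { return (ToyPtr){ i.value, 0 }; }
ToyByte ptr_to_byte(ToyPtr p)  { return (ToyByte){ p.addr, p.prov }; }
ToyPtr  byte_to_ptr(ToyByte b) { return (ToyPtr){ b.value, b.prov }; }

/* "x == y" as a program would observe it: compares the numeric value only. */
bool int_eq(ToyInt a, ToyInt b)    { return a.value == b.value; }
bool byte_eq(ToyByte a, ToyByte b) { return a.value == b.value; }

/* Full equality of the abstract values, i.e. what GVN-style
 * substitution would need in order to be sound. */
bool ptr_same(ToyPtr a, ToyPtr b)    { return a.addr == b.addr && a.prov == b.prov; }
bool int_same(ToyInt a, ToyInt b)    { return a.value == b.value; }
bool byte_same(ToyByte a, ToyByte b) { return a.value == b.value && a.prov == b.prov; }
```

In this model, take two pointers p = {0x1000, 1} and q = {0x1000, 2} with the
same address but different provenance. The roundtrip through ToyInt does not
give back p, while the roundtrip through ToyByte does. Conversely, int_eq
coincides with int_same, whereas byte_eq(ptr_to_byte(p), ptr_to_byte(q)) holds
even though byte_same does not, so replacing one byte by the other would not
be justified.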

Kind regards,
