[llvm-dev] [RFC] Introducing a byte type to LLVM

Ralf Jung via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 15 12:15:52 PDT 2021


Hi,

> The semantics you seem to want are that LLVM’s integer types cannot carry 
> information from pointers. But I can cast a pointer to an integer in C and 
> vice-versa, and compilers have de facto defined the behavior of subsequent 
> operations like breaking the integer up (and then putting it back together), 
> adding numbers to it, and so on. So no, as a C compiler writer, I do not have a 
> choice; I will have to use a type that can validly carry pointer information for 
> integers in C.

Integers demonstrably do not carry provenance; see 
<https://www.ralfj.de/blog/2020/12/14/provenance.html> for a detailed 
explanation of why.
As a consequence of this, ptr-int-ptr roundtrips are lossy: some of the original 
provenance information is lost. This means that optimizing away such roundtrips 
is incorrect, and indeed doing so leads to miscompilations 
(https://bugs.llvm.org/show_bug.cgi?id=34548).
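
To make this concrete, here is a minimal C sketch in the spirit of the example 
discussed in the blog post linked above. The function name 
store_through_roundtrip is made up for illustration, and whether the two arrays 
end up adjacent in memory depends on the compiler and target, so the result is 
layout-dependent.

```c
#include <stdint.h>

/* Returns the final value of q[0]: 10 if q happens to sit directly
 * after p in memory, and 0 otherwise. */
int store_through_roundtrip(void) {
    char p[1] = {0};
    char q[1] = {0};
    uintptr_t ip = (uintptr_t)(p + 1); /* one-past-the-end address of p */
    uintptr_t iq = (uintptr_t)q;       /* address of q */
    if (ip == iq) {
        /* iq was derived from q, so this store may modify q[0]. */
        *(char *)iq = 10;
    }
    return q[0];
}
```

The problematic chain of rewrites would be: (1) inside the branch, replace iq by 
ip, which is fine if integers carry no provenance; (2) drop the roundtrip, 
turning the store into *(p + 1) = 10; (3) conclude that a store derived from p 
cannot alias q, and fold the final load to 0. Step (2) is the one that forgets 
the roundtrip was lossy.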

The key difference between int and byte is that ptr-byte-ptr roundtrips are 
*lossless*: all the provenance is preserved. This enables some extra 
optimizations (such as removing these roundtrips -- which implicitly happens 
when a redundant-store-after-load is removed), but it also rules out others 
(most notably, "x == y" does not mean x and y are equal in all respects; their 
provenance might still differ, so it is incorrect for GVN to replace one by the 
other).
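
As a rough illustration, here is a toy model of the abstract values, written in 
C. Everything in it is an assumption made for the sketch rather than part of the 
proposal: the type and function names are hypothetical, provenance is reduced to 
an allocation id (with 0 meaning "none"), and a "byte" is treated as 
pointer-sized for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy abstract values; prov is an allocation id, 0 means no provenance. */
typedef struct { uintptr_t addr; int prov; } AbsPtr;
typedef struct { uintptr_t addr; } AbsInt;            /* address only */
typedef struct { uintptr_t addr; int prov; } AbsByte; /* keeps provenance */

/* ptr -> int drops the provenance; int -> ptr cannot recover it. */
AbsInt ptr_to_int(AbsPtr p) { AbsInt i = { p.addr }; return i; }
AbsPtr int_to_ptr(AbsInt i) { AbsPtr p = { i.addr, 0 }; return p; }

/* ptr -> byte -> ptr keeps everything. */
AbsByte ptr_to_byte(AbsPtr p) { AbsByte b = { p.addr, p.prov }; return b; }
AbsPtr byte_to_ptr(AbsByte b) { AbsPtr p = { b.addr, b.prov }; return p; }

/* What "x == y" observes on bytes in this model: the address only. */
bool byte_eq(AbsByte x, AbsByte y) { return x.addr == y.addr; }

/* Full equality of abstract values, which is what GVN would need. */
bool byte_same(AbsByte x, AbsByte y) {
    return x.addr == y.addr && x.prov == y.prov;
}
bool ptr_same(AbsPtr x, AbsPtr y) {
    return x.addr == y.addr && x.prov == y.prov;
}
```

With q = {0x1000, 2} and p_end = {0x1000, 1}, the byte roundtrip of q gives back 
q, the int roundtrip does not, and the two bytes compare equal under byte_eq 
while byte_same tells them apart.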

It's a classic tradeoff: we can *either* have lossless roundtrips *or* "x == y" 
implies full equality of the abstract values. Having both together leads to 
contradictions, which manifest as miscompilations. "byte" and "int" represent 
the two possible choices here; therefore, by adding "byte", LLVM would close a 
gap in the expressive power of its IR.

Kind regards,
Ralf

