[llvm-dev] [RFC] Introducing a byte type to LLVM
Ralf Jung via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 15 12:15:52 PDT 2021
> The semantics you seem to want are that LLVM’s integer types cannot carry
> information from pointers. But I can cast a pointer to an integer in C and
> vice-versa, and compilers have de facto defined the behavior of subsequent
> operations like breaking the integer up (and then putting it back together),
> adding numbers to it, and so on. So no, as a C compiler writer, I do not have a
> choice; I will have to use a type that can validly carry pointer information for
> integers in C.
Integers demonstrably do not carry provenance; see
<https://www.ralfj.de/blog/2020/12/14/provenance.html> for a detailed
explanation of why.
As a consequence of this, ptr-int-ptr roundtrips are lossy: some of the original
provenance information is lost. This means that optimizing away such roundtrips
is incorrect, and indeed doing so leads to miscompilations.
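To make the roundtrip concrete, here is a minimal C sketch of the pattern from the quoted text: cast a pointer to an integer, break the integer up, put it back together, and cast back. The function names are hypothetical and chosen only for illustration, and the sketch assumes the platform provides uintptr_t.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: sends a pointer through an integer, splits the
 * integer into two halves, reassembles it, and casts the result back. */
int *roundtrip_via_int(int *p) {
    uintptr_t addr = (uintptr_t)(void *)p;

    /* Half the bit width of uintptr_t, e.g. 32 when uintptr_t has 64 bits. */
    unsigned half_bits = (unsigned)(sizeof(uintptr_t) * CHAR_BIT / 2);
    uintptr_t low_mask = (((uintptr_t)1) << half_bits) - 1;

    uintptr_t lo = addr & low_mask;   /* lower half of the address */
    uintptr_t hi = addr >> half_bits; /* upper half of the address */

    uintptr_t rebuilt = (hi << half_bits) | lo;
    return (int *)(void *)rebuilt;
}

/* Reads a local variable through the roundtripped pointer. */
int read_through_roundtrip(void) {
    int x = 42;
    int *q = roundtrip_via_int(&x);
    return *q;
}
```

At the machine level the rebuilt pointer has the same address as the original. The point above is about the abstract value: the result of the roundtrip is not guaranteed to carry the same provenance as p, which is why a compiler cannot simply rewrite roundtrip_via_int(p) to p.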
The key difference between int and byte is that ptr-byte-ptr roundtrips are
*lossless*: all the provenance is preserved. This enables some extra optimizations
(such as removing these roundtrips -- which implicitly happens when a
redundant-store-after-load is removed), but also costs some optimizations (most
notably, "x == y" does not mean x and y are equal in all respects; their
provenance might still differ, so it is incorrect for GVN to replace one by the
other).
It's a classic tradeoff: we can *either* have lossless roundtrips *or* "x == y"
implies full equality of the abstract values. Having both together leads to
contradictions, which manifest as miscompilations. "byte" and "int" represent
the two possible choices here; therefore, by adding "byte", LLVM would close a
gap in the expressive power of its IR.
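The tradeoff can be sketched as a toy model in C. This is not LLVM's semantics or the RFC's formal definition. AbsVal, the provenance tags, and all function names are assumptions made up for illustration, and modelling int-to-ptr as "no provenance" is a deliberate simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy abstract value: an address plus a provenance tag (hypothetical). */
typedef struct {
    uintptr_t addr;
    int prov; /* NO_PROV means "carries no provenance" */
} AbsVal;

#define NO_PROV 0

/* ptr -> int keeps the address and drops the provenance. */
AbsVal ptr_to_int(AbsVal p) { return (AbsVal){p.addr, NO_PROV}; }

/* int -> ptr cannot recover what was dropped (simplified to NO_PROV). */
AbsVal int_to_ptr(AbsVal i) { return (AbsVal){i.addr, NO_PROV}; }

/* ptr -> byte -> ptr keeps the whole abstract value. */
AbsVal ptr_to_byte(AbsVal p) { return p; }
AbsVal byte_to_ptr(AbsVal b) { return b; }

/* The comparison "x == y" only looks at the address. */
bool cmp_eq(AbsVal a, AbsVal b) { return a.addr == b.addr; }

/* Full equality of abstract values also compares the provenance. */
bool same_value(AbsVal a, AbsVal b) {
    return a.addr == b.addr && a.prov == b.prov;
}
```

With two values at the same address but with different tags, for example {0x1000, 1} and {0x1000, 2}, cmp_eq holds while same_value does not. After ptr_to_int both become the same abstract value, so "x == y" implies full equality for ints. After ptr_to_byte they still differ, so the byte roundtrip is lossless but replacing one by the other would be wrong.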