[cfe-dev] [RFC] Introducing a byte type to LLVM
Ralf Jung via cfe-dev
cfe-dev at lists.llvm.org
Tue Jun 15 12:15:52 PDT 2021
Hi,
> The semantics you seem to want are that LLVM’s integer types cannot carry
> information from pointers. But I can cast a pointer to an integer in C and
> vice-versa, and compilers have de facto defined the behavior of subsequent
> operations like breaking the integer up (and then putting it back together),
> adding numbers to it, and so on. So no, as a C compiler writer, I do not have a
> choice; I will have to use a type that can validly carry pointer information for
> integers in C.
Integers demonstrably do not carry provenance; see
<https://www.ralfj.de/blog/2020/12/14/provenance.html> for a detailed
explanation of why.
As a consequence of this, ptr-int-ptr roundtrips are lossy: some of the original
provenance information is lost. This means that optimizing away such roundtrips
is incorrect, and indeed doing so leads to miscompilations
(https://bugs.llvm.org/show_bug.cgi?id=34548).
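To make "lossy" concrete, here is a minimal sketch of a toy abstract machine in C. It is not LLVM's actual semantics: the types abs_ptr and abs_int, the helper functions, and the PROV_UNKNOWN placeholder are all hypothetical names made up for illustration. An abstract pointer is modeled as an address plus an allocation id, and an abstract integer as a bare number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model for illustration; not LLVM's actual semantics. */
enum { PROV_UNKNOWN = 0 };  /* placeholder: original provenance not recoverable */

typedef struct { uintptr_t addr; int prov; } abs_ptr;  /* address + allocation id */
typedef struct { uintptr_t value; } abs_int;           /* plain number, no provenance */

/* ptr-to-int keeps only the address; the allocation id is dropped. */
static abs_int ptr_to_int(abs_ptr p) { abs_int i = { p.addr }; return i; }

/* int-to-ptr has no allocation id to restore (assumption of this toy model). */
static abs_ptr int_to_ptr(abs_int i) { abs_ptr p = { i.value, PROV_UNKNOWN }; return p; }

/* Full equality of abstract pointers: same address AND same provenance. */
static bool same_abstract_ptr(abs_ptr a, abs_ptr b) {
    return a.addr == b.addr && a.prov == b.prov;
}

/* Returns true if a ptr-int-ptr roundtrip preserved the whole abstract pointer. */
bool int_roundtrip_is_lossless(void) {
    abs_ptr p = { 0x1000, 7 };              /* made-up address and allocation id */
    abs_ptr q = int_to_ptr(ptr_to_int(p));
    assert(q.addr == p.addr);               /* the address itself survives */
    return same_abstract_ptr(p, q);         /* false: the provenance did not */
}
```

In this model the roundtrip returns a pointer with the same address but different provenance, so replacing q by p (i.e., optimizing the roundtrip away) changes the abstract value.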
The key difference between int and byte is that ptr-byte-ptr roundtrips are
*lossless*: all the provenance is preserved. This enables some extra optimizations
(such as removing these roundtrips -- which implicitly happens when a
redundant-store-after-load is removed), but also costs some optimizations (most
notably, "x == y" does not mean x and y are equal in all respects; their
provenance might still differ, so it is incorrect for GVN to replace one by the
other).
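The same kind of toy model can sketch both sides of the byte tradeoff. Again, this is only an illustration under stated assumptions: abs_byte is a hypothetical pointer-sized byte value that carries an allocation id, and byte_eq is a hypothetical value-only comparison standing in for "x == y"; neither is taken from the actual proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model for illustration; not LLVM's actual semantics. */
typedef struct { uintptr_t addr; int prov; } abs_ptr;    /* address + allocation id */
typedef struct { uintptr_t value; int prov; } abs_byte;  /* byte value keeps provenance */

static abs_byte ptr_to_byte(abs_ptr p) { abs_byte b = { p.addr, p.prov }; return b; }
static abs_ptr byte_to_ptr(abs_byte b) { abs_ptr p = { b.value, b.prov }; return p; }

/* "x == y": compares the numeric value only (assumption of this toy model). */
static bool byte_eq(abs_byte x, abs_byte y) { return x.value == y.value; }

/* Equality "in all respects": value and provenance both match. */
static bool byte_identical(abs_byte x, abs_byte y) {
    return x.value == y.value && x.prov == y.prov;
}

/* Returns true if a ptr-byte-ptr roundtrip preserved the whole abstract pointer. */
bool byte_roundtrip_is_lossless(void) {
    abs_ptr p = { 0x1000, 7 };               /* made-up address and allocation id */
    abs_ptr q = byte_to_ptr(ptr_to_byte(p));
    return q.addr == p.addr && q.prov == p.prov;
}

/* Returns true if two bytes can compare equal yet differ in provenance. */
bool equal_bytes_can_differ(void) {
    abs_ptr one_past_p = { 0x1008, 1 };      /* one past the end of allocation 1 */
    abs_ptr start_of_q = { 0x1008, 2 };      /* start of allocation 2, same address */
    abs_byte x = ptr_to_byte(one_past_p);
    abs_byte y = ptr_to_byte(start_of_q);
    assert(byte_eq(x, y));                   /* "x == y" holds */
    return !byte_identical(x, y);            /* but substituting y for x is unsound */
}
```

Here the roundtrip is lossless, so removing it is fine, but a GVN-style substitution justified only by byte_eq would swap in a value with the wrong provenance.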
It's a classic tradeoff: we can *either* have lossless roundtrips *or* "x == y"
implies full equality of the abstract values. Having both together leads to
contradictions, which manifest as miscompilations. "byte" and "int" represent
the two possible choices here; therefore, by adding "byte", LLVM would close a
gap in the expressive power of its IR.
Kind regards,
Ralf