[llvm-dev] [RFC] Introducing a byte type to LLVM

Ralf Jung via llvm-dev llvm-dev at lists.llvm.org
Sun Jun 20 03:22:42 PDT 2021

Hi David,

>     Integers demonstrably do not carry provenance; see
>     <https://www.ralfj.de/blog/2020/12/14/provenance.html> for a detailed
>     explanation of why.
>     As a consequence of this, ptr-int-ptr roundtrips are lossy: some of the
>     original provenance information is lost. This means that optimizing away
>     such roundtrips is incorrect, and indeed doing so leads to miscompilations
>     (<https://bugs.llvm.org/show_bug.cgi?id=34548>).
>     The key difference between int and byte is that ptr-byte-ptr roundtrips are
>     *lossless*: all the provenance is preserved. This enables some extra
>     optimizations (such as removing these roundtrips -- which implicitly happens
>     when a redundant-store-after-load is removed), but also costs some
>     optimizations (most notably, "x == y" does not mean x and y are equal in all
>     respects; their provenance might still differ, so it is incorrect for GVN to
>     replace one by the other).
>     It's a classic tradeoff: we can *either* have lossless roundtrips *or*
>     have "x == y" imply full equality of the abstract values. Having both
>     together leads to contradictions, which manifest as miscompilations. "byte"
>     and "int" represent the two possible choices here; therefore, by adding
>     "byte", LLVM would close a gap in the expressive power of its IR.
>     Kind regards,
>     Ralf
>
> I think an important part of explaining the motivation for "byte" would be an
> explanation/demonstration of what the cost of losing "lossless roundtrips" would be.

I am not entirely sure where you are going with this question. Currently LLVM
*both* assumes lossless ptr-int-ptr roundtrips *and* does GVN based on "x == y"
(on integers). This is simply inconsistent and demonstrably leads to
miscompilations. One of them has to go.
Consensus in LLVM seems to be that one would rather lose lossless roundtrips than
GVN based on "==". I am not an expert in these trade-offs among optimizations.
All I can do is cast some light on where the edge of the design space for a
correct set of optimizations lies. I will leave it to others to decide where in
that design space they'd rather be.

Kind regards,
Ralf
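
For readers who want the trade-off in concrete terms, here is a toy model in
Python. It is only a sketch and not LLVM's actual semantics: the names Ptr,
Byte, ptr_to_int, int_to_ptr, ptr_to_byte, byte_to_ptr and byte_eq are all
hypothetical, provenance is reduced to a string tag, and a "byte" is simplified
to a pointer-sized value rather than 8 bits. It shows that "int" gives up
lossless roundtrips but keeps "==" meaning full equality, while "byte" makes
the opposite choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ptr:
    addr: int             # numeric address
    prov: Optional[str]   # hypothetical allocation tag; None means "provenance lost"


@dataclass(frozen=True)
class Byte:
    addr: int             # the runtime bits
    prov: Optional[str]   # provenance carried along with the bits


def ptr_to_int(p: Ptr) -> int:
    """Integers carry no provenance, so only the address survives."""
    return p.addr


def int_to_ptr(i: int) -> Ptr:
    """The original provenance cannot be recovered from a plain integer."""
    return Ptr(i, None)


def ptr_to_byte(p: Ptr) -> Byte:
    """In this model a byte keeps both the bits and the provenance."""
    return Byte(p.addr, p.prov)


def byte_to_ptr(b: Byte) -> Ptr:
    return Ptr(b.addr, b.prov)


def byte_eq(x: Byte, y: Byte) -> bool:
    """Models a runtime "x == y": only the bits are compared."""
    return x.addr == y.addr


# Two pointers with the same address but different provenance, e.g. one past
# the end of allocation "p" and the start of an adjacent allocation "q".
p_end = Ptr(0x1008, "p")
q = Ptr(0x1008, "q")

# int: equal integers are interchangeable, but the roundtrip is lossy.
int_equal = ptr_to_int(p_end) == ptr_to_int(q)
int_roundtrip_lossless = int_to_ptr(ptr_to_int(q)) == q

# byte: the roundtrip is lossless, but "==" no longer implies full equality.
byte_roundtrip_lossless = byte_to_ptr(ptr_to_byte(q)) == q
bytes_compare_equal = byte_eq(ptr_to_byte(p_end), ptr_to_byte(q))
bytes_interchangeable = ptr_to_byte(p_end) == ptr_to_byte(q)

print(int_equal, int_roundtrip_lossless)
print(byte_roundtrip_lossless, bytes_compare_equal, bytes_interchangeable)
```

In this model, a GVN-style replacement of one value by another that compares
equal is harmless for the integers but changes the provenance for the bytes,
whereas removing a roundtrip is harmless for bytes but not for integers.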

Website: https://people.mpi-sws.org/~jung/
