[llvm-dev] [RFC] Introducing a byte type to LLVM

Ralf Jung via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 15 12:07:39 PDT 2021

Hi Nicolai,

>      > 6. (How) are pointer types fundamentally different from b<N> types of the
>      > correct size? (By this I mean: is there any interesting difference in the
>      > values that these types can carry? Ignore surface differences like the
>      > fact that GEP traditionally goes with pointers while `add` goes with
>      > integer types -- we could have a GEP instruction on a correctly sized b<N>)
>     I'm not saying I have the answer here, but one possible difference might arise
>     with "mixing bytes from different pointers". Say we are storing pointer "ptr1"
>     directly followed by "ptr2" on a 64bit machine, and now we are doing an
>     (unaligned) 8-byte load covering the last 4 bytes of ptr1 and the first 4 bytes
>     of ptr2. This is certainly a valid value for b64. Is it also a valid value at
>     pointer type, and if yes, which provenance does it have?
> This kind of example is why I was implicitly assuming that we must have a 
> "provenance union" operation anyway, whether we like it or not. I suppose the 
> alternative is to say that pointers formed in this way, whether directly or 
> indirectly, are poison, but I have my doubts whether this is feasible. What 
> happens with pointer arithmetic where you start out with two pointers of 
> different provenance, convert to integer in the source language, subtract them, 
> use the result further in some way, and for some reason all steps are performed 
> with "byte" types in LLVM IR?

My personal model here is that every *byte* independently can carry provenance.
When a pointer is loaded from bytes that have different provenances, the load
itself is fine, but any load or store through that pointer is UB. This is
because once there are different provenances, i.e. different allocated objects
that this pointer is supposed to point to, the access will violate at least one
of them, so it is an "incorrect provenance" access. In other words, such
pointers are very similar to those created by going out-of-bounds with
"getelementptr" (no inbounds!): not poison, but still UB to load from or store
to.

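To make the per-byte model concrete, here is a toy sketch in Python. It is only an illustration, not a definition taken from LLVM or from the byte-type proposal; `Memory`, `Byte`, `Pointer`, `load_ptr`, `store_ptr`, `check_access`, `gep` and `INVALID` are all hypothetical names. Each byte records the allocation ID its provenance refers to. Loading a pointer whose bytes disagree succeeds but yields an `INVALID` provenance, and any later access through it is UB, much like an access through an out-of-bounds result of a plain (non-inbounds) GEP.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Toy model of "every byte independently carries provenance".
# Hypothetical names throughout; this is not LLVM's semantics or API.

INVALID = -1  # marker: the loaded bytes carried different provenances


class UB(Exception):
    """Raised when the toy model reaches undefined behavior."""


@dataclass(frozen=True)
class Byte:
    value: int           # 0..255
    prov: Optional[int]  # allocation ID, or None for a plain data byte


@dataclass(frozen=True)
class Pointer:
    addr: int
    prov: Optional[int]  # allocation ID, None, or INVALID


def gep(ptr: Pointer, offset: int) -> Pointer:
    # Plain (non-inbounds) GEP: keeps provenance, may leave the allocation.
    return Pointer(ptr.addr + offset, ptr.prov)


class Memory:
    def __init__(self) -> None:
        self.bytes: Dict[int, Byte] = {}
        self.allocs: Dict[int, Tuple[int, int]] = {}  # id -> (base, size)

    def allocate(self, alloc_id: int, base: int, size: int) -> Pointer:
        self.allocs[alloc_id] = (base, size)
        for a in range(base, base + size):
            self.bytes[a] = Byte(0, None)
        return Pointer(base, alloc_id)

    def check_access(self, ptr: Pointer, size: int) -> None:
        # UB unless the pointer has one valid provenance and the whole
        # accessed range lies inside that allocation.
        if ptr.prov is None or ptr.prov == INVALID:
            raise UB(f"access through pointer with provenance {ptr.prov}")
        base, n = self.allocs[ptr.prov]
        if not (base <= ptr.addr and ptr.addr + size <= base + n):
            raise UB(f"access at {ptr.addr:#x} outside allocation {ptr.prov}")

    def store_ptr(self, at: Pointer, ptr: Pointer, size: int = 8) -> None:
        self.check_access(at, size)
        for i in range(size):  # little-endian; each byte keeps ptr.prov
            self.bytes[at.addr + i] = Byte((ptr.addr >> (8 * i)) & 0xFF, ptr.prov)

    def load_ptr(self, at: Pointer, size: int = 8) -> Pointer:
        self.check_access(at, size)
        bs = [self.bytes[at.addr + i] for i in range(size)]
        value = sum(b.value << (8 * i) for i, b in enumerate(bs))
        provs = {b.prov for b in bs}
        # The load itself succeeds; mixed provenance only affects later accesses.
        prov = next(iter(provs)) if len(provs) == 1 else INVALID
        return Pointer(value, prov)


mem = Memory()
a = mem.allocate(1, 0x1000, 16)
b = mem.allocate(2, 0x2000, 16)
slots = mem.allocate(3, 0x3000, 16)  # holds "ptr1" then "ptr2"

mem.store_ptr(slots, a)              # ptr1 at 0x3000..0x3007
mem.store_ptr(gep(slots, 8), b)      # ptr2 at 0x3008..0x300f

whole = mem.load_ptr(slots)          # all bytes from ptr1: provenance 1
mixed = mem.load_ptr(gep(slots, 4))  # last 4 bytes of ptr1, first 4 of ptr2
# mixed.prov == INVALID: mem.check_access(mixed, 1) raises UB, just like
# mem.check_access(gep(a, 32), 1) for an out-of-bounds plain GEP.
```

For brevity, this sketch treats any disagreement among the loaded bytes as `INVALID`, including a mix of provenance-carrying and plain data bytes; that is an assumption of the sketch.
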
Regarding your question, there is no operation that takes two pointers as input, 
so there is never a need to "union" provenance. Your sequence of operations will 
involve integer roundtrips, and pointers cast from integers have a kind of 
"wildcard" provenance; they may access any allocation. (getelementptr inbounds 
on such pointers can restrict that; see 
https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf for details. And now I am 
wondering what happens when one mixes the bytes of several of these pointers 
that have 'GEPi on integer pointer provenance'... but saying this is an invalid 
provenance that may not be used for *any* loads/stores seems fine.)
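The integer-roundtrip argument can be illustrated with a similarly hypothetical Python sketch. It is a deliberate simplification, not LLVM's semantics and not the model from the linked paper, which is more refined. `ptrtoint` and `inttoptr` are named after the LLVM instructions but are toy functions here; `WILDCARD`, `ALLOCS`, `gep` and `check_access` are made up for the example. A pointer produced by `inttoptr` may access any allocation that contains the accessed range, while `gep` takes a single pointer and keeps its provenance, so subtracting two pointers only ever happens on integers.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Simplified toy model: allocation-ID provenance, plus a WILDCARD provenance
# for pointers produced by integer-to-pointer casts. Illustrative only.

WILDCARD = "wildcard"

# allocation id -> (base address, size); fixed for this example
ALLOCS: Dict[int, Tuple[int, int]] = {1: (0x1000, 16), 2: (0x2000, 16)}


class UB(Exception):
    """Raised when the toy model reaches undefined behavior."""


@dataclass(frozen=True)
class Pointer:
    addr: int
    prov: Union[int, str]  # allocation id, or WILDCARD


def ptrtoint(p: Pointer) -> int:
    return p.addr  # provenance is dropped; integers carry none in this model


def inttoptr(i: int) -> Pointer:
    return Pointer(i, WILDCARD)  # may access any allocation


def gep(p: Pointer, offset: int) -> Pointer:
    return Pointer(p.addr + offset, p.prov)  # one pointer in, same provenance out


def check_access(p: Pointer, size: int) -> None:
    # WILDCARD: OK if some allocation contains the range; otherwise only
    # the allocation named by the pointer's provenance counts.
    ranges = ALLOCS.values() if p.prov == WILDCARD else [ALLOCS[p.prov]]
    if not any(base <= p.addr and p.addr + size <= base + n for base, n in ranges):
        raise UB(f"access at {p.addr:#x} not allowed by provenance {p.prov}")


p1 = Pointer(0x1000, 1)
p2 = Pointer(0x2000, 2)

# The subtraction happens on integers, so no provenance "union" is needed.
diff = ptrtoint(p2) - ptrtoint(p1)

via_int = inttoptr(ptrtoint(p1) + diff)  # wildcard: may reach allocation 2
via_gep = gep(p1, diff)                  # still carries allocation 1's provenance
```

The contrast between `via_int` and `via_gep` is the point of the sketch: the same address is fine to access through the wildcard pointer, but UB through the GEP result, whose provenance still names allocation 1.
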

Kind regards,
Ralf

> Cheers,
> Nicolai
>     Kind regards,
>     Ralf
> -- 
> Learn how the world really is,
> but never forget how it ought to be.

Website: https://people.mpi-sws.org/~jung/
