[llvm-dev] [RFC] Introducing a byte type to LLVM

Nicolai Hähnle via llvm-dev llvm-dev at lists.llvm.org
Sun Jun 13 23:29:10 PDT 2021

Hi Ralf,

On Sun, Jun 13, 2021 at 5:22 PM Ralf Jung <jung at mpi-sws.org> wrote:

> > 1. Forbidding arithmetic and bitwise operations in b<N> seems pointless. Just
> > define them as the corresponding i<N> op plus the union of provenance of the
> > operands. This allows consistent implementation of char/unsigned char as b8,
> > without having to jump back and forth between b8 and i8 all the time.
>
> FWIW, "char" addition happens at "int" type due to integer promotion. So there
> is no problem with back and forth here.
> "Union" of provenance is currently not an operation that is required to model
> LLVM IR, so your proposal would necessitate adding such a concept. It'll be
> interesting to figure out how "getelementptr inbounds" behaves on
> multi-provenance pointers...

True, something needs to be said about that. The main question is whether
"jumping" between different objects that are both in the provenance set is
poison or not. Ultimately, the goal of provenance is to help alias
analysis, so that's what should be driving that choice.
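
To make the two options concrete, here is a toy model in C. It is only a
sketch of the question, not LLVM's actual semantics: the types alloc_t and
mptr_t, the jump_policy enum and the function gep_inbounds are all made up
for illustration, and provenance is assumed to be a bitset over at most 32
allocations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical allocation record: the byte range [base, base + size). */
typedef struct { uint64_t base, size; } alloc_t;

/* Hypothetical multi-provenance pointer: bit i of prov set means the
 * pointer may be used to access allocs[i]. */
typedef struct { uint64_t addr; uint32_t prov; bool poison; } mptr_t;

/* The two candidate answers to the "jumping" question. */
enum jump_policy { JUMP_IS_POISON, JUMP_ALLOWED };

/* Index of the first allocation in the provenance set that contains addr
 * (one-past-the-end counts as inside), or -1 if there is none. */
static int owner(const alloc_t *allocs, int n, uint32_t prov, uint64_t addr) {
    for (int i = 0; i < n; i++) {
        bool in_set = ((prov >> i) & 1u) != 0;
        if (in_set && addr >= allocs[i].base &&
            addr <= allocs[i].base + allocs[i].size)
            return i;
    }
    return -1;
}

/* Toy "getelementptr inbounds": the result is poison if either endpoint is
 * outside every object of the provenance set, or, under JUMP_IS_POISON,
 * if the two endpoints land in different objects. */
mptr_t gep_inbounds(const alloc_t *allocs, int n, mptr_t p, int64_t off,
                    enum jump_policy pol) {
    mptr_t r = p; /* keeps the provenance set and any existing poison */
    r.addr = p.addr + (uint64_t)off;
    int from = owner(allocs, n, p.prov, p.addr);
    int to = owner(allocs, n, p.prov, r.addr);
    if (from < 0 || to < 0 || (pol == JUMP_IS_POISON && from != to))
        r.poison = true;
    return r;
}
```

With two 16-byte objects at 0x1000 and 0x2000 and a pointer to 0x1008 whose
provenance set contains both, an offset of 0x1000 is poison under
JUMP_IS_POISON and fine under JUMP_ALLOWED. The stricter policy still lets
alias analysis conclude that an inbounds GEP stays within one object.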

> > 6. (How) are pointer types fundamentally different from b<N> types of the
> > correct size? (By this I mean: is there any interesting difference in the values
> > that these types can carry? Ignore surface differences like the fact that GEP
> > traditionally goes with pointers while `add` goes with integer types -- we could
> > have a GEP instruction on a correctly sized b<N>)
>
> I'm not saying I have the answer here, but one possible difference might arise
> with "mixing bytes from different pointers". Say we are storing pointer "ptr1"
> directly followed by "ptr2" on a 64bit machine, and now we are doing an
> (unaligned) 8-byte load covering the last 4 bytes of ptr1 and the first 4 bytes
> of ptr2. This is certainly a valid value for b64. Is it also a valid value at
> pointer type, and if yes, which provenance does it have?

This kind of example is why I was implicitly assuming that we must have a
"provenance union" operation anyway, whether we like it or not. I suppose
the alternative is to say that pointers formed in this way, whether
directly or indirectly, are poison, but I have my doubts whether this is
feasible. What happens with pointer arithmetic where you start out with two
pointers of different provenance, convert to integer in the source
language, subtract them, use the result further in some way, and for some
reason all steps are performed with "byte" types in LLVM IR?
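
For reference, the mixed-bytes load from your example can be written at
the source level roughly as follows. This is a sketch under the assumption
of 8-byte pointers, and the function name straddling_load is made up for
illustration. It goes through memcpy so that the C itself is well defined;
the open question is what a frontend and the optimizer may do once the
copy is lowered to a b64 load.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

/* Stores ptr1 directly followed by ptr2, then reads the 8 bytes that cover
 * the last 4 bytes of ptr1 and the first 4 bytes of ptr2. */
uint64_t straddling_load(const void *ptr1, const void *ptr2) {
    const void *slots[2] = { ptr1, ptr2 };
    unsigned char raw[sizeof slots];
    uint64_t mixed;

    /* View the two stored pointers as 16 raw bytes. */
    memcpy(raw, slots, sizeof raw);
    /* Unaligned 8-byte read starting in the middle of ptr1. */
    memcpy(&mixed, raw + 4, sizeof mixed);
    return mixed;
}
```

The result is an ordinary integer as far as C is concerned. If the same
eight bytes are kept at byte type in the IR, each half still carries the
provenance of a different pointer, which is where a union (or poison)
would have to come in.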


> Kind regards,
> Ralf

Learn how the world really is,
but never forget how it ought to be.