[llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM
John McCall via llvm-dev
llvm-dev at lists.llvm.org
Fri Jun 4 12:53:54 PDT 2021
On 4 Jun 2021, at 15:06, Nuno Lopes wrote:
> On 4 Jun 2021, at 11:24, George Mitenkov wrote:
> Hi all,
> Together with Nuno Lopes and Juneyoung Lee we propose to add a new
> type to LLVM to fix miscompilations due to load type punning. Please
> see the proposal below. It would be great to hear the feedback.
> char and unsigned char are considered to be universal holders in C.
> They can access raw memory and are used to implement memcpy. i8 is the
> counterpart but it does not have such semantics, which is also not
> desirable as it would disable many optimizations.
> I don’t believe this is correct. LLVM does not have an innate
> concept of typed memory. The type of a global or local allocation
> is just a roundabout way of giving it a size and default alignment,
> and similarly the type of a load or store just determines the width
> and default alignment of the access. There are no restrictions on
> what types can be used to load or store from certain objects.
> C-style type aliasing restrictions are imposed using tbaa
> metadata, which is unrelated to the IR type of the access.
> It’s debatable whether LLVM considers memory to be typed or not. If
> we don’t consider memory to be typed, then *all* integer load
> operations have to be considered as potentially escaping pointers.
> store i32* %p, i32** %q
> %q2 = bitcast i32** %q to i64*
> %v = load i64, i64* %q2
> This program stores a pointer and then loads it back as an integer. So
> there’s an implicit pointer-to-integer cast, which escapes the
> pointer. If we allow this situation to happen, then the alias analysis
> code is broken, as well as several optimizations. LLVM doesn’t
> consider loads as potential pointer escape sites. It would probably be
> a disaster (performance wise) if it did!
Huh? It’s not the load that’s the problem in your example, it’s
the store. If we have a pointer that’s currently known to not alias
anything, and that pointer gets stored somewhere, and we can’t analyze
all the uses of where it’s stored, then yes, anything that could
potentially load from that memory might load the pointer, and so the
pointer can’t just be assumed to not alias anything anymore. We
don’t need subtle reasoning about bitcasts for that, that’s just
basic conservative code analysis.
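A minimal C sketch of that reasoning, for illustration only. The names `published_slot`, `publish`, `unknown_reader`, and `escape_demo` are hypothetical and not part of the proposal. Once the address of `local` has been stored somewhere other code can read, that code may modify `local`, regardless of the type it uses to load the slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot standing in for "somewhere the pointer gets stored". */
static int *published_slot = NULL;

/* The store: after this call, p has escaped into readable memory. */
void publish(int *p) {
    assert(p != NULL);
    published_slot = p;
}

/* Stand-in for code the optimizer cannot fully analyze. */
void unknown_reader(void) {
    if (published_slot != NULL) {
        *published_slot = 7; /* writes through the escaped pointer */
    }
}

int escape_demo(void) {
    int local = 1;
    publish(&local);       /* the escape happens here, at the store */
    unknown_reader();      /* may now alias local */
    int result = local;    /* must be reloaded, not assumed to be 1 */
    published_slot = NULL; /* do not leave a dangling pointer behind */
    return result;
}
```

Calling `escape_demo()` returns the value written by `unknown_reader`, not the initial value, so a conservative analysis has to treat `local` as aliased from the point of the store onward.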
If you want to be doing optimizations where we opaquely assume that
certain loads don’t carry pointers, then you need frontend involvement
for that; you can’t just assume that some random i64 doesn’t carry a
pointer. And I’ve gotta tell you, some random i64 is allowed to carry
a pointer in C, so you’re not likely to get much mileage out of that.
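As a hedged illustration of an integer carrying a pointer in C, the sketch below reads a stored pointer back as a `uintptr_t` and dereferences the result. The helper names are hypothetical, and the code assumes a mainstream implementation where `uintptr_t` has the same size and representation as `int *`. The C standard only guarantees the round trip through an explicit cast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads the bytes of a pointer slot as an integer. */
uintptr_t load_slot_as_integer(int **slot) {
    uintptr_t bits;
    /* Assumption: the integer and the pointer have the same size here. */
    assert(sizeof(bits) == sizeof(*slot));
    memcpy(&bits, slot, sizeof(bits));
    return bits;
}

int read_through_integer(void) {
    int value = 42;
    int *p = &value;

    /* The integer now carries the pointer; no cast was written on p. */
    uintptr_t bits = load_slot_as_integer(&p);

    /* Converting back gives a usable pointer on the assumed platforms. */
    int *q = (int *)bits;
    return *q;
}
```

Nothing in the types of `bits` tells an optimizer that it holds an address, which is the situation a C-ish frontend would have to account for.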
If you make this change, all that’s going to happen is that C-ish
frontends will have to make all their integer types `b8`, `b32`, `b64`,
etc. to indicate that they can carry pointer data, and then you’ll be
right back where you were in terms of optimizability. The frontends
that *can* make stronger statements about integers can probably all make
much stronger promises about aliasing than you’d get from this
> The introduction of the byte type allows us to make all pointer <->
> integer casts explicit, so that we don’t have to mark all integer
> loads as escaping. It also allows us to say that LLVM optimizations are
> correct, and we “just” need to create a few new optimizations to
> get rid of the extra bytecast instructions when they are provably not
> needed.
It sounds an awful lot like you’re trying to optimize a different
language than the one you’ve got. I can certainly see how this
constraint on `inttoptr` might be useful, but it’s not a constraint
that LLVM has historically documented or enforced, and you haven’t
made a very compelling case for why it’s worth such a major and
incompatible change to the language.