<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 6/4/2021 2:25 PM, John McCall via
cfe-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:3F1E5D6A-DE05-46C9-A4E9-4C575EF66DE1@apple.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div style="font-family:sans-serif">I don’t believe this is
correct. LLVM does not have an innate<br>
<div style="white-space:normal">
<p dir="auto">
concept of typed memory. The type of a global or local
allocation<br>
is just a roundabout way of giving it a size and default
alignment,<br>
and similarly the type of a load or store just determines
the width<br>
and default alignment of the access. There are no
restrictions on<br>
what types can be used to load or store from certain
objects.</p>
<p dir="auto">C-style type aliasing restrictions are imposed
using <code>tbaa</code><br>
metadata, which are unrelated to the IR type of the access.</p>
<p dir="auto">John.</p>
</div>
</div>
</blockquote>
<p>I've never been thoroughly involved in any of the actual
optimizations here, but it seems to me that there is a soundness
hole in the LLVM semantics that we gloss over when we say that
LLVM doesn't have typed memory.</p>
<p>Working backwards from what a putative operational semantics of
LLVM might look like (and I'm going to ignore poison/undef because
it's not relevant), I think there is agreement that integer types
in LLVM are purely bitvectors. Any value of i64 5 can be replaced
with any other value of i64 5 no matter where it came from. At the
same time, when we have pointers involved, this is not true. Two
pointers may have the same numerical value (e.g., when cast to
integers), but one might not be replaceable with the other because
there's other data that might not be the same. So in operational
terms, pointers have both a numerical value and a bag of
provenance data (probably other stuff, but let's be simple and
call it provenance).</p>
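<p>As a rough sketch of what I mean (the function name @same_address
is made up for illustration, it is written with opaque pointers, and
whether the two allocas actually end up adjacent is entirely up to
the backend, so treat the adjacency as an assumption): %p.end and %q
may compare equal as integers, yet %p.end is still derived from %p
and cannot be used to access %q.</p>
<pre>define i1 @same_address() {
  %p = alloca i8
  %q = alloca i8
  ; One past the end of %p; still derived from %p.
  %p.end = getelementptr inbounds i8, ptr %p, i64 1
  %p.int = ptrtoint ptr %p.end to i64
  %q.int = ptrtoint ptr %q to i64
  ; May be true if %q happens to be placed right after %p.
  %same = icmp eq i64 %p.int, %q.int
  ret i1 %same
}
</pre>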
<p>Now we have to ask what the semantics of converting between
integers and pointers are. Integers, as we've defined, don't have
provenance data. So an inttoptr instruction has to synthesize that
provenance somehow. Ideally, we'd want to grab that data from the
ptrtoint instruction that generated the integer, but the semantics
of integers mean we can only launder that data globally, so that
an inttoptr has the union of all of the provenance data that was
ever fed into a ptrtoint (I suspect the actual semantics we use
are somewhat more precise than this in that they only consider those
pointers that point to still-live data, which doesn't invalidate
anything I'm about to talk about).</p>
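<p>A minimal illustration of that round trip, assuming 64-bit
pointers (the name @roundtrip is hypothetical): %out has the same
numerical value as %in, but under the laundering model its
provenance is whatever the inttoptr synthesizes, which need not be
the provenance %in started with.</p>
<pre>define ptr @roundtrip(ptr %in) {
  ; The integer result carries no provenance.
  %int = ptrtoint ptr %in to i64
  ; Provenance has to be synthesized here.
  %out = inttoptr i64 %int to ptr
  ret ptr %out
}
</pre>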
<p>Okay, what about memory? I believe what most people mean when
they say that LLVM's memory is untyped is that a store of a value
of any type is equivalent to first converting the value to an
integer and then storing that integer into memory (and likewise a
load is an integer load followed by a conversion back). E.g. these
two functions are semantically equivalent:<br>
</p>
<pre>define void @foo(ptr %mem, ptr %foo) {
  store ptr %foo, ptr %mem
  ret void
}
define void @bar(ptr %mem, ptr %foo) {
  %asint = ptrtoint ptr %foo to i64 ; Or whatever pointer size you have
  store i64 %asint, ptr %mem
  ret void
}
</pre>
<p>In other words, we are to accept that every load and store
instruction of a pointer has an implicit inttoptr or ptrtoint
attached to it. But as I mentioned earlier, pointers have this
extra metadata attached to them that is lost when converting to an
integer. Under this strict interpretation of memory, we *lose*
that metadata every time a pointer is stored in memory, as if we
did an inttoptr(ptrtoint x). Thus, the following two functions are
*not* semantically equivalent in that model:</p>
<pre>define ptr @basic(ptr %in) {
  ret ptr %in
}
define ptr @via_alloc(ptr %in) {
  %mem = alloca ptr
  store ptr %in, ptr %mem
  %out = load ptr, ptr %mem
  ret ptr %out
}
</pre>
<p>In order to allow these two functions to be equivalent, we have
to let the load of a pointer recover the provenance data stored by
the store of the pointer, and nothing more general. If either one
of those were instead an integer load or store, then no provenance
data could be communicated, so the integer and the pointer loads
*must* be nonequivalent (although replacing a pointer load with an
integer load would presumably be a pessimizing transformation).</p>
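<p>For instance, here is a sketch of the mixed case, assuming 64-bit
pointers (the name @via_int_load is made up for illustration). The
pointer is stored as a pointer but read back as an integer, so under
the reasoning above the result behaves like inttoptr(ptrtoint %in)
rather than like @basic:</p>
<pre>define ptr @via_int_load(ptr %in) {
  %mem = alloca ptr
  store ptr %in, ptr %mem
  ; Integer load of the bytes that were stored as a pointer.
  %bits = load i64, ptr %mem
  %out = inttoptr i64 %bits to ptr
  ret ptr %out
}
</pre>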
<p>In short, pointers have pointery bits that aren't reflected in the
bitvector representation that an integer has. LLVM has some
optimizations that assume that loads and stores only have
bitvector manipulation semantics, while other optimizations (and
most of the frontends) expect that loads and stores will preserve
the pointery bits. And when these interact with each other, it's
undoubtedly possible that the pointery bits get lost along the
way.<br>
</p>
<pre class="moz-signature" cols="72">--
Joshua Cranmer
</pre>
</body>
</html>