<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">
</head>
<body>
<div style="font-family:sans-serif"><div style="white-space:normal">
<p dir="auto">On 21 Jun 2021, at 2:15, Juneyoung Lee wrote:</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid #3983C4; color:#3983C4; margin:0 0 5px; padding-left:5px"><p dir="auto">Hi,<br>
Sorry for my late reply, and thank you for sharing great summaries & ideas.<br>
I'll leave my thoughts below.<br>
<br>
On Wed, Jun 16, 2021 at 8:56 AM John McCall &lt;rjmccall@apple.com&gt; wrote:</p>
<blockquote style="border-left:2px solid #3983C4; color:#7CBF0C; margin:0 0 5px; padding-left:5px; border-left-color:#7CBF0C"><p dir="auto">Now, that rule as I’ve stated it would be really bad. Allowing a<br>
lucky guess to resolve to absolutely anything would almost<br>
completely block the optimizer from optimizing memory. For example,<br>
if a local variable came into scope, and then we called a function<br>
that returned a different pointer, we’d have to conservatively<br>
assume that that pointer might alias the local, even if the address<br>
of the local was never even taken, much less escaped:<br>
<br>
int x = 0;<br>
int *p = guess_address_of_x();<br>
*p = 15;<br>
printf("%d\n", x); // provably 0?<br>
<br>
So the currently favored proposal adds a really important caveat:<br>
this blessing of provenance only works when a pointer with the<br>
correct provenance has been “exposed”. There are several ways to<br>
expose a pointer, including I/O, but the most important is casting<br>
it to an integer.<br>
</p>
</blockquote><p dir="auto">This is a valid point. If one wants to formally show the correctness of<br>
this kind of memory optimization, this problem should be tackled.<br>
I think n2676's 'Allocation-address nondeterminism' (p. 27) paragraph<br>
addresses this issue.<br>
The underlying idea is that the address of an allocated object is assumed<br>
to be non-deterministically chosen, causing any guessed accesses to raise<br>
undefined behavior in at least one execution.</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Ah, that’s an interesting idea. I must have missed that section; I’m<br>
afraid I only skimmed N2676 looking for the key points, because it’s<br>
quite a long document.</p>
<p dir="auto">To be clear, under the <code>PVNI-ae</code> family of semantics, that rule would<br>
not be necessary to optimize <code>x</code> in my example because int-to-pointer<br>
casts are not allowed to recreate provenance for <code>x</code> because it has<br>
not been exposed. That rule does theoretically allow optimization of<br>
some related examples where the address of <code>x</code> <em>has</em> been exposed,<br>
because it lets the compiler try to reason about what happens after<br>
exposure; it’s no longer true that exposure implies that guessing is okay.</p>
<p dir="auto">Unfortunately, though, I think this non-determinism still doesn’t allow LLVM<br>
to be anywhere near as naive about pointer-to-int casts as it is today.<br>
The rule is intended to allow the compiler to start doing use-analysis<br>
of exposures; let’s assume that this analysis doesn’t see any<br>
un-analyzable uses, since of course it would need to conservatively<br>
treat them as escapes. But if we can optimize uses of integers as if<br>
they didn’t carry pointer data — say, in a function that takes integer<br>
parameters — and then we can apply those optimized uses to integers<br>
that concretely result from pointer-to-int casts — say, by inlining<br>
that function into one of its callers — can’t we end up with a use<br>
pattern for one or more of those pointer-to-int casts that no longer<br>
reflects the fact that it’s been exposed? It seems to me that either<br>
(1) we cannot do those optimizations on opaque integers or (2) we<br>
need to record that we did them in a way that, if it turns out that<br>
they were created by pointer-to-int casts, forces other code to<br>
treat that pointer as opaquely exposed.</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid #3983C4; color:#3983C4; margin:0 0 5px; padding-left:5px"><blockquote style="border-left:2px solid #3983C4; color:#7CBF0C; margin:0 0 5px; padding-left:5px; border-left-color:#7CBF0C"><p dir="auto">&lt;snip&gt;<br>
<br>
Everything I’ve been talking about so far is a C-level concept:<br>
an int-to-pointer cast is e.g. (float*) myInt, not inttoptr<br>
in LLVM IR. But I think people have an expectation of what these<br>
things mean in LLVM IR, and I haven’t seen it written out explicitly,<br>
so let’s do that now.<br>
<br>
The first assumption here is that int-to-pointer and pointer-to-int<br>
casts in C will translate to inttoptr and ptrtoint operations<br>
in IR. Now, this is already problematic, because those operations<br>
do not currently have the semantics they need to have to make the<br>
proposed optimization model sound. In particular:<br>
<br>
- ptrtoint does not have side-effects and can be dead-stripped<br>
when unused, which as discussed above is not okay.<br>
<br>
- ptrtoint on a constant is folded to a constant expression,<br>
not an instruction, which is therefore no longer anchored in the<br>
code and does not reliably record that the global may have escaped.<br>
(Unused constant expressions do not really exist, and we absolutely<br>
cannot allow them to affect the semantics of the IR.)<br>
<br>
Of course, this is only significant for globals that don’t already<br>
have to be treated conservatively because they might have other<br>
uses. That is, it only really matters for globals with, say,<br>
internal or private linkage.<br>
<br>
- inttoptr can be reordered with other instructions, which is<br>
not allowed because different points in a function may have<br>
different sets of storage with escaped provenance.<br>
<br>
- inttoptr(ptrtoint) can be peepholed; ignoring the dead-stripping<br>
aspects of removing the inttoptr, this also potentially<br>
introduces UB because the original inttoptr “launders” the<br>
provenance of the pointer to the current provenance of the<br>
storage, whereas the original pointer may have stale provenance.<br>
<br>
All of these concerns are valid.</p>
</blockquote><p dir="auto">(I'm not sure whether this is a good place to introduce this, but) we<br>
actually have semantics for pointer castings tailored to LLVM (link<br>
&lt;<a href="https://sf.snu.ac.kr/publications/llvmtwin.pdf">https://sf.snu.ac.kr/publications/llvmtwin.pdf</a>&gt;).<br>
In this proposal, ptrtoint does not have an escaping side effect; ptrtoint<br>
and inttoptr are scalar operations.<br>
inttoptr simply returns a pointer which can access any object.</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Skimming your paper, I can see how this works <em>except</em> that I don’t<br>
see any way not to treat <code>ptrtoint</code> as an escape. And really I think<br>
you’re already partially acknowledging that, because that’s the only<br>
real reason to say that <code>inttoptr(ptrtoint p)</code> can’t be reduced to<br>
<code>p</code>. If those are really just scalar operations that don’t expose<br>
<code>p</code> in ways that might be disconnected from the uses of the <code>inttoptr</code>,<br>
then that reduction ought to be safe.</p>
<p dir="auto">John.</p>
</div>
</div>
</body>
</html>