<!DOCTYPE html>

<html>

<head>

<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">

</head>

<body>

<div style="font-family:sans-serif"><div style="white-space:normal">

<p dir="auto">On 21 Jun 2021, at 2:15, Juneyoung Lee wrote:</p>


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #3983C4; color:#3983C4; margin:0 0 5px; padding-left:5px"><p dir="auto">Hi,<br>

Sorry for my late reply, and thank you for sharing great summaries &amp; ideas.<br>

I'll leave my thoughts below.<br>

<br>

On Wed, Jun 16, 2021 at 8:56 AM John McCall &lt;rjmccall@apple.com&gt; wrote:</p>

<blockquote style="border-left:2px solid #3983C4; color:#7CBF0C; margin:0 0 5px; padding-left:5px; border-left-color:#7CBF0C"><p dir="auto">Now, that rule as I’ve stated it would be really bad. Allowing a<br>

lucky guess to resolve to absolutely anything would almost<br>

completely block the optimizer from optimizing memory. For example,<br>

if a local variable came into scope, and then we called a function<br>

that returned a different pointer, we’d have to conservatively<br>

assume that that pointer might alias the local, even if the address<br>

of the local was never even taken, much less escaped:<br>

<br>

  int x = 0;<br>

  int *p = guess_address_of_x();<br>

  *p = 15;<br>

  printf("%d\n", x); // provably 0?<br>

<br>

So the currently favored proposal adds a really important caveat:<br>

this blessing of provenance only works when a pointer with the<br>

correct provenance has been “exposed”. There are several ways to<br>

expose a pointer, including I/O, but the most important is casting<br>

it to an integer.<br>

</p>

</blockquote><p dir="auto">This is a valid point. If one wants to formally show the correctness of<br>

this kind of memory optimization, this problem should be tackled.<br>

I think n2676's 'Allocation-address nondeterminism' (p. 27) paragraph<br>

addresses this issue.<br>

The underlying idea is that the address of an allocated object is assumed<br>

to be non-deterministically chosen, causing any guessed accesses to raise<br>

undefined behavior in at least one execution.</p>

</blockquote></div>

<div style="white-space:normal">


<p dir="auto">Ah, that’s an interesting idea.  I must have missed that section; I’m<br>

afraid I only skimmed N2676 looking for the key points, because it’s<br>

quite a long document.</p>


<p dir="auto">To be clear, under the <code>PNVI-ae</code> family of semantics, that rule would<br>

not be necessary to optimize <code>x</code> in my example because int-to-pointer<br>

casts are not allowed to recreate provenance for <code>x</code> because it has<br>

not been exposed.  That rule does theoretically allow optimization of<br>

some related examples where the address of <code>x</code> <em>has</em> been exposed,<br>

because it lets the compiler try to reason about what happens after<br>

exposure; it’s no longer true that exposure implies that guesses are okay.</p>
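<p dir="auto">As a concrete illustration of the exposed case, here is a minimal C sketch. The function name <code>write_through_exposed_address</code> is hypothetical, and the comments reflect one reading of the address-exposed rules rather than normative text: once the address of <code>x</code> has been cast to an integer, a later int-to-pointer cast is allowed to pick up the provenance of <code>x</code>, so the compiler cannot simply assume that <code>x</code> is still 0.</p>

```c
#include <stdint.h>

/* Hypothetical helper: round-trips the address of a local through an integer. */
int write_through_exposed_address(void) {
    int x = 0;

    /* The pointer-to-int cast is what exposes x. */
    uintptr_t exposed = (uintptr_t)(void *)&x;

    /* The int-to-pointer cast may recreate provenance for x,
       because x has been exposed at this point. */
    int *p = (int *)(void *)exposed;
    *p = 15;

    /* The store above has to be assumed to reach x,
       so this load cannot be folded to the constant 0. */
    return x;
}
```

<p dir="auto">Without the cast to <code>uintptr_t</code>, nothing in the function would expose <code>x</code>, which is the situation in the original example.</p>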


<p dir="auto">Unfortunately, though, I think this non-determinism still doesn’t allow LLVM<br>

to be anywhere near as naive about pointer-to-int casts as it is today.<br>

The rule is intended to allow the compiler to start doing use-analysis<br>

of exposures; let’s assume that this analysis doesn’t see any<br>

un-analyzable uses, since of course it would need to conservatively<br>

treat them as escapes.  But if we can optimize uses of integers as if<br>

they didn’t carry pointer data — say, in a function that takes integer<br>

parameters — and then we can apply those optimized uses to integers<br>

that concretely result from pointer-to-int casts — say, by inlining<br>

that function into one of its callers — can’t we end up with a use<br>

pattern for one or more of those pointer-to-int casts that no longer<br>

reflects the fact that it’s been exposed?  It seems to me that either<br>

(1) we cannot do those optimizations on opaque integers or (2) we<br>

need to record that we did them in a way that, if it turns out that<br>

they were created by pointer-to-int casts, forces other code to<br>

treat that pointer as opaquely exposed.</p>
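<p dir="auto">The concern can be sketched in C. All of the function names below are hypothetical, and the "simplified" and "after inlining" versions are written by hand to show what an optimizer could plausibly produce; they are not the output of any actual LLVM pass.</p>

```c
#include <stdint.h>

/* As written: the result is always b, but a is still syntactically used. */
uintptr_t combine(uintptr_t a, uintptr_t b) {
    return (a - a) + b;
}

/* What a purely integer simplification could produce: a is now unused. */
uintptr_t combine_simplified(uintptr_t a, uintptr_t b) {
    (void)a;
    return b;
}

/* A caller that feeds the result of a pointer-to-int cast into combine. */
uintptr_t caller(uintptr_t k) {
    int x = 0;
    return combine((uintptr_t)(void *)&x, k);
}

/* After inlining the simplified body, the cast of &x has no users left.
   If it is then dead-stripped, nothing records that x was ever exposed. */
uintptr_t caller_after_inlining(uintptr_t k) {
    int x = 0;
    (void)x;
    return k;
}
```

<p dir="auto">Both callers compute the same integer, which is why the transformation looks harmless when only integers are considered. The difference is only in whether the exposure of <code>x</code> is still visible in the code.</p>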


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #3983C4; color:#3983C4; margin:0 0 5px; padding-left:5px"><blockquote style="border-left:2px solid #3983C4; color:#7CBF0C; margin:0 0 5px; padding-left:5px; border-left-color:#7CBF0C"><p dir="auto">&lt;snip&gt;<br>

<br>

Everything I’ve been talking about so far is a C-level concept:<br>

an int-to-pointer cast is e.g. (float*) myInt, not inttoptr<br>

in LLVM IR. But I think people have an expectation of what these<br>

things mean in LLVM IR, and I haven’t seen it written out explicitly,<br>

so let’s do that now.<br>

<br>

The first assumption here is that int-to-pointer and pointer-to-int<br>

casts in C will translate to inttoptr and ptrtoint operations<br>

in IR. Now, this is already problematic, because those operations<br>

do not currently have the semantics they need to have to make the<br>

proposed optimization model sound. In particular:<br>

<br>

   - ptrtoint does not have side-effects and can be dead-stripped<br>

   when unused, which as discussed above is not okay.<br>

   - ptrtoint on a constant is folded to a constant expression,<br>

   not an instruction, which is therefore no longer anchored in the<br>

   code and does not reliably record that the global may have escaped.<br>

   (Unused constant expressions do not really exist, and we absolutely<br>

   cannot allow them to affect the semantics of the IR.)<br>

<br>

   Of course, this is only significant for globals that don’t already<br>

   have to be treated conservatively because they might have other<br>

   uses. That is, it only really matters for globals with, say,<br>

   internal or private linkage.<br>

<br>

   - inttoptr can be reordered with other instructions, which is<br>

   not allowed because different points in a function may have<br>

   different sets of storage with escaped provenance.<br>

   - inttoptr(ptrtoint) can be peepholed; ignoring the dead-stripping<br>

   aspects of removing the inttoptr, this also potentially<br>

   introduces UB because the original inttoptr “launders” the<br>

   provenance of the pointer to the current provenance of the<br>

   storage, whereas the original pointer may have stale provenance.<br>

<br>

All of these concerns are valid.</p>

</blockquote><p dir="auto">(I'm not sure whether this is a good place to introduce this, but) we<br>

actually have semantics for pointer casts tailored to LLVM (link<br>

&lt;<a href="https://sf.snu.ac.kr/publications/llvmtwin.pdf">https://sf.snu.ac.kr/publications/llvmtwin.pdf</a>&gt;).<br>

In this proposal, ptrtoint does not have an escaping side effect; ptrtoint<br>

and inttoptr are scalar operations.<br>

inttoptr simply returns a pointer which can access any object.</p>

</blockquote></div>

<div style="white-space:normal">


<p dir="auto">Skimming your paper, I can see how this works <em>except</em> that I don’t<br>

see any way not to treat <code>ptrtoint</code> as an escape.  And really I think<br>

you’re already partially acknowledging that, because that’s the only<br>

real sense of saying that <code>inttoptr(ptrtoint p)</code> can’t be reduced to<br>

<code>p</code>.  If those are really just scalar operations that don’t expose<br>

<code>p</code> in ways that might be disconnected from the uses of the <code>inttoptr</code>,<br>

then that reduction ought to be safe.</p>
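<p dir="auto">For reference, the usual kind of example given for why the round trip cannot simply be folded looks roughly like the following C sketch. The function name <code>store_through_round_trip</code> is hypothetical, and whether the branch is taken depends on how the two arrays happen to be laid out, so no particular result is assumed.</p>

```c
#include <stdint.h>

/* Hypothetical example: returns q[0] after a conditional store. */
int store_through_round_trip(void) {
    char p[1] = {0};
    char q[1] = {0};

    uintptr_t ip = (uintptr_t)(void *)(p + 1); /* one past the end of p */
    uintptr_t iq = (uintptr_t)(void *)q;       /* start of q */

    if (ip == iq) {
        /* Fine as written: the integer came from q, so the store hits q[0]. */
        *(char *)(void *)iq = 10;

        /* Inside this branch ip and iq are equal integers, so an integer
           optimization may substitute ip for iq. If the resulting
           inttoptr(ptrtoint(p + 1)) were then folded to p + 1, the store
           would become *(p + 1) = 10, an out-of-bounds access through p. */
    }
    return q[0];
}
```

<p dir="auto">The store only stays valid if the int-to-pointer cast can pick up the provenance of <code>q</code>, which in turn seems to require that the cast of <code>q</code> to an integer be remembered somehow.</p>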


<p dir="auto">John.</p>

</div>

</div>

</body>

</html>