[llvm-dev] [RFC] Introducing a byte type to LLVM

Tue Jun 22 09:02:44 PDT 2021

On Tue, Jun 22, 2021 at 4:08 AM John McCall <rjmccall at apple.com> wrote:

> On 21 Jun 2021, at 2:15, Juneyoung Lee wrote:
>
> Hi,
> Sorry for my late reply, and thank you for sharing great summaries & ideas.
> I'll leave my thoughts below.
>
> On Wed, Jun 16, 2021 at 8:56 AM John McCall <rjmccall at apple.com> wrote:
>
> Now, that rule as I’ve stated it would be really bad. Allowing a
> lucky guess to resolve to absolutely anything would almost
> completely block the optimizer from optimizing memory. For example,
> if a local variable came into scope, and then we called a function
> that returned a different pointer, we’d have to conservatively
> assume that that pointer might alias the local, even if the address
> of the local was never even taken, much less escaped:
>
> int x = 0;
> int *p = guess_address_of_x();
> *p = 15;
> printf(“%d\n”, x); // provably 0?
>
> So the currently favored proposal adds a really important caveat:
> this blessing of provenance only works when a pointer with the
> correct provenance has been “exposed”. There are several ways to
> expose a pointer, including I/O, but the most important is casting
> it to an integer.
>
> This is a valid point. If one wants to formally show the correctness of
> this kind of memory optimization this problem should be tackled.
> I think n2676's 'Allocation-address nondeterminism' (p. 27) paragraph
> addresses this issue.
> The underlying idea is that the address of an allocated object is assumed
> to be non-deterministically chosen, causing any guessed accesses to raise
> undefined behavior in at least one execution.
>
> Ah, that’s an interesting idea. I must have missed that section; I’m
> afraid I only skimmed N2676 looking for the key points, because it’s
> quite a long document.
>
> To be clear, under the PVNI-ae family of semantics, that rule would
> not be necessary to optimize x in my example because int-to-pointer
> casts are not allowed to recreate provenance for x because it has
> not been exposed. That rule does theoretically allow optimization of
> some related examples where the address of x *has* been exposed,
> because it lets the compiler try to reason about what happens after
> exposure; it’s no longer true that exposure implies guessing are okay.
>
> Unfortunately, though, I this non-determinism still doesn’t allow LLVM
> to be anywhere near as naive about pointer-to-int casts as it is today.
> The rule is intended to allow the compiler to start doing use-analysis
> of exposures; let’s assume that this analysis doesn’t see any
> un-analyzable uses, since of course it would need to conservatively
> treat them as escapes. But if we can optimize uses of integers as if
> they didn’t carry pointer data — say, in a function that takes integer
> parameters — and then we can apply those optimized uses to integers
> that concretely result from pointer-to-int casts — say, by inlining
> that function into one of its callers — can’t we end up with a use
> pattern for one or more of those pointer-to-int casts that no longer
> reflects the fact that it’s been exposed? It seems to me that either
> (1) we cannot do those optimizations on opaque integers or (2) we
> need to record that we did them in a way that, if it turns out that
> they were created by a pointer-to-int casts, forces other code to
> treat that pointer as opaquely exposed.
>
IIUC the corresponding example would be:
int p;
intptr_t i = (intptr_t)&p;
f(i);
// Initially f(i) exposes p by storing i into somewhere, but optimization
changed f to not store i anymore (e.g. i was replaced with a constant)

In this case, p was already exposed by '(intptr_t)p' I guess; the body of
f(i) won't affect the exposedness of p.

> <snip>
>
> Everything I’ve been talking about so far is a C-level concept:
> an int-to-pointer cast is e.g. (float*) myInt, not inttoptr
> in LLVM IR. But I think people have an expectation of what these
> things mean in LLVM IR, and I haven’t seen it written out explicitly,
> so let’s do that now.
>
> The first assumption here is that int-to-pointer and pointer-to-int
> casts in C will translate to inttoptr and ptrtoint operations
> in IR. Now, this is already problematic, because those operations
> do not currently have the semantics they need to have to make the
> proposed optimization model sound. In particular:
>
> -
>
> ptrtoint does not have side-effects and can be dead-stripped
> when unused, which as discussed above is not okay.
> -
>
> ptrtoint on a constant is folded to a constant expression,
> not an instruction, which is therefore no longer anchored in the
> code and does not reliably record that the global may have escaped.
> (Unused constant expressions do not really exist, and we absolutely
> cannot allow them to affect the semantics of the IR.)
>
> Of course, this is only significant for globals that don’t already
> have to be treated conservatively because they might have other
> uses. That is, it only really matters for globals with, say,
> internal or private linkage.
> -
>
> inttoptr can be reordered with other instructions, which is
> not allowed because different points in a function may have
> different sets of storage with escaped provenance.
> -
>
> inttoptr(ptrtoint) can be peepholed; ignoring the dead-stripping
> aspects of removing the inttoptr, this also potentially
> introduces UB because the original inttoptr “launders” the
> provenance of the pointer to the current provenance of the
> storage, whereas the original pointer may have stale provenance.
>
> All of these concerns are valid.
>
> (I'm not sure whether this is a good place to introduce this, but) we
> actually have semantics for pointer castings tailored to LLVM (link
> <https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
> In this proposal, ptrtoint does not have an escaping side effect; ptrtoint
> and inttoptr are scalar operations.
> inttoptr simply returns a pointer which can access any object.
>
> Skimming your paper, I can see how this works *except* that I don’t
> see any way not to treat ptrtoint as an escape. And really I think
> you’re already partially acknowledging that, because that’s the only
> real sense of saying that inttoptr(ptrtoint p) can’t be reduced to
> p. If those are really just scalar operations that don’t expose
> p in ways that might be disconnected from the uses of the inttoptr
> then that reduction ought to be safe.
>
> John.
>
It again relies on nondeterminism of the address of an object.

For example:
p = call malloc(4) ; p's address is assumed to be nondeterministically
chosen
i = ptrtoint p
; i is never used
ret void

Since i is never used, this program cannot safely access p via address
guessing + inttoptr.
This allows (more precise) escape analysis to conclude that malloc with one
unused ptrtoint is never escaped.

Juneyoung
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210623/b19a6051/attachment.html>