[llvm-dev] [RFC] Introducing a byte type to LLVM

Tue Jun 22 10:10:39 PDT 2021

On 22 Jun 2021, at 12:02, Juneyoung Lee wrote:
> On Tue, Jun 22, 2021 at 4:08 AM John McCall <rjmccall at apple.com> 
> wrote:
>> On 21 Jun 2021, at 2:15, Juneyoung Lee wrote:
>>
>> Hi,
>> Sorry for my late reply, and thank you for sharing great summaries & 
>> ideas.
>> I'll leave my thoughts below.
>>
>> On Wed, Jun 16, 2021 at 8:56 AM John McCall <rjmccall at apple.com> 
>> wrote:
>>
>> Now, that rule as I’ve stated it would be really bad. Allowing a
>> lucky guess to resolve to absolutely anything would almost
>> completely block the optimizer from optimizing memory. For example,
>> if a local variable came into scope, and then we called a function
>> that returned a different pointer, we’d have to conservatively
>> assume that that pointer might alias the local, even if the address
>> of the local was never even taken, much less escaped:
>>
>> int x = 0;
>> int *p = guess_address_of_x();
>> *p = 15;
>> printf(“%d\n”, x); // provably 0?
>>
>> So the currently favored proposal adds a really important caveat:
>> this blessing of provenance only works when a pointer with the
>> correct provenance has been “exposed”. There are several ways to
>> expose a pointer, including I/O, but the most important is casting
>> it to an integer.
>>
>> This is a valid point. If one wants to formally show the correctness 
>> of
>> this kind of memory optimization this problem should be tackled.
>> I think n2676's 'Allocation-address nondeterminism' (p. 27) paragraph
>> addresses this issue.
>> The underlying idea is that the address of an allocated object is 
>> assumed
>> to be non-deterministically chosen, causing any guessed accesses to 
>> raise
>> undefined behavior in at least one execution.
>>
>> Ah, that’s an interesting idea. I must have missed that section; 
>> I’m
>> afraid I only skimmed N2676 looking for the key points, because 
>> it’s
>> quite a long document.
>>
>> To be clear, under the PVNI-ae family of semantics, that rule would
>> not be necessary to optimize x in my example because int-to-pointer
>> casts are not allowed to recreate provenance for x because it has
>> not been exposed. That rule does theoretically allow optimization of
>> some related examples where the address of x *has* been exposed,
>> because it lets the compiler try to reason about what happens after
>> exposure; it’s no longer true that exposure implies guessing are 
>> okay.
>>
>> Unfortunately, though, I this non-determinism still doesn’t allow 
>> LLVM
>> to be anywhere near as naive about pointer-to-int casts as it is 
>> today.
>> The rule is intended to allow the compiler to start doing 
>> use-analysis
>> of exposures; let’s assume that this analysis doesn’t see any
>> un-analyzable uses, since of course it would need to conservatively
>> treat them as escapes. But if we can optimize uses of integers as if
>> they didn’t carry pointer data — say, in a function that takes 
>> integer
>> parameters — and then we can apply those optimized uses to integers
>> that concretely result from pointer-to-int casts — say, by inlining
>> that function into one of its callers — can’t we end up with a 
>> use
>> pattern for one or more of those pointer-to-int casts that no longer
>> reflects the fact that it’s been exposed? It seems to me that 
>> either
>> (1) we cannot do those optimizations on opaque integers or (2) we
>> need to record that we did them in a way that, if it turns out that
>> they were created by a pointer-to-int casts, forces other code to
>> treat that pointer as opaquely exposed.
>>
> IIUC the corresponding example would be:
> int p;
> intptr_t i = (intptr_t)&p;
> f(i);
> // Initially f(i) exposes p by storing i into somewhere, but 
> optimization
> changed f to not store i anymore (e.g. i was replaced with a constant)
>
> In this case, p was already exposed by '(intptr_t)p' I guess; the body 
> of
> f(i) won't affect the exposedness of p.

Right, formally I think that’s true.

>> All of these concerns are valid.
>>
>> (I'm not sure whether this is a good place to introduce this, but) we
>> actually have semantics for pointer castings tailored to LLVM (link
>> <https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
>> In this proposal, ptrtoint does not have an escaping side effect; 
>> ptrtoint
>> and inttoptr are scalar operations.
>> inttoptr simply returns a pointer which can access any object.
>>
>> Skimming your paper, I can see how this works *except* that I don’t
>> see any way not to treat ptrtoint as an escape. And really I think
>> you’re already partially acknowledging that, because that’s the 
>> only
>> real sense of saying that inttoptr(ptrtoint p) can’t be reduced to
>> p. If those are really just scalar operations that don’t expose
>> p in ways that might be disconnected from the uses of the inttoptr
>> then that reduction ought to be safe.
>>
>> John.
>>
> It again relies on nondeterminism of the address of an object.
>
> For example:
> p = call malloc(4) ; p's address is assumed to be nondeterministically
> chosen
> i = ptrtoint p
> ; i is never used
> ret void
>
> Since i is never used, this program cannot safely access p via address
> guessing + inttoptr.
> This allows (more precise) escape analysis to conclude that malloc 
> with one
> unused ptrtoint is never escaped.

I think this is too local of an analysis.

Your proposal generally relies on certain optimizations not applying
to pointers because they mess up provenance as represented in
transitive use-dependencies.  If those optimizations can be applied
to integers, you can lose use-dependencies in exactly the same way as
you can with pointers.  Not doing the `inttoptr(ptrtoint p)) -> p`
reduction doesn’t matter at all, because in either case that value
has a use-dependency on `p`, whereas the actual problem is that there
is no longer a use-dependency on some other value.

For example, you have compellingly argued that it’s problematic to
do the reduction `a == b ? a : b` to `b` for pointer types.  Suppose
I instead do this optimization on integers where `a = ptrtoint A`.
The result is now simply `b`.  If I `inttoptr` that result and access
the memory, there will be no record that that access may validly
be to `A`.  It does not help that the access may be represented
as `inttoptr (ptrtoint B)` for some `B` rather than just directly
to `B`, because there is no use-dependence on `A`.  All there is
is an apparently unrelated and unused `ptrtoint A`.

Obviously we can avoid doing this optimization locally when we
see that the inputs result from `ptrtoint`, but that’s no more
than best-effort: we can do this optimization in a function which
we later inline in a caller that performs all the `ptrtoint` and
`inttoptr` casts.

John.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210622/38a7ce16/attachment.html>