[llvm-dev] [RFC] Introducing a byte type to LLVM

Mon Jun 21 01:07:56 PDT 2021

Hi,

>     Now, that rule as I’ve stated it would be really bad. Allowing a
>     lucky guess to resolve to absolutely anything would almost
>     completely block the optimizer from optimizing memory. For example,
>     if a local variable came into scope, and then we called a function
>     that returned a different pointer, we’d have to conservatively
>     assume that that pointer might alias the local, even if the address
>     of the local was never even taken, much less escaped:
> 
>     int x = 0; int *p = guess_address_of_x(); *p = 15; printf("%d\n", x); // provably 0?
> 
>     So the currently favored proposal adds a really important caveat:
>     this blessing of provenance only works when a pointer with the
>     correct provenance has been “exposed”. There are several ways to
>     expose a pointer, including I/O, but the most important is casting
>     it to an integer.
> 
> This is a valid point. If one wants to formally show the correctness of this 
> kind of memory optimization, this problem should be tackled.
> I think n2676's 'Allocation-address nondeterminism' (p. 27) paragraph addresses 
> this issue.
> The underlying idea is that the address of an allocated object is assumed to be 
> non-deterministically chosen, causing any guessed accesses to raise undefined 
> behavior in at least one execution.

I am confused -- that optimization is allowed without any reasoning about 
non-determinism, because the address of x has never been exposed, right?
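To make the distinction concrete, here is a minimal C sketch. The names `other`, `leaked`, `unexposed()` and `exposed()` are hypothetical. `guess_address_of_x()` is stubbed out to return a pointer to a *different* object so that the program runs without UB. The comments describe what an optimizer may assume; they do not describe what the program prints.

```c
#include <stdint.h>

/* Hypothetical stand-ins: in the question above guess_address_of_x() is an
   opaque function; here it returns &other so the program is well-defined. */
static int other = 0;
static uintptr_t leaked; /* receives an exposed address */

int *guess_address_of_x(void) { return &other; }

/* Case 1: x's address is never taken, so no pointer can carry x's
   provenance. A compiler may assume *p does not alias x and fold the
   return value to 0 without any appeal to non-determinism. */
int unexposed(void) {
    int x = 0;
    int *p = guess_address_of_x();
    *p = 15;
    return x;
}

/* Case 2: casting &x to an integer exposes x. An int-to-pointer cast inside
   guess_address_of_x() could now produce a pointer to x, so folding the
   return value to 0 needs a further argument (e.g. address
   non-determinism), if it is allowed at all. */
int exposed(void) {
    int x = 0;
    leaked = (uintptr_t)&x;
    int *p = guess_address_of_x();
    *p = 15;
    return x;
}
```

At run time both functions return 0 here, because the stub never hands out x's address. The interesting question is only whether the compiler may *assume* that in the second case.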

There are some optimizations that still require reasoning about non-determinism, 
namely cases where the address *has* been exposed. This was recently discussed 
on the cl-c-memory-object-model mailing list 
<https://lists.cam.ac.uk/mailman/listinfo/cl-c-memory-object-model>, and at 
least one WG14 member expressed the opinion that in this case the optimization 
is actually not allowed.
FWIW, I personally prefer a model that always uses non-determinism to justify 
such optimizations, avoiding the need for any kind of "exposed" flag (such as 
the model in the paper Juneyoung referenced -- unsurprisingly so, since I am a 
coauthor of that paper ;).

However, I should also add that the trade-offs in language design are different 
for a surface language such as C and for an optimized IR such as LLVM's.

>     Again, there’s no requirement of a data dependence between the
>     exposure and the int-to-pointer cast. If the program casts a
>     pointer to an integer, and an independently-produced integer
>     that happens to be the same value is later cast to a pointer,
>     and the storage hasn’t been reallocated in the meantime, the
>     resulting pointer will have the right provenance for the memory
>     and will be valid to use. This implies that pointer-to-int casts
>     (and other exposures) are semantically significant events in the
>     program. They don’t have side effects in the normal sense, but
>     they must be treated by the compiler just like things that do have
>     side effects: e.g. unless I’m missing something in the TR,
>     eliminating a completely unused pointer-to-int cast may make
>     later code UB.
> 
>     And in fact, it turns out that this is crucially important for
>     optimization. If the optimizer wants to allow arbitrary
>     replacement of integers based on conditional equality, like
>     in GVN, then replacement totally breaks direct data dependence,
>     and you can easily be left with no remaining uses of a pointer-to-int
>     cast when the original code would have had a data dependence. So
>     you cannot reason about provenance through int-to-pointer casts:
>     the pointer can alias any storage whose provenance has potentially
>     been exposed, and the optimizer must be conservative about optimizing
>     memory that has potentially been exposed.
> 
> +1; for this reason, the casting semantics cannot be used directly for LLVM's 
> ptrtoint/inttoptr.

However, note that GVN integer replacement is a problem even without the 
"exposed" flag, as demonstrated by the long-standing issues 
https://bugs.llvm.org/show_bug.cgi?id=34548 and 
https://bugs.llvm.org/show_bug.cgi?id=35229.
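For anyone who has not looked at those reports, here is a minimal C sketch of the kind of program involved. It is my own reconstruction of the pattern, with an illustrative function name, and is not the exact test case from either bug. Whether the branch is taken depends on stack layout, so the function returns -1 when the two arrays happen not to be adjacent.

```c
#include <stdint.h>

/* Returns q[0] after storing through an integer-derived pointer,
   or -1 if p + 1 and q do not have the same address in this run. */
int store_through_integer(void) {
    char p[1], q[1] = {0};
    uintptr_t ip = (uintptr_t)(p + 1); /* one past the end of p */
    uintptr_t iq = (uintptr_t)q;
    if (iq == ip) {
        /* Cast back from iq: the resulting pointer refers to q. */
        *(char *)iq = 10;
        return q[0]; /* 10 in the source program */
    }
    return -1;
}

/* Inside the branch iq == ip holds, so GVN may rewrite the store as
       *(char *)ip = 10;
   The pointer is now derived from p + 1, a one-past-the-end pointer of p.
   If provenance is tracked through the integer, that store cannot modify q,
   and q[0] may then be folded to 0. This contradicts the original program,
   which returns 10 whenever the branch is taken. No "exposed" flag is
   involved in this reasoning. */
```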

Kind regards,
Ralf