[llvm-dev] [RFC] Introducing a byte type to LLVM
Ralf Jung via llvm-dev
llvm-dev at lists.llvm.org
Mon Jun 21 01:07:56 PDT 2021
> Now, that rule as I’ve stated it would be really bad. Allowing a
> lucky guess to resolve to absolutely anything would almost
> completely block the optimizer from optimizing memory. For example,
> if a local variable came into scope, and then we called a function
> that returned a different pointer, we’d have to conservatively
> assume that that pointer might alias the local, even if the address
> of the local was never even taken, much less escaped:
>     int x = 0;
>     int *p = guess_address_of_x();
>     *p = 15;
>     printf("%d\n", x); // provably 0?
> So the currently favored proposal adds a really important caveat:
> this blessing of provenance only works when a pointer with the
> correct provenance has been “exposed”. There are several ways to
> expose a pointer, including I/O, but the most important is casting
> it to an integer.
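To illustrate exposure through I/O rather than through a cast, here is a minimal sketch that sends an address through text with %p and reads it back. The helper names (through_text, store_via_text) are made up for illustration. My reading that %p output counts as an exposure and %p input acts like an int-to-pointer cast is an assumption about the proposal, not a quote from it. The C standard itself only promises that a pointer read back with %p in the same execution compares equal to the one printed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: round-trips a pointer through its %p text form. */
static int *through_text(int *p) {
    char buf[64];
    void *back = NULL;

    /* Printing the address: assumed to be an "exposure" event. */
    snprintf(buf, sizeof buf, "%p", (void *)p);

    /* Reading it back: assumed to play the role of an int-to-pointer cast. */
    int matched = sscanf(buf, "%p", &back);
    assert(matched == 1);

    return (int *)back;
}

/* Hypothetical caller: writes to x through the recovered pointer. */
int store_via_text(void) {
    int x = 0;
    int *p = through_text(&x);
    *p = 15;      /* the store goes to x, whose address was printed above */
    return x;     /* the compiler may not assume this is still 0 */
}
```

There is no integer data flow from &x to p here at all, only characters in a buffer, which is why exposure has to be modelled as an event rather than as a dependence.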
> This is a valid point. If one wants to formally show the correctness of this
> kind of memory optimization, this problem should be tackled.
> I think n2676's 'Allocation-address nondeterminism' (p. 27) paragraph addresses
> this issue.
> The underlying idea is that the address of an allocated object is assumed to be
> non-deterministically chosen, causing any guessed accesses to raise undefined
> behavior in at least one execution.
I am confused -- that optimization is allowed without any reasoning about
non-determinism, because the address of x has never been exposed, right?
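To make the distinction concrete, here is a minimal sketch of the two situations. All names (other, unrelated_pointer, never_exposed, exposed) are hypothetical, and unrelated_pointer merely stands in for a function that returns some pointer the compiler cannot see through. A real "guess" cannot be written as well-defined C, so the first function only writes to a different object.

```c
#include <stdint.h>

static int other = 0;

/* Hypothetical stand-in for a call that returns "some pointer". */
static int *unrelated_pointer(void) { return &other; }

/* x's address is never taken, so it is never exposed. */
int never_exposed(void) {
    int x = 0;
    int *p = unrelated_pointer();
    *p = 15;      /* cannot legally refer to x */
    return x;     /* may be folded to 0 without any non-determinism argument */
}

/* x's address is cast to an integer, which exposes it. */
int exposed(void) {
    int x = 0;
    uintptr_t addr = (uintptr_t)(void *)&x;  /* exposure */
    int *p = (int *)(void *)addr;            /* may pick up x's provenance */
    *p = 15;
    return x;     /* must observe the store */
}
```

Only in the second shape, with the cast and the use separated by code the optimizer cannot analyze, does the question of non-determinism come up.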
There are some optimizations that still require reasoning about non-determinism,
namely cases where the address *has* been exposed. This was recently discussed
on <https://lists.cam.ac.uk/mailman/listinfo/cl-c-memory-object-model> and at
least one WG14 member expressed the opinion that in this case, the optimization
is actually not allowed.
FWIW, I personally prefer a model that always uses non-determinism to justify
such optimizations, avoiding the need for any kind of "exposed" flag (such as
the model in the paper Juneyoung referenced -- unsurprisingly so, since I am a
coauthor of that paper ;).
However, I should also add that the trade-offs in language design are different
for a surface language such as C and for an optimized IR such as LLVM's.
> Again, there’s no requirement of a data dependence between the
> exposure and the int-to-pointer cast. If the program casts a
> pointer to an integer, and an independently-produced integer
> that happens to be the same value is later cast to a pointer,
> and the storage hasn’t been reallocated in the meantime, the
> resulting pointer will have the right provenance for the memory
> and will be valid to use. This implies that pointer-to-int casts
> (and other exposures) are semantically significant events in the
> program. They don’t have side effects in the normal sense, but
> they must be treated by the compiler just like things that do have
> side effects: e.g. unless I’m missing something in the TR,
> eliminating a completely unused pointer-to-int cast may make
> later code UB.
> And in fact, it turns out that this is crucially important for
> optimization. If the optimizer wants to allow arbitrary
> replacement of integers based on conditional equality, like
> in GVN, then replacement totally breaks direct data dependence,
> and you can easily be left with no remaining uses of a pointer-to-int
> cast when the original code would have had a data dependence. So
> you cannot reason about provenance through int-to-pointer casts:
> the pointer can alias any storage whose provenance has potentially
> been exposed, and the optimizer must be conservative about optimizing
> memory that has potentially been exposed.
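The replacement described above can be sketched at the source level as two functions, one as the programmer wrote it and one as it might look after a GVN-style pass substitutes equal integers. The function names are hypothetical, and this is an illustration of the shape of the transformation, not of what any particular LLVM pass emits.

```c
#include <stdint.h>

/* As written: the returned pointer is built from p's integer. */
int *as_written(int *p, int *q) {
    uintptr_t ip = (uintptr_t)(void *)p;
    uintptr_t iq = (uintptr_t)(void *)q;
    if (ip == iq)
        return (int *)(void *)ip;   /* data dependence on the cast of p */
    return p;
}

/* After substituting iq for ip inside the branch, where they are equal. */
int *after_replacement(int *p, int *q) {
    uintptr_t ip = (uintptr_t)(void *)p;
    uintptr_t iq = (uintptr_t)(void *)q;
    if (ip == iq)
        return (int *)(void *)iq;   /* now depends only on the cast of q */
    return p;
}
```

When p and q point to the same object the two versions are indistinguishable. The trouble starts when the integers are equal but the pointers have different provenance, for example one-past-the-end of one object and the start of an adjacent one. That case is deliberately not exercised here.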
> +1, due to this reason the casting semantics cannot be directly used for LLVM's
However, note that GVN integer replacement is a problem even without the
"exposed" flag, as demonstrated by the long-standing issues