[llvm-dev] [RFC] Introducing a byte type to LLVM

Juneyoung Lee via llvm-dev llvm-dev at lists.llvm.org
Wed Jun 23 01:00:57 PDT 2021

About the impact of disabling GVN for pointers: for SPEC CPU and
llvm-test-suite the slowdown wasn't significant (avg. <0.1%),
but I'm concerned that these are only a small number of programs written in
C/C++, and my experiment was done on only one architecture.
Certainly, for programs that use many pointers, disabling GVN is likely to
be problematic.

If we have an operation similar to launder, then the optimization
can mostly be salvaged.
Let's call it 'q = wildcard_provenance(p)'; it means that q can be used to
access whichever object is located at (intptr_t)p, not necessarily p's object.
(In our memory model, wildcard_provenance(p) is equivalent to
'inttoptr(ptrtoint p)').

Replacing p with 'wildcard_provenance(p)' is correct because it makes the
program more defined.
For example, load p raises UB if p is freed, but load
wildcard_provenance(p) is still well-defined if a new live object returned
by malloc is placed exactly at (intptr_t)p.
When lowered to MachineIR, wildcard_provenance(p) is simply a register copy.
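A minimal sketch of the "more defined" claim, written as a toy Python model rather than LLVM's actual semantics. It assumes a pointer is an (address, provenance) pair, that a load or store is UB unless the pointer's own live allocation covers the address, and that wildcard_provenance(p) (modeled as inttoptr(ptrtoint p)) keeps the address but may use any live allocation at that address. The classes, the WILDCARD marker, and the caller-chosen malloc base address are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical marker: "may access whichever live object sits at this address".
WILDCARD = None


class UndefinedBehavior(Exception):
    """Raised when the toy model reaches undefined behavior."""


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: object  # id of the allocation this pointer was derived from, or WILDCARD


class Memory:
    def __init__(self):
        self.live = {}     # allocation id -> (base, size)
        self.cells = {}    # address -> stored value (unwritten cells read as 0)
        self.next_id = 0

    def malloc(self, base, size):
        # Toy allocator: the caller picks the base address so that address
        # reuse after free() can be shown deterministically.
        aid = self.next_id
        self.next_id += 1
        self.live[aid] = (base, size)
        return Ptr(base, aid)

    def free(self, p):
        del self.live[p.prov]

    def _check(self, p):
        if p.prov is WILDCARD:
            # Any live allocation covering the address may be used.
            ok = any(b <= p.addr < b + s for b, s in self.live.values())
        else:
            # Only the pointer's own allocation, and only while it is live.
            region = self.live.get(p.prov)
            ok = region is not None and region[0] <= p.addr < region[0] + region[1]
        if not ok:
            raise UndefinedBehavior(f"access through {p}")

    def load(self, p):
        self._check(p)
        return self.cells.get(p.addr, 0)

    def store(self, v, p):
        self._check(p)
        self.cells[p.addr] = v


def wildcard_provenance(p):
    # Models inttoptr(ptrtoint p): same address, provenance dropped.
    return Ptr(p.addr, WILDCARD)


mem = Memory()
p = mem.malloc(base=0x1000, size=4)
mem.free(p)
r = mem.malloc(base=0x1000, size=4)  # new live object placed exactly at p's address
mem.store(7, r)

try:
    mem.load(p)  # p's own object is dead, so this is UB
    p_ok = True
except UndefinedBehavior:
    p_ok = False

w = mem.load(wildcard_provenance(p))  # defined: reads the new object
print(p_ok, w)  # prints: False 7
```

The caller-chosen base address only makes address reuse deterministic in the sketch; a real allocator may or may not reuse the address, so the wildcard access is defined only in the executions where it does.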

Here is the list of valid optimizations:

1. GVN

   if (p == q) {                        if (p == q) {
     ...                         =>       ...
   }                                    }

   store v, wildcard_provenance(p);     store v, wildcard_provenance(p);
   w = load p;                   =>     w = v;

   must-alias(p, wildcard_provenance(p))? // answer: true
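The following toy Python sketch (a hypothetical model, not LLVM's implementation) illustrates why GVN must not simply replace q with p under p == q, and why substituting wildcard_provenance(p) instead is harmless. It assumes pointers are (address, provenance) pairs, pointer equality compares addresses only, and wildcard_provenance(p) behaves like inttoptr(ptrtoint p). The allocation layout, names, and helpers (Ptr, check, run) are made up for illustration: p points one past the end of a and q to the start of the adjacent b, so p == q holds while p cannot legally access b.

```python
from dataclasses import dataclass

WILDCARD = "*"  # hypothetical marker for the provenance of inttoptr(ptrtoint p)


class UndefinedBehavior(Exception):
    pass


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: str  # name of the allocation this pointer may access, or WILDCARD


# Two adjacent live allocations: a = [0x1000, 0x1004), b = [0x1004, 0x1008).
ALLOCS = {"a": (0x1000, 4), "b": (0x1004, 4)}
cells = {}


def in_bounds(name, addr):
    base, size = ALLOCS[name]
    return base <= addr < base + size


def check(p):
    if p.prov == WILDCARD:
        ok = any(in_bounds(name, p.addr) for name in ALLOCS)
    else:
        ok = in_bounds(p.prov, p.addr)
    if not ok:
        raise UndefinedBehavior(f"access through {p}")


def store(v, p):
    check(p)
    cells[p.addr] = v


def load(p):
    check(p)
    return cells[p.addr]


def wildcard_provenance(p):
    return Ptr(p.addr, WILDCARD)


def ptr_eq(p, q):
    # Pointer comparison looks only at addresses, never at provenance.
    return p.addr == q.addr


p = Ptr(0x1000 + 4, "a")  # one past the end of a
q = Ptr(0x1004, "b")      # start of b: same address as p


def run(x):
    """if (p == q) { store 42, x; return load q; } with x substituted for q."""
    if ptr_eq(p, q):
        try:
            store(42, x)
            return load(q)
        except UndefinedBehavior:
            return "UB"
    return None


original = run(q)                           # 42
naive_gvn = run(p)                          # "UB": p only carries a's provenance
wildcard_gvn = run(wildcard_provenance(p))  # 42: any live object at the address

# Store-to-load forwarding: the load reads the address the store just wrote.
r = Ptr(0x1000, "a")
store(7, wildcard_provenance(r))
forwarded = load(r)                         # 7, so "w = load r" may become "w = 7"
```

The last lines mirror the forwarding example: since a pointer and its wildcard_provenance copy must-alias, a defined load through the original pointer can only observe the value just stored.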

I don't have performance numbers for this yet because the operation did
not exist when we designed the memory model.


On Wed, Jun 23, 2021 at 1:07 AM Hal Finkel <hal.finkel.llvm at gmail.com>
wrote:

> On 6/22/21 05:58, Ralf Jung via llvm-dev wrote:
> > Hi John,
> >
> >> Unfortunately, though, this non-determinism still doesn’t allow LLVM
> >> to be anywhere near as naive about pointer-to-int casts as it is today.
> >
> > Definitely. There are limits to how naive one can be; beyond those
> > limits, miscompilations lurk.
> > <https://www.ralfj.de/blog/2020/12/14/provenance.html> explains this
> > by showing such a miscompilation arising from three naive
> > optimizations being chained together.
> >
> >> The rule is intended to allow the compiler to start doing use-analysis
> >> of exposures; let’s assume that this analysis doesn’t see any
> >> un-analyzable uses, since of course it would need to conservatively
> >> treat them as escapes. But if we can optimize uses of integers as if
> >> they didn’t carry pointer data — say, in a function that takes integer
> >> parameters — and then we can apply those optimized uses to integers
> >> that concretely result from pointer-to-int casts — say, by inlining
> >> that function into one of its callers — can’t we end up with a use
> >> pattern for one or more of those pointer-to-int casts that no longer
> >> reflects the fact that it’s been exposed? It seems to me that either
> >> (1) we cannot do those optimizations on opaque integers or (2) we
> >> need to record that we did them in a way that, if it turns out that
> >> they were created by a pointer-to-int casts, forces other code to
> >> treat that pointer as opaquely exposed.
> >
> > There is a third option: don't optimize away ptr-int-ptr roundtrips.
> > Then you can still do all the same optimizations on integers that LLVM
> > does today, completely naively -- the integer world remains "sane".
> > Only the pointer world has to be "strange".
> > (You can also not do things like GVN replacement of *pointer-typed*
> > values, but for values of integer types this remains unproblematic.)
> Do we have any idea how large of an effect this might be? If we disable
> GVN for all pointer-typed values? And is it really all GVN, or just
> cases where you unify the equivalence classes based on some dominating
> comparison operation? We should be careful here, perhaps, because LLVM's
> GVN does a lot of plain-old CSE, store-to-load forwarding, etc. and we
> should say specifically what would need to be disabled and in what
> contexts.
>   -Hal
> >
> > I don't think it makes sense for LLVM to adopt an explicit "exposed"
> > flag in its semantics. Reasoning based on non-determinism works fine,
> > and has the advantage of keeping ptr-to-int casts a pure,
> > side-effect-free operation. This is the model we explored in
> > <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf>, and we were
> > able to show quite a few of LLVM's standard optimizations correct
> > formally. Some changes are still needed as you noted, but those
> > changes will be required anyway even if LLVM were to adopt PNVI-ae:
> > - No removal of ptr-int-ptr roundtrips.
> > (https://bugs.llvm.org/show_bug.cgi?id=34548)
> > - No GVN replacement of pointer-typed values.
> > (https://bugs.llvm.org/show_bug.cgi?id=35229)
> >
> >>     (I'm not sure whether this is a good place to introduce this, but)
> >>     we actually have semantics for pointer casts tailored to LLVM
> >>     (link <https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
> >>     In this proposal, ptrtoint does not have an escaping side effect;
> >>     ptrtoint and inttoptr are scalar operations.
> >>     inttoptr simply returns a pointer which can access any object.
> >>
> >> Skimming your paper, I can see how this works /except/ that I don’t
> >> see any way not to treat 'ptrtoint' as an escape. And really I think
> >> you’re already partially acknowledging that, because that’s the only
> >> real sense of saying that 'inttoptr(ptrtoint p)' can’t be reduced to
> >> 'p'. If those are really just scalar operations that don’t expose
> >> 'p' in ways that might be disconnected from the uses of the 'inttoptr'
> >> then that reduction ought to be safe.
> >
> > They are indeed just scalar operations, but the reduction is not safe.
> > The reason is that pointer-typed variables have values of the form
> > "(addr, provenance)". There is essentially an 'invisible' component in
> > each pointer value that tracks some additional information -- the
> > "provenance" of the pointer. Casting a ptr to an int removes that
> > provenance. Casting an int to a ptr picks a "default" provenance. So
> > the overall effect of inttoptr(ptrtoint p) is to turn "(addr,
> > provenance)" into "(addr, DEFAULT_PROVENANCE)".
> > Clearly that is *not* a NOP, and hence performing the reduction
> > actually changes the result of this operation. Before the reduction,
> > the resulting pointer had DEFAULT_PROVENANCE; after the reduction, it
> > maintains the original provenance of "p". This can introduce UB into
> > previously UB-free programs.
> >
> > Kind regards,
> > Ralf


Juneyoung Lee
Software Foundation Lab, Seoul National University
