[llvm-dev] The semantics of nonnull attribute

Juneyoung Lee via llvm-dev llvm-dev at lists.llvm.org
Tue Feb 18 18:50:48 PST 2020


Hello all,

I think not_poison (Johannes's used keyword) makes sense. We can simulate
the original UB semantics by simply attaching it, as explained.
For the attributes other than nonnull, we may need more discussion; align
attribute seems to be okay with defining it as poison, dereferenceable may
need UB even without nonnull (because it needs to be non-poison as shown
Nuno's hoisting example).

The reason why I think not_poison is good even if it may cause divergence
of attribute semantics is that it brings a few more opportunities to
optimizations.


(1) We can do optimizations that require certain operands to be never
poison.

This is important for, say, hoisting of division:

void f(int x, int y) {
int x0 = x > 0 ? x : 1; // x0 is never zero *or poison(undef)*
loop() { f(y / x0); }
}
=>
void f(int x, int y) { ; x shouldn't be undef/poison
int x0 = x > 0 ? x : 1;
int tmp = y / x0; // this may raise UB
loop() { f(tmp); }
}

This optimization is incorrect if x is undef or poison. If x has
non_poison, this becomes valid.

I guess in many cases Rust/Swift functions can be lowered into IR functions
with not_poison flags attached (though I'm not an Rust/Swift expert, so
just a guess)
For C, people may want to allow idioms like below, so further investigation
is needed:

int x, y;
if (cond) x = 1; else y = 1;
f(cond, x, y);


(2) Hoisting of function call becomes easier and semantically clearer

https://godbolt.org/z/ThYt29
f() call is hoisted out of the loop thanks to `speculatable` attribute, but
this is problematic if nonnull was defined to raise UB.
We should have syntactically prohibited having nonnull and speculatable in
a single function declaration, but this may block optimization
opportunities.
By adding not_poison and making nonnull to have poison semantics, this
problem is resolved.


Thanks,
Juneyoung Lee


On Wed, Feb 19, 2020 at 6:37 AM Eli Friedman <efriedma at quicinc.com> wrote:

> I think calling the attribute "used" is confusing.  I'd suggest the
> following:
>
> "not_poison": If an argument is marked not_poison, and the argument is
> poison at runtime, the call is instant UB.  Whether an argument is poison
> is checked after the rules for other attributes like "nonnull" and "align"
> are applied.
>
> This makes it clear that the IR semantics don't depend on the
> implementation of the called function.  And it's the logical extension of
> the way we define the relationship between UB and poison for instructions.
>
> -Eli
>
> > -----Original Message-----
> > From: Doerfert, Johannes <jdoerfert at anl.gov>
> > Sent: Tuesday, February 18, 2020 12:14 PM
> > To: Nuno Lopes <nuno.lopes at ist.utl.pt>
> > Cc: 'Juneyoung Lee' <juneyoung.lee at sf.snu.ac.kr>; 'llvm-dev' <llvm-
> > dev at lists.llvm.org>; 'Roman Lebedev' <lebedev.ri at gmail.com>; llvm-
> > dev at redking.me.uk; Eli Friedman <efriedma at quicinc.com>; 'Philip Reames'
> > <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>;
> 'Chung-Kil Hur'
> > <gil.hur at sf.snu.ac.kr>; 'John Regehr' <regehr at cs.utah.edu>
> > Subject: [EXT] Re: [llvm-dev] The semantics of nonnull attribute
> >
> > I think we are actually in agreement. "Used" should be removable, it
> would
> > potentially define UB but that is valid and happening anyway everywhere.
> > Maybe my initial mail might have been confusing but I tried to show that
> with
> > this snippet:
> >
> > >   void @foo(nonnull);
> > >   void @bar(nonnull used);
> > >
> > > > foo(NULL)  // <- argument is poison inside of @foo
> > > > bar(NULL)  // <- UB at the call site of @bar
> >
> > If we remove `used`, thus go from @bar to @foo, we get a poison argument
> > instead of UB, which is fine.
> >
> > WDYT?
> >
> >
> > > Even if the function is just a load of the pointer argument.
> >
> > The Attributor does infer `nonnull` and `dereferenceable` in this case
> already so
> > `used` would follow the same logic (+ look for uses in branches etc)
> >
> >
> > FWIW, frontends that want `nonnull`, `align`, `dereferenceable`, etc. to
> keep the
> > UB behavior can just add `used` as well.
> >
> > ________________________________________
> > From: Nuno Lopes <nuno.lopes at ist.utl.pt>
> > Sent: Tuesday, February 18, 2020 12:29
> > To: Doerfert, Johannes
> > Cc: 'Juneyoung Lee'; 'llvm-dev'; 'Roman Lebedev'; llvm-dev at redking.me.uk;
> 'Eli
> > Friedman'; 'Philip Reames'; Finkel, Hal J.; 'Chung-Kil Hur'; 'John
> Regehr'
> > Subject: RE: [llvm-dev] The semantics of nonnull attribute
> >
> > Hi Johannes,
> >
> > >> Not sure the semantics of "used" you propose is sufficient. AFAIU the
> > >> proposal, "used" could only be used in cases where the function will
> > >> always trigger UB if poison is passed as argument.  The semantics of
> > >> attributes is usually the other way around, since function calls need
> > >> to have UB as strong as the worst behavior of the function. If a
> > >> function may for some reason trigger UB with a given set of arguments,
> > >> then the function call has to trigger UB as well.
> > >
> > >"Used", as other attributes are "best effort". We can drop them any
> time, so
> > having "insufficient information" is not a correctness problem but an
> > optimization problem. I mean, even if the function always causes UB if a
> `null`
> > argument is given we do not need to know/annotate/exploit that. In fact,
> we
> > can, and do, throw away that information, e.g., by removing the
> instruction
> > known to cause UB which effectively defines some semantic for the UB
> program
> > (which is then "w/o UB"). Long story short, I don't think "as strong as
> the worst
> > behavior" is plausible (nor possible TBH).
> >
> > I agree with the motivation, but not with the exact semantics. The
> reason is that
> > it seems that removing "used" wouldn't be possible. This is because a
> function
> > with "used" gives poison, and without gives UB. It should the other way
> around,
> > such that optimizers can freely remove "used".
> > To make it clear, what I understood was:
> >  - %v = call @fn(nonnull null) --  %v == poison
> >  - %v = call @fn(nonnull used null) --  UB
> >
> > Even if the function is just a load of the pointer argument.
> >
> > Nuno
> >
> >
> >
> > > Imagine something like:
> > > Call f(null, 1)
> > >
> > > int f(nonnull %ptr, int c) {
> > >   if (c)
> > >     deref %ptr
> > > }
> > >
> > > You can't use the "used" attribute, since UB is only conditional.
> >
> > Agreed, for the callee (f) you cannot annotate "used". You can place
> "used" on
> > the call site though*. And you can place it on the callee (f) if it was
> internal or
> > you copy it to make it internal. So the optimization potential is still
> there, even if
> > it is a little trickier.
> >
> > * FWIW, one-level of selective context sensitive propagaton is planned
> >   for the Attributor.
> >
> >
> > > we would say the return value of f(null, 1) is poison.  But if you
> > > inline the function you get UB. Inlining has to continue to be correct
> > > without exceptions.
> >
> > The attributes will just never represent the possible UB exhaustively.
> > I mean, inlining is still correct and the UB that it will expose,
> together with SCCP,
> > was there all along. With one-level of context sensitivity we could have
> even
> > known, without we just didn't know but that is fine too.
> >
> > > ...
> > [ moved to the front ]
> >
> >
> > I hope this makes some sense.
> >
> > Cheers,
> >   Johannes
> >
> > P.S.
> >   I think I'll write "used" deduction later so we can take a look at how
> >   this would look like.
> >
> >
> > > Thanks,
> > > Nuno
> > >
> > >
> > > -----Original Message-----
> > > From: Johannes Doerfert <jdoerfert at anl.gov>
> > > Sent: 18 de fevereiro de 2020 02:35
> > > Subject: Re: [llvm-dev] The semantics of nonnull attribute
> > >
> > > Hi Juneyoung,
> > >
> > > thank you for starting this discussion again, the last one [0] did not
> > > really end in a solution but the issue remains and will only become
> > > worse, e.g., enable the Attributor and you might see all of the
> > > problems below exposed in a single pass [1-4].
> > >
> > > As I argued in [0], UB is problematic and I am strongly in favor of
> > > poison. In some sense I see UB as a very "strong" guarantee and poison
> > > as a fairly "week" one. We will have a hard time deriving the strong
> > > one and it might cause us to break things along the way, I mean your
> > > examples show that we already miscompile codes.
> > >
> > > I agree that poison is on its own not "strong enough" for certain
> > > optimization, as you pointed out below, but we could reasonably
> > > augment it with another attribute, let's call it `used`, that
> > > basically bridges the gap:
> > >
> > >   void @foo(nonnull);
> > >   void @bar(nonnull used);
> > >
> > >   foo(NULL)  // <- argument is poison inside of @foo
> > >   bar(NULL)  // <- UB at the call site of @bar
> > >
> > > A value is `used` if it would inevitably cause UB if it was poison at
> > > this program point. I hope this makes some sense if not I can explain
> > > in more detail.
> > >
> > > Btw. I think this also integrates well with other attribute problems
> > > we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in
> > > this example were "not used" (in some naive implementation of memcpy
> > > at least).
> > >
> > > Maybe I have overlooked a problem with this solution so let me know
> > > what you think.
> > >
> > > Cheers,
> > >   Johannes
> > >
> > >
> > > [0] https://reviews.llvm.org/D70749
> > > [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below:
> > > https://godbolt.org/z/TAkSC6 [3] 3. below:
> > > https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4] 4.
> > > below: https://godbolt.org/z/Qkkc_E
> > >               https://godbolt.org/z/uXAfp_
> > >               ...
> > >
> > > On 02/18, Juneyoung Lee via llvm-dev wrote:
> > > > Hello all,
> > > >
> > > > LangRef says it is undefined behavior to pass null to a nonnull
> > > > argument (`call f(nonnull null);`), but the semantics is too strong
> > > > for a few existing optimizations.
> > > > To support these, we can relax the semantics so `f(nonnull null)` is
> > > > equivalent to `f(poison)`, but (A) it again blocks another set of
> > > > optimizations, and (B) this makes the semantics of nonnull deviate
> > > > from other attributes like dereferenceable.
> > > > I found that there was a similar discussion about this issue in the
> > > > past as well, but seems it is not settled yet.
> > > > What should the semantics of nonnull be?
> > > > I listed a few optimizations that are relevant with this issue.
> > > >
> > > >
> > > > 1. Propagating nonnull attribute to callee's arg (
> > > > https://godbolt.org/z/-cVsVP )
> > > >
> > > > g(i8* ptr) {
> > > > f(nonnull ptr);
> > > > }
> > > > =>
> > > > g(i8* nonnull ptr) {
> > > > f(nonnull ptr);
> > > > }
> > > >
> > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull
> > > > null) should have raised UB, so ptr shouldn't be null.
> > > > However, this transformation is incorrect if f(nonnull null) is
> > > > equivalent to f(poison).
> > > > If f was an empty function, f(nonnull null) never raises UB
> > > > regardless of ptr. So we can't guarantee ptr != null at other uses
> of ptr.
> > >
> > > For the given code snipped it is not even invalid to propagate nonnull
> > > in the poison case but your point is not wrong, if we have f(poison)
> > > we cannot derive anything from it if the argument might be unused.
> > >
> > >
> > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
> > > >
> > > > %ptr_inb = gep inbounds %any_ptr, 1
> > > > f(%ptr_inb)
> > > > =>
> > > > %ptr_inb = .. (identical)
> > > > f(nonnull %ptr_inb)
> > > >
> > > > This optimization is incorrect if `f(nonnull null)` is UB. The
> > > > reason is as follows.
> > > > If `gep inbounds %any_ptr, 1` yields poison, the source is
> > > > `f(poison)` whereas the optimized one is `f(nonnull poison)`.
> > > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB.
> > > > So, the transformation introduced UB.
> > > > This optimization is correct if both `f(nonnull null)` and
> > > > `f(nonnull poison)` are equivalent to `f(poison)`.
> > > >
> > > >
> > > > 3. https://reviews.llvm.org/D69477
> > > >
> > > > f(nonnull ptr);
> > > > use(ptr);
> > > > =>
> > > > llvm.assume(ptr != null);
> > > > use(ptr);
> > > > f(nonnull ptr);
> > > >
> > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null,
> > > > the added llvm.assume(ptr != null) raises UB whereas the source may
> > > > not raise UB at all. (e.g. assume that f() was an empty function) If
> > > > f(nonnull null) is UB, this is correct.
> > > >
> > > >
> > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
> > > >
> > > > f(nonnull ptr); // f's argument is dead => f(nonnull undef);
> > > >
> > > > This is incorrect if f(nonnull null) is UB. To make this correct,
> > > > nonnull should be dropped. This becomes harder to fix if nonnull was
> > > > attached at the signature of a function (not at the callee site).
> > > > It is correct if f(nonnull null) is f(poison).
> > > >
> > > > Actually the D70749's thread had an end-to-end miscompilation
> > > > example due to the interaction between this DAE and the optimization
> > > > 3 (insertion of llvm.assume).
> > > >
> > > > Thanks,
> > > > Juneyoung Lee
> > >
> >
> > --
> >
> > Johannes Doerfert
> > Researcher
> >
> > Argonne National Laboratory
> > Lemont, IL 60439, USA
> >
> > jdoerfert at anl.gov
>
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200219/89caa217/attachment-0001.html>


More information about the llvm-dev mailing list