[llvm-dev] The semantics of nonnull attribute

Tue Feb 18 23:26:45 PST 2020

Hello,

>   I don't understand this. Why should the transformed code raise UB when
> the original did not? I assume the loop is executed for sure, if the
> loop is not executed for sure, the situation is different but then I
> don't think the hoisting is "sound". (Btw. `nonnull` should also be
> applicable to integers.)

Sorry, I missed to mention the assumption that the loop may not iterate at
all, leading to not raising undefined behavior.

>  I am unsure if there are opportunities where you need "poison" for
> `nonnull` and "UB" for the rest. With `non_poison` we get, poison for
> all or UB for all. We can think of other combinations but we should
> determine if we actually need them.

I think at least a dereferenceable argument cannot have poison. It should
guarantee that dereferencing it is not UB, so its memory access can be
freely code-motioned, as shown in Nuno's example.
It doesn't mean that all attributes should raise UB however, as discussed
so far;
Or, inversely, I gave a thought about having only a small set of attributes
that are allowed to have poison, such as a new 'nonnull_or_poison'.
This can be gracefully merged with 'not_poison' and be promoted to
'nonnull'.

On Wed, Feb 19, 2020 at 2:20 PM Doerfert, Johannes <
johannesdoerfert at gmail.com> wrote:

> On 02/19, Juneyoung Lee via llvm-dev wrote:
> > I think not_poison (Johannes's used keyword) makes sense. We can simulate
> > the original UB semantics by simply attaching it, as explained.
>
> Agreed.
>
>
> > For the attributes other than nonnull, we may need more discussion; align
> > attribute seems to be okay with defining it as poison, dereferenceable
> may
> > need UB even without nonnull (because it needs to be non-poison as shown
> > Nuno's hoisting example).
>
> I would strongly prefer not to diverge things here.
>
>
> > The reason why I think not_poison is good even if it may cause divergence
> > of attribute semantics is that it brings a few more opportunities to
> > optimizations.
>
> I am unsure if there are opportunities where you need "poison" for
> `nonnull` and "UB" for the rest. With `non_poison` we get, poison for
> all or UB for all. We can think of other combinations but we should
> determine if we actually need them.
>
>
> > (1) We can do optimizations that require certain operands to be never
> > poison.
> >
> > This is important for, say, hoisting of division:
> >
> > void f(int x, int y) {
> > int x0 = x > 0 ? x : 1; // x0 is never zero *or poison(undef)*
> > loop() { f(y / x0); }
> > }
> > =>
> > void f(int x, int y) { ; x shouldn't be undef/poison
> > int x0 = x > 0 ? x : 1;
> > int tmp = y / x0; // this may raise UB
> > loop() { f(tmp); }
> > }
> >
> > This optimization is incorrect if x is undef or poison. If x has
> > non_poison, this becomes valid.
>
> I don't understand this. Why should the transformed code raise UB when
> the original did not? I assume the loop is executed for sure, if the
> loop is not executed for sure, the situation is different but then I
> don't think the hoisting is "sound". (Btw. `nonnull` should also be
> applicable to integers.)
>
> Three cases for a loop that is always entered:
> 1) x = undef,
>     if each use of x allows a different value I pick 1 in the comparison
>     and 0 otherwise, this is UB in either case.
>     if each use (in the same instruction) has to be the same `x0 = max(x,
> 1)`
>     which should be save.
> 2) x = poison,
>     you should be able to propagate `x0 = poison` which is UB in either
>     case.
> 3) x is a "proper value",
>     `x0 = max(x, 1)` which should be save.
>
> Let me know what I am missing here.
>
>
> > I guess in many cases Rust/Swift functions can be lowered into IR
> functions
> > with not_poison flags attached (though I'm not an Rust/Swift expert, so
> > just a guess)
> > For C, people may want to allow idioms like below, so further
> investigation
> > is needed:
> >
> > int x, y;
> > if (cond) x = 1; else y = 1;
> > f(cond, x, y);
>
> I'm unsure what semantics you want here. Either x or y is uninitialized,
> right? From the C perspective that is totally fine so far (AFAIK).
>
>
> > (2) Hoisting of function call becomes easier and semantically clearer
> >
> > https://godbolt.org/z/ThYt29
> > f() call is hoisted out of the loop thanks to `speculatable` attribute,
> but
> > this is problematic if nonnull was defined to raise UB.
> > We should have syntactically prohibited having nonnull and speculatable
> in
> > a single function declaration, but this may block optimization
> > opportunities.
> > By adding not_poison and making nonnull to have poison semantics, this
> > problem is resolved.
>
> I don't see why nonnull (either with UB or poison semantics) is
> necessary incompatible with speculateable (which has UB semantics) but I
> agree that the transformation we do here become actually valid only with
> poison for nonnull.
>
> Cheers,
>   Johannes
>
>
> > Thanks,
> > Juneyoung Lee
> >
> >
> > On Wed, Feb 19, 2020 at 6:37 AM Eli Friedman <efriedma at quicinc.com>
> wrote:
> >
> > > I think calling the attribute "used" is confusing.  I'd suggest the
> > > following:
> > >
> > > "not_poison": If an argument is marked not_poison, and the argument is
> > > poison at runtime, the call is instant UB.  Whether an argument is
> poison
> > > is checked after the rules for other attributes like "nonnull" and
> "align"
> > > are applied.
> > >
> > > This makes it clear that the IR semantics don't depend on the
> > > implementation of the called function.  And it's the logical extension
> of
> > > the way we define the relationship between UB and poison for
> instructions.
> > >
> > > -Eli
> > >
> > > > -----Original Message-----
> > > > From: Doerfert, Johannes <jdoerfert at anl.gov>
> > > > Sent: Tuesday, February 18, 2020 12:14 PM
> > > > To: Nuno Lopes <nuno.lopes at ist.utl.pt>
> > > > Cc: 'Juneyoung Lee' <juneyoung.lee at sf.snu.ac.kr>; 'llvm-dev' <llvm-
> > > > dev at lists.llvm.org>; 'Roman Lebedev' <lebedev.ri at gmail.com>; llvm-
> > > > dev at redking.me.uk; Eli Friedman <efriedma at quicinc.com>; 'Philip
> Reames'
> > > > <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>;
> > > 'Chung-Kil Hur'
> > > > <gil.hur at sf.snu.ac.kr>; 'John Regehr' <regehr at cs.utah.edu>
> > > > Subject: [EXT] Re: [llvm-dev] The semantics of nonnull attribute
> > > >
> > > > I think we are actually in agreement. "Used" should be removable, it
> > > would
> > > > potentially define UB but that is valid and happening anyway
> everywhere.
> > > > Maybe my initial mail might have been confusing but I tried to show
> that
> > > with
> > > > this snippet:
> > > >
> > > > >   void @foo(nonnull);
> > > > >   void @bar(nonnull used);
> > > > >
> > > > > > foo(NULL)  // <- argument is poison inside of @foo
> > > > > > bar(NULL)  // <- UB at the call site of @bar
> > > >
> > > > If we remove `used`, thus go from @bar to @foo, we get a poison
> argument
> > > > instead of UB, which is fine.
> > > >
> > > > WDYT?
> > > >
> > > >
> > > > > Even if the function is just a load of the pointer argument.
> > > >
> > > > The Attributor does infer `nonnull` and `dereferenceable` in this
> case
> > > already so
> > > > `used` would follow the same logic (+ look for uses in branches etc)
> > > >
> > > >
> > > > FWIW, frontends that want `nonnull`, `align`, `dereferenceable`,
> etc. to
> > > keep the
> > > > UB behavior can just add `used` as well.
> > > >
> > > > ________________________________________
> > > > From: Nuno Lopes <nuno.lopes at ist.utl.pt>
> > > > Sent: Tuesday, February 18, 2020 12:29
> > > > To: Doerfert, Johannes
> > > > Cc: 'Juneyoung Lee'; 'llvm-dev'; 'Roman Lebedev';
> llvm-dev at redking.me.uk;
> > > 'Eli
> > > > Friedman'; 'Philip Reames'; Finkel, Hal J.; 'Chung-Kil Hur'; 'John
> > > Regehr'
> > > > Subject: RE: [llvm-dev] The semantics of nonnull attribute
> > > >
> > > > Hi Johannes,
> > > >
> > > > >> Not sure the semantics of "used" you propose is sufficient. AFAIU
> the
> > > > >> proposal, "used" could only be used in cases where the function
> will
> > > > >> always trigger UB if poison is passed as argument.  The semantics
> of
> > > > >> attributes is usually the other way around, since function calls
> need
> > > > >> to have UB as strong as the worst behavior of the function. If a
> > > > >> function may for some reason trigger UB with a given set of
> arguments,
> > > > >> then the function call has to trigger UB as well.
> > > > >
> > > > >"Used", as other attributes are "best effort". We can drop them any
> > > time, so
> > > > having "insufficient information" is not a correctness problem but an
> > > > optimization problem. I mean, even if the function always causes UB
> if a
> > > `null`
> > > > argument is given we do not need to know/annotate/exploit that. In
> fact,
> > > we
> > > > can, and do, throw away that information, e.g., by removing the
> > > instruction
> > > > known to cause UB which effectively defines some semantic for the UB
> > > program
> > > > (which is then "w/o UB"). Long story short, I don't think "as strong
> as
> > > the worst
> > > > behavior" is plausible (nor possible TBH).
> > > >
> > > > I agree with the motivation, but not with the exact semantics. The
> > > reason is that
> > > > it seems that removing "used" wouldn't be possible. This is because a
> > > function
> > > > with "used" gives poison, and without gives UB. It should the other
> way
> > > around,
> > > > such that optimizers can freely remove "used".
> > > > To make it clear, what I understood was:
> > > >  - %v = call @fn(nonnull null) --  %v == poison
> > > >  - %v = call @fn(nonnull used null) --  UB
> > > >
> > > > Even if the function is just a load of the pointer argument.
> > > >
> > > > Nuno
> > > >
> > > >
> > > >
> > > > > Imagine something like:
> > > > > Call f(null, 1)
> > > > >
> > > > > int f(nonnull %ptr, int c) {
> > > > >   if (c)
> > > > >     deref %ptr
> > > > > }
> > > > >
> > > > > You can't use the "used" attribute, since UB is only conditional.
> > > >
> > > > Agreed, for the callee (f) you cannot annotate "used". You can place
> > > "used" on
> > > > the call site though*. And you can place it on the callee (f) if it
> was
> > > internal or
> > > > you copy it to make it internal. So the optimization potential is
> still
> > > there, even if
> > > > it is a little trickier.
> > > >
> > > > * FWIW, one-level of selective context sensitive propagaton is
> planned
> > > >   for the Attributor.
> > > >
> > > >
> > > > > we would say the return value of f(null, 1) is poison.  But if you
> > > > > inline the function you get UB. Inlining has to continue to be
> correct
> > > > > without exceptions.
> > > >
> > > > The attributes will just never represent the possible UB
> exhaustively.
> > > > I mean, inlining is still correct and the UB that it will expose,
> > > together with SCCP,
> > > > was there all along. With one-level of context sensitivity we could
> have
> > > even
> > > > known, without we just didn't know but that is fine too.
> > > >
> > > > > ...
> > > > [ moved to the front ]
> > > >
> > > >
> > > > I hope this makes some sense.
> > > >
> > > > Cheers,
> > > >   Johannes
> > > >
> > > > P.S.
> > > >   I think I'll write "used" deduction later so we can take a look at
> how
> > > >   this would look like.
> > > >
> > > >
> > > > > Thanks,
> > > > > Nuno
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Johannes Doerfert <jdoerfert at anl.gov>
> > > > > Sent: 18 de fevereiro de 2020 02:35
> > > > > Subject: Re: [llvm-dev] The semantics of nonnull attribute
> > > > >
> > > > > Hi Juneyoung,
> > > > >
> > > > > thank you for starting this discussion again, the last one [0] did
> not
> > > > > really end in a solution but the issue remains and will only become
> > > > > worse, e.g., enable the Attributor and you might see all of the
> > > > > problems below exposed in a single pass [1-4].
> > > > >
> > > > > As I argued in [0], UB is problematic and I am strongly in favor of
> > > > > poison. In some sense I see UB as a very "strong" guarantee and
> poison
> > > > > as a fairly "week" one. We will have a hard time deriving the
> strong
> > > > > one and it might cause us to break things along the way, I mean
> your
> > > > > examples show that we already miscompile codes.
> > > > >
> > > > > I agree that poison is on its own not "strong enough" for certain
> > > > > optimization, as you pointed out below, but we could reasonably
> > > > > augment it with another attribute, let's call it `used`, that
> > > > > basically bridges the gap:
> > > > >
> > > > >   void @foo(nonnull);
> > > > >   void @bar(nonnull used);
> > > > >
> > > > >   foo(NULL)  // <- argument is poison inside of @foo
> > > > >   bar(NULL)  // <- UB at the call site of @bar
> > > > >
> > > > > A value is `used` if it would inevitably cause UB if it was poison
> at
> > > > > this program point. I hope this makes some sense if not I can
> explain
> > > > > in more detail.
> > > > >
> > > > > Btw. I think this also integrates well with other attribute
> problems
> > > > > we had in the past, e.g., memcpy(NULL, NULL, 0), since the
> pointers in
> > > > > this example were "not used" (in some naive implementation of
> memcpy
> > > > > at least).
> > > > >
> > > > > Maybe I have overlooked a problem with this solution so let me know
> > > > > what you think.
> > > > >
> > > > > Cheers,
> > > > >   Johannes
> > > > >
> > > > >
> > > > > [0] https://reviews.llvm.org/D70749
> > > > > [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below:
> > > > > https://godbolt.org/z/TAkSC6 [3] 3. below:
> > > > > https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4]
> 4.
> > > > > below: https://godbolt.org/z/Qkkc_E
> > > > >               https://godbolt.org/z/uXAfp_
> > > > >               ...
> > > > >
> > > > > On 02/18, Juneyoung Lee via llvm-dev wrote:
> > > > > > Hello all,
> > > > > >
> > > > > > LangRef says it is undefined behavior to pass null to a nonnull
> > > > > > argument (`call f(nonnull null);`), but the semantics is too
> strong
> > > > > > for a few existing optimizations.
> > > > > > To support these, we can relax the semantics so `f(nonnull
> null)` is
> > > > > > equivalent to `f(poison)`, but (A) it again blocks another set of
> > > > > > optimizations, and (B) this makes the semantics of nonnull
> deviate
> > > > > > from other attributes like dereferenceable.
> > > > > > I found that there was a similar discussion about this issue in
> the
> > > > > > past as well, but seems it is not settled yet.
> > > > > > What should the semantics of nonnull be?
> > > > > > I listed a few optimizations that are relevant with this issue.
> > > > > >
> > > > > >
> > > > > > 1. Propagating nonnull attribute to callee's arg (
> > > > > > https://godbolt.org/z/-cVsVP )
> > > > > >
> > > > > > g(i8* ptr) {
> > > > > > f(nonnull ptr);
> > > > > > }
> > > > > > =>
> > > > > > g(i8* nonnull ptr) {
> > > > > > f(nonnull ptr);
> > > > > > }
> > > > > >
> > > > > > This is correct if f(nonnull null) is UB. If ptr == null,
> f(nonnull
> > > > > > null) should have raised UB, so ptr shouldn't be null.
> > > > > > However, this transformation is incorrect if f(nonnull null) is
> > > > > > equivalent to f(poison).
> > > > > > If f was an empty function, f(nonnull null) never raises UB
> > > > > > regardless of ptr. So we can't guarantee ptr != null at other
> uses
> > > of ptr.
> > > > >
> > > > > For the given code snipped it is not even invalid to propagate
> nonnull
> > > > > in the poison case but your point is not wrong, if we have
> f(poison)
> > > > > we cannot derive anything from it if the argument might be unused.
> > > > >
> > > > >
> > > > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
> > > > > >
> > > > > > %ptr_inb = gep inbounds %any_ptr, 1
> > > > > > f(%ptr_inb)
> > > > > > =>
> > > > > > %ptr_inb = .. (identical)
> > > > > > f(nonnull %ptr_inb)
> > > > > >
> > > > > > This optimization is incorrect if `f(nonnull null)` is UB. The
> > > > > > reason is as follows.
> > > > > > If `gep inbounds %any_ptr, 1` yields poison, the source is
> > > > > > `f(poison)` whereas the optimized one is `f(nonnull poison)`.
> > > > > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB.
> > > > > > So, the transformation introduced UB.
> > > > > > This optimization is correct if both `f(nonnull null)` and
> > > > > > `f(nonnull poison)` are equivalent to `f(poison)`.
> > > > > >
> > > > > >
> > > > > > 3. https://reviews.llvm.org/D69477
> > > > > >
> > > > > > f(nonnull ptr);
> > > > > > use(ptr);
> > > > > > =>
> > > > > > llvm.assume(ptr != null);
> > > > > > use(ptr);
> > > > > > f(nonnull ptr);
> > > > > >
> > > > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was
> null,
> > > > > > the added llvm.assume(ptr != null) raises UB whereas the source
> may
> > > > > > not raise UB at all. (e.g. assume that f() was an empty
> function) If
> > > > > > f(nonnull null) is UB, this is correct.
> > > > > >
> > > > > >
> > > > > > 4. Dead argument elimination (from
> https://reviews.llvm.org/D70749 )
> > > > > >
> > > > > > f(nonnull ptr); // f's argument is dead => f(nonnull undef);
> > > > > >
> > > > > > This is incorrect if f(nonnull null) is UB. To make this correct,
> > > > > > nonnull should be dropped. This becomes harder to fix if nonnull
> was
> > > > > > attached at the signature of a function (not at the callee site).
> > > > > > It is correct if f(nonnull null) is f(poison).
> > > > > >
> > > > > > Actually the D70749's thread had an end-to-end miscompilation
> > > > > > example due to the interaction between this DAE and the
> optimization
> > > > > > 3 (insertion of llvm.assume).
> > > > > >
> > > > > > Thanks,
> > > > > > Juneyoung Lee
> > > > >
> > > >
> > > > --
> > > >
> > > > Johannes Doerfert
> > > > Researcher
> > > >
> > > > Argonne National Laboratory
> > > > Lemont, IL 60439, USA
> > > >
> > > > jdoerfert at anl.gov
> > >
> > >
> >
> > --
> >
> > Juneyoung Lee
> > Software Foundation Lab, Seoul National University
>
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> --
>
> Johannes Doerfert
> Researcher
>
> Argonne National Laboratory
> Lemont, IL 60439, USA
>
> jdoerfert at anl.gov
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200219/c9298b33/attachment.html>