[llvm-dev] The semantics of nonnull attribute

Doerfert, Johannes via llvm-dev llvm-dev at lists.llvm.org
Tue Feb 18 21:19:35 PST 2020


On 02/19, Juneyoung Lee via llvm-dev wrote:
> I think not_poison (Johannes's used keyword) makes sense. We can simulate
> the original UB semantics by simply attaching it, as explained.

Agreed.


> For the attributes other than nonnull, we may need more discussion; align
> attribute seems to be okay with defining it as poison, dereferenceable may
> need UB even without nonnull (because it needs to be non-poison as shown
> Nuno's hoisting example).

I would strongly prefer not to diverge things here.


> The reason why I think not_poison is good even if it may cause divergence
> of attribute semantics is that it brings a few more opportunities to
> optimizations.

I am unsure if there are opportunities where you need "poison" for
`nonnull` and "UB" for the rest. With `non_poison` we get, poison for
all or UB for all. We can think of other combinations but we should
determine if we actually need them.


> (1) We can do optimizations that require certain operands to be never
> poison.
> 
> This is important for, say, hoisting of division:
> 
> void f(int x, int y) {
> int x0 = x > 0 ? x : 1; // x0 is never zero *or poison(undef)*
> loop() { f(y / x0); }
> }
> =>
> void f(int x, int y) { ; x shouldn't be undef/poison
> int x0 = x > 0 ? x : 1;
> int tmp = y / x0; // this may raise UB
> loop() { f(tmp); }
> }
> 
> This optimization is incorrect if x is undef or poison. If x has
> non_poison, this becomes valid.

I don't understand this. Why should the transformed code raise UB when
the original did not? I assume the loop is executed for sure, if the
loop is not executed for sure, the situation is different but then I
don't think the hoisting is "sound". (Btw. `nonnull` should also be
applicable to integers.)

Three cases for a loop that is always entered:
1) x = undef,
    if each use of x allows a different value I pick 1 in the comparison
    and 0 otherwise, this is UB in either case.
    if each use (in the same instruction) has to be the same `x0 = max(x, 1)`
    which should be save.
2) x = poison,
    you should be able to propagate `x0 = poison` which is UB in either
    case.
3) x is a "proper value",
    `x0 = max(x, 1)` which should be save.

Let me know what I am missing here.


> I guess in many cases Rust/Swift functions can be lowered into IR functions
> with not_poison flags attached (though I'm not an Rust/Swift expert, so
> just a guess)
> For C, people may want to allow idioms like below, so further investigation
> is needed:
> 
> int x, y;
> if (cond) x = 1; else y = 1;
> f(cond, x, y);

I'm unsure what semantics you want here. Either x or y is uninitialized,
right? From the C perspective that is totally fine so far (AFAIK).


> (2) Hoisting of function call becomes easier and semantically clearer
> 
> https://godbolt.org/z/ThYt29
> f() call is hoisted out of the loop thanks to `speculatable` attribute, but
> this is problematic if nonnull was defined to raise UB.
> We should have syntactically prohibited having nonnull and speculatable in
> a single function declaration, but this may block optimization
> opportunities.
> By adding not_poison and making nonnull to have poison semantics, this
> problem is resolved.

I don't see why nonnull (either with UB or poison semantics) is
necessary incompatible with speculateable (which has UB semantics) but I
agree that the transformation we do here become actually valid only with
poison for nonnull.

Cheers,
  Johannes


> Thanks,
> Juneyoung Lee
> 
> 
> On Wed, Feb 19, 2020 at 6:37 AM Eli Friedman <efriedma at quicinc.com> wrote:
> 
> > I think calling the attribute "used" is confusing.  I'd suggest the
> > following:
> >
> > "not_poison": If an argument is marked not_poison, and the argument is
> > poison at runtime, the call is instant UB.  Whether an argument is poison
> > is checked after the rules for other attributes like "nonnull" and "align"
> > are applied.
> >
> > This makes it clear that the IR semantics don't depend on the
> > implementation of the called function.  And it's the logical extension of
> > the way we define the relationship between UB and poison for instructions.
> >
> > -Eli
> >
> > > -----Original Message-----
> > > From: Doerfert, Johannes <jdoerfert at anl.gov>
> > > Sent: Tuesday, February 18, 2020 12:14 PM
> > > To: Nuno Lopes <nuno.lopes at ist.utl.pt>
> > > Cc: 'Juneyoung Lee' <juneyoung.lee at sf.snu.ac.kr>; 'llvm-dev' <llvm-
> > > dev at lists.llvm.org>; 'Roman Lebedev' <lebedev.ri at gmail.com>; llvm-
> > > dev at redking.me.uk; Eli Friedman <efriedma at quicinc.com>; 'Philip Reames'
> > > <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>;
> > 'Chung-Kil Hur'
> > > <gil.hur at sf.snu.ac.kr>; 'John Regehr' <regehr at cs.utah.edu>
> > > Subject: [EXT] Re: [llvm-dev] The semantics of nonnull attribute
> > >
> > > I think we are actually in agreement. "Used" should be removable, it
> > would
> > > potentially define UB but that is valid and happening anyway everywhere.
> > > Maybe my initial mail might have been confusing but I tried to show that
> > with
> > > this snippet:
> > >
> > > >   void @foo(nonnull);
> > > >   void @bar(nonnull used);
> > > >
> > > > > foo(NULL)  // <- argument is poison inside of @foo
> > > > > bar(NULL)  // <- UB at the call site of @bar
> > >
> > > If we remove `used`, thus go from @bar to @foo, we get a poison argument
> > > instead of UB, which is fine.
> > >
> > > WDYT?
> > >
> > >
> > > > Even if the function is just a load of the pointer argument.
> > >
> > > The Attributor does infer `nonnull` and `dereferenceable` in this case
> > already so
> > > `used` would follow the same logic (+ look for uses in branches etc)
> > >
> > >
> > > FWIW, frontends that want `nonnull`, `align`, `dereferenceable`, etc. to
> > keep the
> > > UB behavior can just add `used` as well.
> > >
> > > ________________________________________
> > > From: Nuno Lopes <nuno.lopes at ist.utl.pt>
> > > Sent: Tuesday, February 18, 2020 12:29
> > > To: Doerfert, Johannes
> > > Cc: 'Juneyoung Lee'; 'llvm-dev'; 'Roman Lebedev'; llvm-dev at redking.me.uk;
> > 'Eli
> > > Friedman'; 'Philip Reames'; Finkel, Hal J.; 'Chung-Kil Hur'; 'John
> > Regehr'
> > > Subject: RE: [llvm-dev] The semantics of nonnull attribute
> > >
> > > Hi Johannes,
> > >
> > > >> Not sure the semantics of "used" you propose is sufficient. AFAIU the
> > > >> proposal, "used" could only be used in cases where the function will
> > > >> always trigger UB if poison is passed as argument.  The semantics of
> > > >> attributes is usually the other way around, since function calls need
> > > >> to have UB as strong as the worst behavior of the function. If a
> > > >> function may for some reason trigger UB with a given set of arguments,
> > > >> then the function call has to trigger UB as well.
> > > >
> > > >"Used", as other attributes are "best effort". We can drop them any
> > time, so
> > > having "insufficient information" is not a correctness problem but an
> > > optimization problem. I mean, even if the function always causes UB if a
> > `null`
> > > argument is given we do not need to know/annotate/exploit that. In fact,
> > we
> > > can, and do, throw away that information, e.g., by removing the
> > instruction
> > > known to cause UB which effectively defines some semantic for the UB
> > program
> > > (which is then "w/o UB"). Long story short, I don't think "as strong as
> > the worst
> > > behavior" is plausible (nor possible TBH).
> > >
> > > I agree with the motivation, but not with the exact semantics. The
> > reason is that
> > > it seems that removing "used" wouldn't be possible. This is because a
> > function
> > > with "used" gives poison, and without gives UB. It should the other way
> > around,
> > > such that optimizers can freely remove "used".
> > > To make it clear, what I understood was:
> > >  - %v = call @fn(nonnull null) --  %v == poison
> > >  - %v = call @fn(nonnull used null) --  UB
> > >
> > > Even if the function is just a load of the pointer argument.
> > >
> > > Nuno
> > >
> > >
> > >
> > > > Imagine something like:
> > > > Call f(null, 1)
> > > >
> > > > int f(nonnull %ptr, int c) {
> > > >   if (c)
> > > >     deref %ptr
> > > > }
> > > >
> > > > You can't use the "used" attribute, since UB is only conditional.
> > >
> > > Agreed, for the callee (f) you cannot annotate "used". You can place
> > "used" on
> > > the call site though*. And you can place it on the callee (f) if it was
> > internal or
> > > you copy it to make it internal. So the optimization potential is still
> > there, even if
> > > it is a little trickier.
> > >
> > > * FWIW, one-level of selective context sensitive propagaton is planned
> > >   for the Attributor.
> > >
> > >
> > > > we would say the return value of f(null, 1) is poison.  But if you
> > > > inline the function you get UB. Inlining has to continue to be correct
> > > > without exceptions.
> > >
> > > The attributes will just never represent the possible UB exhaustively.
> > > I mean, inlining is still correct and the UB that it will expose,
> > together with SCCP,
> > > was there all along. With one-level of context sensitivity we could have
> > even
> > > known, without we just didn't know but that is fine too.
> > >
> > > > ...
> > > [ moved to the front ]
> > >
> > >
> > > I hope this makes some sense.
> > >
> > > Cheers,
> > >   Johannes
> > >
> > > P.S.
> > >   I think I'll write "used" deduction later so we can take a look at how
> > >   this would look like.
> > >
> > >
> > > > Thanks,
> > > > Nuno
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Johannes Doerfert <jdoerfert at anl.gov>
> > > > Sent: 18 de fevereiro de 2020 02:35
> > > > Subject: Re: [llvm-dev] The semantics of nonnull attribute
> > > >
> > > > Hi Juneyoung,
> > > >
> > > > thank you for starting this discussion again, the last one [0] did not
> > > > really end in a solution but the issue remains and will only become
> > > > worse, e.g., enable the Attributor and you might see all of the
> > > > problems below exposed in a single pass [1-4].
> > > >
> > > > As I argued in [0], UB is problematic and I am strongly in favor of
> > > > poison. In some sense I see UB as a very "strong" guarantee and poison
> > > > as a fairly "week" one. We will have a hard time deriving the strong
> > > > one and it might cause us to break things along the way, I mean your
> > > > examples show that we already miscompile codes.
> > > >
> > > > I agree that poison is on its own not "strong enough" for certain
> > > > optimization, as you pointed out below, but we could reasonably
> > > > augment it with another attribute, let's call it `used`, that
> > > > basically bridges the gap:
> > > >
> > > >   void @foo(nonnull);
> > > >   void @bar(nonnull used);
> > > >
> > > >   foo(NULL)  // <- argument is poison inside of @foo
> > > >   bar(NULL)  // <- UB at the call site of @bar
> > > >
> > > > A value is `used` if it would inevitably cause UB if it was poison at
> > > > this program point. I hope this makes some sense if not I can explain
> > > > in more detail.
> > > >
> > > > Btw. I think this also integrates well with other attribute problems
> > > > we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in
> > > > this example were "not used" (in some naive implementation of memcpy
> > > > at least).
> > > >
> > > > Maybe I have overlooked a problem with this solution so let me know
> > > > what you think.
> > > >
> > > > Cheers,
> > > >   Johannes
> > > >
> > > >
> > > > [0] https://reviews.llvm.org/D70749
> > > > [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below:
> > > > https://godbolt.org/z/TAkSC6 [3] 3. below:
> > > > https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4] 4.
> > > > below: https://godbolt.org/z/Qkkc_E
> > > >               https://godbolt.org/z/uXAfp_
> > > >               ...
> > > >
> > > > On 02/18, Juneyoung Lee via llvm-dev wrote:
> > > > > Hello all,
> > > > >
> > > > > LangRef says it is undefined behavior to pass null to a nonnull
> > > > > argument (`call f(nonnull null);`), but the semantics is too strong
> > > > > for a few existing optimizations.
> > > > > To support these, we can relax the semantics so `f(nonnull null)` is
> > > > > equivalent to `f(poison)`, but (A) it again blocks another set of
> > > > > optimizations, and (B) this makes the semantics of nonnull deviate
> > > > > from other attributes like dereferenceable.
> > > > > I found that there was a similar discussion about this issue in the
> > > > > past as well, but seems it is not settled yet.
> > > > > What should the semantics of nonnull be?
> > > > > I listed a few optimizations that are relevant with this issue.
> > > > >
> > > > >
> > > > > 1. Propagating nonnull attribute to callee's arg (
> > > > > https://godbolt.org/z/-cVsVP )
> > > > >
> > > > > g(i8* ptr) {
> > > > > f(nonnull ptr);
> > > > > }
> > > > > =>
> > > > > g(i8* nonnull ptr) {
> > > > > f(nonnull ptr);
> > > > > }
> > > > >
> > > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull
> > > > > null) should have raised UB, so ptr shouldn't be null.
> > > > > However, this transformation is incorrect if f(nonnull null) is
> > > > > equivalent to f(poison).
> > > > > If f was an empty function, f(nonnull null) never raises UB
> > > > > regardless of ptr. So we can't guarantee ptr != null at other uses
> > of ptr.
> > > >
> > > > For the given code snipped it is not even invalid to propagate nonnull
> > > > in the poison case but your point is not wrong, if we have f(poison)
> > > > we cannot derive anything from it if the argument might be unused.
> > > >
> > > >
> > > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
> > > > >
> > > > > %ptr_inb = gep inbounds %any_ptr, 1
> > > > > f(%ptr_inb)
> > > > > =>
> > > > > %ptr_inb = .. (identical)
> > > > > f(nonnull %ptr_inb)
> > > > >
> > > > > This optimization is incorrect if `f(nonnull null)` is UB. The
> > > > > reason is as follows.
> > > > > If `gep inbounds %any_ptr, 1` yields poison, the source is
> > > > > `f(poison)` whereas the optimized one is `f(nonnull poison)`.
> > > > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB.
> > > > > So, the transformation introduced UB.
> > > > > This optimization is correct if both `f(nonnull null)` and
> > > > > `f(nonnull poison)` are equivalent to `f(poison)`.
> > > > >
> > > > >
> > > > > 3. https://reviews.llvm.org/D69477
> > > > >
> > > > > f(nonnull ptr);
> > > > > use(ptr);
> > > > > =>
> > > > > llvm.assume(ptr != null);
> > > > > use(ptr);
> > > > > f(nonnull ptr);
> > > > >
> > > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null,
> > > > > the added llvm.assume(ptr != null) raises UB whereas the source may
> > > > > not raise UB at all. (e.g. assume that f() was an empty function) If
> > > > > f(nonnull null) is UB, this is correct.
> > > > >
> > > > >
> > > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
> > > > >
> > > > > f(nonnull ptr); // f's argument is dead => f(nonnull undef);
> > > > >
> > > > > This is incorrect if f(nonnull null) is UB. To make this correct,
> > > > > nonnull should be dropped. This becomes harder to fix if nonnull was
> > > > > attached at the signature of a function (not at the callee site).
> > > > > It is correct if f(nonnull null) is f(poison).
> > > > >
> > > > > Actually the D70749's thread had an end-to-end miscompilation
> > > > > example due to the interaction between this DAE and the optimization
> > > > > 3 (insertion of llvm.assume).
> > > > >
> > > > > Thanks,
> > > > > Juneyoung Lee
> > > >
> > >
> > > --
> > >
> > > Johannes Doerfert
> > > Researcher
> > >
> > > Argonne National Laboratory
> > > Lemont, IL 60439, USA
> > >
> > > jdoerfert at anl.gov
> >
> >
> 
> -- 
> 
> Juneyoung Lee
> Software Foundation Lab, Seoul National University

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200218/751063bb/attachment-0001.sig>


More information about the llvm-dev mailing list