[llvm-dev] The semantics of nonnull attribute

Juneyoung Lee via llvm-dev llvm-dev at lists.llvm.org
Tue Feb 18 18:55:05 PST 2020


Correction of a few typos:

> dereferenceable may need UB even without nonnull
may need to be UB even without non_poison

> int x0 = x > 0 ? x : 1; // x0 is never zero *or poison(undef)*
// x0 should be never zero *or poison(undef)*

On Wed, Feb 19, 2020 at 11:50 AM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
wrote:

> Hello all,
>
> I think not_poison (Johannes's used keyword) makes sense. We can simulate
> the original UB semantics by simply attaching it, as explained.
> For the attributes other than nonnull, we may need more discussion; align
> attribute seems to be okay with defining it as poison, dereferenceable may
> need UB even without nonnull (because it needs to be non-poison as shown
> Nuno's hoisting example).
>
> The reason why I think not_poison is good even if it may cause divergence
> of attribute semantics is that it brings a few more opportunities to
> optimizations.
>
>
> (1) We can do optimizations that require certain operands to be never
> poison.
>
> This is important for, say, hoisting of division:
>
> void f(int x, int y) {
> int x0 = x > 0 ? x : 1; // x0 is never zero *or poison(undef)*
> loop() { f(y / x0); }
> }
> =>
> void f(int x, int y) { ; x shouldn't be undef/poison
> int x0 = x > 0 ? x : 1;
> int tmp = y / x0; // this may raise UB
> loop() { f(tmp); }
> }
>
> This optimization is incorrect if x is undef or poison. If x has
> non_poison, this becomes valid.
>
> I guess in many cases Rust/Swift functions can be lowered into IR
> functions with not_poison flags attached (though I'm not an Rust/Swift
> expert, so just a guess)
> For C, people may want to allow idioms like below, so further
> investigation is needed:
>
> int x, y;
> if (cond) x = 1; else y = 1;
> f(cond, x, y);
>
>
> (2) Hoisting of function call becomes easier and semantically clearer
>
> https://godbolt.org/z/ThYt29
> f() call is hoisted out of the loop thanks to `speculatable` attribute,
> but this is problematic if nonnull was defined to raise UB.
> We should have syntactically prohibited having nonnull and speculatable in
> a single function declaration, but this may block optimization
> opportunities.
> By adding not_poison and making nonnull to have poison semantics, this
> problem is resolved.
>
>
> Thanks,
> Juneyoung Lee
>
>
> On Wed, Feb 19, 2020 at 6:37 AM Eli Friedman <efriedma at quicinc.com> wrote:
>
>> I think calling the attribute "used" is confusing.  I'd suggest the
>> following:
>>
>> "not_poison": If an argument is marked not_poison, and the argument is
>> poison at runtime, the call is instant UB.  Whether an argument is poison
>> is checked after the rules for other attributes like "nonnull" and "align"
>> are applied.
>>
>> This makes it clear that the IR semantics don't depend on the
>> implementation of the called function.  And it's the logical extension of
>> the way we define the relationship between UB and poison for instructions.
>>
>> -Eli
>>
>> > -----Original Message-----
>> > From: Doerfert, Johannes <jdoerfert at anl.gov>
>> > Sent: Tuesday, February 18, 2020 12:14 PM
>> > To: Nuno Lopes <nuno.lopes at ist.utl.pt>
>> > Cc: 'Juneyoung Lee' <juneyoung.lee at sf.snu.ac.kr>; 'llvm-dev' <llvm-
>> > dev at lists.llvm.org>; 'Roman Lebedev' <lebedev.ri at gmail.com>; llvm-
>> > dev at redking.me.uk; Eli Friedman <efriedma at quicinc.com>; 'Philip Reames'
>> > <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>;
>> 'Chung-Kil Hur'
>> > <gil.hur at sf.snu.ac.kr>; 'John Regehr' <regehr at cs.utah.edu>
>> > Subject: [EXT] Re: [llvm-dev] The semantics of nonnull attribute
>> >
>> > I think we are actually in agreement. "Used" should be removable, it
>> would
>> > potentially define UB but that is valid and happening anyway everywhere.
>> > Maybe my initial mail might have been confusing but I tried to show
>> that with
>> > this snippet:
>> >
>> > >   void @foo(nonnull);
>> > >   void @bar(nonnull used);
>> > >
>> > > > foo(NULL)  // <- argument is poison inside of @foo
>> > > > bar(NULL)  // <- UB at the call site of @bar
>> >
>> > If we remove `used`, thus go from @bar to @foo, we get a poison argument
>> > instead of UB, which is fine.
>> >
>> > WDYT?
>> >
>> >
>> > > Even if the function is just a load of the pointer argument.
>> >
>> > The Attributor does infer `nonnull` and `dereferenceable` in this case
>> already so
>> > `used` would follow the same logic (+ look for uses in branches etc)
>> >
>> >
>> > FWIW, frontends that want `nonnull`, `align`, `dereferenceable`, etc.
>> to keep the
>> > UB behavior can just add `used` as well.
>> >
>> > ________________________________________
>> > From: Nuno Lopes <nuno.lopes at ist.utl.pt>
>> > Sent: Tuesday, February 18, 2020 12:29
>> > To: Doerfert, Johannes
>> > Cc: 'Juneyoung Lee'; 'llvm-dev'; 'Roman Lebedev';
>> llvm-dev at redking.me.uk; 'Eli
>> > Friedman'; 'Philip Reames'; Finkel, Hal J.; 'Chung-Kil Hur'; 'John
>> Regehr'
>> > Subject: RE: [llvm-dev] The semantics of nonnull attribute
>> >
>> > Hi Johannes,
>> >
>> > >> Not sure the semantics of "used" you propose is sufficient. AFAIU the
>> > >> proposal, "used" could only be used in cases where the function will
>> > >> always trigger UB if poison is passed as argument.  The semantics of
>> > >> attributes is usually the other way around, since function calls need
>> > >> to have UB as strong as the worst behavior of the function. If a
>> > >> function may for some reason trigger UB with a given set of
>> arguments,
>> > >> then the function call has to trigger UB as well.
>> > >
>> > >"Used", as other attributes are "best effort". We can drop them any
>> time, so
>> > having "insufficient information" is not a correctness problem but an
>> > optimization problem. I mean, even if the function always causes UB if
>> a `null`
>> > argument is given we do not need to know/annotate/exploit that. In
>> fact, we
>> > can, and do, throw away that information, e.g., by removing the
>> instruction
>> > known to cause UB which effectively defines some semantic for the UB
>> program
>> > (which is then "w/o UB"). Long story short, I don't think "as strong as
>> the worst
>> > behavior" is plausible (nor possible TBH).
>> >
>> > I agree with the motivation, but not with the exact semantics. The
>> reason is that
>> > it seems that removing "used" wouldn't be possible. This is because a
>> function
>> > with "used" gives poison, and without gives UB. It should the other way
>> around,
>> > such that optimizers can freely remove "used".
>> > To make it clear, what I understood was:
>> >  - %v = call @fn(nonnull null) --  %v == poison
>> >  - %v = call @fn(nonnull used null) --  UB
>> >
>> > Even if the function is just a load of the pointer argument.
>> >
>> > Nuno
>> >
>> >
>> >
>> > > Imagine something like:
>> > > Call f(null, 1)
>> > >
>> > > int f(nonnull %ptr, int c) {
>> > >   if (c)
>> > >     deref %ptr
>> > > }
>> > >
>> > > You can't use the "used" attribute, since UB is only conditional.
>> >
>> > Agreed, for the callee (f) you cannot annotate "used". You can place
>> "used" on
>> > the call site though*. And you can place it on the callee (f) if it was
>> internal or
>> > you copy it to make it internal. So the optimization potential is still
>> there, even if
>> > it is a little trickier.
>> >
>> > * FWIW, one-level of selective context sensitive propagaton is planned
>> >   for the Attributor.
>> >
>> >
>> > > we would say the return value of f(null, 1) is poison.  But if you
>> > > inline the function you get UB. Inlining has to continue to be correct
>> > > without exceptions.
>> >
>> > The attributes will just never represent the possible UB exhaustively.
>> > I mean, inlining is still correct and the UB that it will expose,
>> together with SCCP,
>> > was there all along. With one-level of context sensitivity we could
>> have even
>> > known, without we just didn't know but that is fine too.
>> >
>> > > ...
>> > [ moved to the front ]
>> >
>> >
>> > I hope this makes some sense.
>> >
>> > Cheers,
>> >   Johannes
>> >
>> > P.S.
>> >   I think I'll write "used" deduction later so we can take a look at how
>> >   this would look like.
>> >
>> >
>> > > Thanks,
>> > > Nuno
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: Johannes Doerfert <jdoerfert at anl.gov>
>> > > Sent: 18 de fevereiro de 2020 02:35
>> > > Subject: Re: [llvm-dev] The semantics of nonnull attribute
>> > >
>> > > Hi Juneyoung,
>> > >
>> > > thank you for starting this discussion again, the last one [0] did not
>> > > really end in a solution but the issue remains and will only become
>> > > worse, e.g., enable the Attributor and you might see all of the
>> > > problems below exposed in a single pass [1-4].
>> > >
>> > > As I argued in [0], UB is problematic and I am strongly in favor of
>> > > poison. In some sense I see UB as a very "strong" guarantee and poison
>> > > as a fairly "week" one. We will have a hard time deriving the strong
>> > > one and it might cause us to break things along the way, I mean your
>> > > examples show that we already miscompile codes.
>> > >
>> > > I agree that poison is on its own not "strong enough" for certain
>> > > optimization, as you pointed out below, but we could reasonably
>> > > augment it with another attribute, let's call it `used`, that
>> > > basically bridges the gap:
>> > >
>> > >   void @foo(nonnull);
>> > >   void @bar(nonnull used);
>> > >
>> > >   foo(NULL)  // <- argument is poison inside of @foo
>> > >   bar(NULL)  // <- UB at the call site of @bar
>> > >
>> > > A value is `used` if it would inevitably cause UB if it was poison at
>> > > this program point. I hope this makes some sense if not I can explain
>> > > in more detail.
>> > >
>> > > Btw. I think this also integrates well with other attribute problems
>> > > we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in
>> > > this example were "not used" (in some naive implementation of memcpy
>> > > at least).
>> > >
>> > > Maybe I have overlooked a problem with this solution so let me know
>> > > what you think.
>> > >
>> > > Cheers,
>> > >   Johannes
>> > >
>> > >
>> > > [0] https://reviews.llvm.org/D70749
>> > > [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below:
>> > > https://godbolt.org/z/TAkSC6 [3] 3. below:
>> > > https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4] 4.
>> > > below: https://godbolt.org/z/Qkkc_E
>> > >               https://godbolt.org/z/uXAfp_
>> > >               ...
>> > >
>> > > On 02/18, Juneyoung Lee via llvm-dev wrote:
>> > > > Hello all,
>> > > >
>> > > > LangRef says it is undefined behavior to pass null to a nonnull
>> > > > argument (`call f(nonnull null);`), but the semantics is too strong
>> > > > for a few existing optimizations.
>> > > > To support these, we can relax the semantics so `f(nonnull null)` is
>> > > > equivalent to `f(poison)`, but (A) it again blocks another set of
>> > > > optimizations, and (B) this makes the semantics of nonnull deviate
>> > > > from other attributes like dereferenceable.
>> > > > I found that there was a similar discussion about this issue in the
>> > > > past as well, but seems it is not settled yet.
>> > > > What should the semantics of nonnull be?
>> > > > I listed a few optimizations that are relevant with this issue.
>> > > >
>> > > >
>> > > > 1. Propagating nonnull attribute to callee's arg (
>> > > > https://godbolt.org/z/-cVsVP )
>> > > >
>> > > > g(i8* ptr) {
>> > > > f(nonnull ptr);
>> > > > }
>> > > > =>
>> > > > g(i8* nonnull ptr) {
>> > > > f(nonnull ptr);
>> > > > }
>> > > >
>> > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull
>> > > > null) should have raised UB, so ptr shouldn't be null.
>> > > > However, this transformation is incorrect if f(nonnull null) is
>> > > > equivalent to f(poison).
>> > > > If f was an empty function, f(nonnull null) never raises UB
>> > > > regardless of ptr. So we can't guarantee ptr != null at other uses
>> of ptr.
>> > >
>> > > For the given code snipped it is not even invalid to propagate nonnull
>> > > in the poison case but your point is not wrong, if we have f(poison)
>> > > we cannot derive anything from it if the argument might be unused.
>> > >
>> > >
>> > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
>> > > >
>> > > > %ptr_inb = gep inbounds %any_ptr, 1
>> > > > f(%ptr_inb)
>> > > > =>
>> > > > %ptr_inb = .. (identical)
>> > > > f(nonnull %ptr_inb)
>> > > >
>> > > > This optimization is incorrect if `f(nonnull null)` is UB. The
>> > > > reason is as follows.
>> > > > If `gep inbounds %any_ptr, 1` yields poison, the source is
>> > > > `f(poison)` whereas the optimized one is `f(nonnull poison)`.
>> > > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB.
>> > > > So, the transformation introduced UB.
>> > > > This optimization is correct if both `f(nonnull null)` and
>> > > > `f(nonnull poison)` are equivalent to `f(poison)`.
>> > > >
>> > > >
>> > > > 3. https://reviews.llvm.org/D69477
>> > > >
>> > > > f(nonnull ptr);
>> > > > use(ptr);
>> > > > =>
>> > > > llvm.assume(ptr != null);
>> > > > use(ptr);
>> > > > f(nonnull ptr);
>> > > >
>> > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null,
>> > > > the added llvm.assume(ptr != null) raises UB whereas the source may
>> > > > not raise UB at all. (e.g. assume that f() was an empty function) If
>> > > > f(nonnull null) is UB, this is correct.
>> > > >
>> > > >
>> > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749
>> )
>> > > >
>> > > > f(nonnull ptr); // f's argument is dead => f(nonnull undef);
>> > > >
>> > > > This is incorrect if f(nonnull null) is UB. To make this correct,
>> > > > nonnull should be dropped. This becomes harder to fix if nonnull was
>> > > > attached at the signature of a function (not at the callee site).
>> > > > It is correct if f(nonnull null) is f(poison).
>> > > >
>> > > > Actually the D70749's thread had an end-to-end miscompilation
>> > > > example due to the interaction between this DAE and the optimization
>> > > > 3 (insertion of llvm.assume).
>> > > >
>> > > > Thanks,
>> > > > Juneyoung Lee
>> > >
>> >
>> > --
>> >
>> > Johannes Doerfert
>> > Researcher
>> >
>> > Argonne National Laboratory
>> > Lemont, IL 60439, USA
>> >
>> > jdoerfert at anl.gov
>>
>>
>
> --
>
> Juneyoung Lee
> Software Foundation Lab, Seoul National University
>


-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200219/9d80d65f/attachment.html>


More information about the llvm-dev mailing list