<div dir="ltr"><div>Hello,</div><div><br></div><div>>   I don't understand this. Why should the transformed code raise UB when<br>> the original did not? I assume the loop is executed for sure, if the<br>> loop is not executed for sure, the situation is different but then I<br>> don't think the hoisting is "sound". (Btw. `nonnull` should also be<br>> applicable to integers.)  <br></div><div dir="ltr"><br></div><div>Sorry, I missed to mention the assumption that the loop may not iterate at all, leading to not raising undefined behavior.</div><div dir="ltr"><br></div><div dir="ltr">>  I am unsure if there are opportunities where you need "poison" for<br>> `nonnull` and "UB" for the rest. With `non_poison` we get, poison for<br>> all or UB for all. We can think of other combinations but we should<br>> determine if we actually need them.<br></div><div><br></div><div>I think at least a dereferenceable argument cannot have poison. It should guarantee that dereferencing it is not UB, so its memory access can be freely code-motioned, as shown in Nuno's example.<br></div><div><div>It doesn't mean that all attributes should raise UB however, as discussed so far;</div><div>Or, inversely, I gave a thought about having only a small set of attributes that are allowed to have poison, such as a new 'nonnull_or_poison'.</div><div>This can be gracefully merged with 'not_poison' and be promoted to 'nonnull'.</div><div><br></div><div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 19, 2020 at 2:20 PM Doerfert, Johannes <<a href="mailto:johannesdoerfert@gmail.com">johannesdoerfert@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 02/19, Juneyoung Lee via llvm-dev wrote:<br>

> I think not_poison (Johannes's used keyword) makes sense. We can simulate<br>

> the original UB semantics by simply attaching it, as explained.<br>

<br>

Agreed.<br>

<br>

<br>

> For the attributes other than nonnull, we may need more discussion; align<br>

> attribute seems to be okay with defining it as poison, dereferenceable may<br>

> need UB even without nonnull (because it needs to be non-poison as shown<br>

> Nuno's hoisting example).<br>

<br>

I would strongly prefer not to diverge things here.<br>

<br>

<br>

> The reason why I think not_poison is good even if it may cause divergence<br>

> of attribute semantics is that it brings a few more opportunities to<br>

> optimizations.<br>

<br>

I am unsure if there are opportunities where you need "poison" for<br>

`nonnull` and "UB" for the rest. With `non_poison` we get, poison for<br>

all or UB for all. We can think of other combinations but we should<br>

determine if we actually need them.<br>

<br>

<br>

> (1) We can do optimizations that require certain operands to be never<br>

> poison.<br>

> <br>

> This is important for, say, hoisting of division:<br>

> <br>

> void f(int x, int y) {<br>

> int x0 = x > 0 ? x : 1; // x0 is never zero *or poison(undef)*<br>

> loop() { f(y / x0); }<br>

> }<br>

> =><br>

> void f(int x, int y) { ; x shouldn't be undef/poison<br>

> int x0 = x > 0 ? x : 1;<br>

> int tmp = y / x0; // this may raise UB<br>

> loop() { f(tmp); }<br>

> }<br>

> <br>

> This optimization is incorrect if x is undef or poison. If x has<br>

> non_poison, this becomes valid.<br>

<br>

I don't understand this. Why should the transformed code raise UB when<br>

the original did not? I assume the loop is executed for sure, if the<br>

loop is not executed for sure, the situation is different but then I<br>

don't think the hoisting is "sound". (Btw. `nonnull` should also be<br>

applicable to integers.)<br>

<br>

Three cases for a loop that is always entered:<br>

1) x = undef,<br>

    if each use of x allows a different value I pick 1 in the comparison<br>

    and 0 otherwise, this is UB in either case.<br>

    if each use (in the same instruction) has to be the same `x0 = max(x, 1)`<br>

    which should be save.<br>

2) x = poison,<br>

    you should be able to propagate `x0 = poison` which is UB in either<br>

    case.<br>

3) x is a "proper value",<br>

    `x0 = max(x, 1)` which should be save.<br>

<br>

Let me know what I am missing here.<br>

<br>

<br>

> I guess in many cases Rust/Swift functions can be lowered into IR functions<br>

> with not_poison flags attached (though I'm not an Rust/Swift expert, so<br>

> just a guess)<br>

> For C, people may want to allow idioms like below, so further investigation<br>

> is needed:<br>

> <br>

> int x, y;<br>

> if (cond) x = 1; else y = 1;<br>

> f(cond, x, y);<br>

<br>

I'm unsure what semantics you want here. Either x or y is uninitialized,<br>

right? From the C perspective that is totally fine so far (AFAIK).<br>

<br>

<br>

> (2) Hoisting of function call becomes easier and semantically clearer<br>

> <br>

> <a href="https://godbolt.org/z/ThYt29" rel="noreferrer" target="_blank">https://godbolt.org/z/ThYt29</a><br>

> f() call is hoisted out of the loop thanks to `speculatable` attribute, but<br>

> this is problematic if nonnull was defined to raise UB.<br>

> We should have syntactically prohibited having nonnull and speculatable in<br>

> a single function declaration, but this may block optimization<br>

> opportunities.<br>

> By adding not_poison and making nonnull to have poison semantics, this<br>

> problem is resolved.<br>

<br>

I don't see why nonnull (either with UB or poison semantics) is<br>

necessary incompatible with speculateable (which has UB semantics) but I<br>

agree that the transformation we do here become actually valid only with<br>

poison for nonnull.<br>

<br>

Cheers,<br>

  Johannes<br>

<br>

<br>

> Thanks,<br>

> Juneyoung Lee<br>

> <br>

> <br>

> On Wed, Feb 19, 2020 at 6:37 AM Eli Friedman <<a href="mailto:efriedma@quicinc.com" target="_blank">efriedma@quicinc.com</a>> wrote:<br>

> <br>

> > I think calling the attribute "used" is confusing.  I'd suggest the<br>

> > following:<br>

> ><br>

> > "not_poison": If an argument is marked not_poison, and the argument is<br>

> > poison at runtime, the call is instant UB.  Whether an argument is poison<br>

> > is checked after the rules for other attributes like "nonnull" and "align"<br>

> > are applied.<br>

> ><br>

> > This makes it clear that the IR semantics don't depend on the<br>

> > implementation of the called function.  And it's the logical extension of<br>

> > the way we define the relationship between UB and poison for instructions.<br>

> ><br>

> > -Eli<br>

> ><br>

> > > -----Original Message-----<br>

> > > From: Doerfert, Johannes <<a href="mailto:jdoerfert@anl.gov" target="_blank">jdoerfert@anl.gov</a>><br>

> > > Sent: Tuesday, February 18, 2020 12:14 PM<br>

> > > To: Nuno Lopes <<a href="mailto:nuno.lopes@ist.utl.pt" target="_blank">nuno.lopes@ist.utl.pt</a>><br>

> > > Cc: 'Juneyoung Lee' <<a href="mailto:juneyoung.lee@sf.snu.ac.kr" target="_blank">juneyoung.lee@sf.snu.ac.kr</a>>; 'llvm-dev' <llvm-<br>

> > > <a href="mailto:dev@lists.llvm.org" target="_blank">dev@lists.llvm.org</a>>; 'Roman Lebedev' <<a href="mailto:lebedev.ri@gmail.com" target="_blank">lebedev.ri@gmail.com</a>>; llvm-<br>

> > > <a href="mailto:dev@redking.me.uk" target="_blank">dev@redking.me.uk</a>; Eli Friedman <<a href="mailto:efriedma@quicinc.com" target="_blank">efriedma@quicinc.com</a>>; 'Philip Reames'<br>

> > > <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>>; Finkel, Hal J. <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>>;<br>

> > 'Chung-Kil Hur'<br>

> > > <<a href="mailto:gil.hur@sf.snu.ac.kr" target="_blank">gil.hur@sf.snu.ac.kr</a>>; 'John Regehr' <<a href="mailto:regehr@cs.utah.edu" target="_blank">regehr@cs.utah.edu</a>><br>

> > > Subject: [EXT] Re: [llvm-dev] The semantics of nonnull attribute<br>

> > ><br>

> > > I think we are actually in agreement. "Used" should be removable, it<br>

> > would<br>

> > > potentially define UB but that is valid and happening anyway everywhere.<br>

> > > Maybe my initial mail might have been confusing but I tried to show that<br>

> > with<br>

> > > this snippet:<br>

> > ><br>

> > > >   void @foo(nonnull);<br>

> > > >   void @bar(nonnull used);<br>

> > > ><br>

> > > > > foo(NULL)  // <- argument is poison inside of @foo<br>

> > > > > bar(NULL)  // <- UB at the call site of @bar<br>

> > ><br>

> > > If we remove `used`, thus go from @bar to @foo, we get a poison argument<br>

> > > instead of UB, which is fine.<br>

> > ><br>

> > > WDYT?<br>

> > ><br>

> > ><br>

> > > > Even if the function is just a load of the pointer argument.<br>

> > ><br>

> > > The Attributor does infer `nonnull` and `dereferenceable` in this case<br>

> > already so<br>

> > > `used` would follow the same logic (+ look for uses in branches etc)<br>

> > ><br>

> > ><br>

> > > FWIW, frontends that want `nonnull`, `align`, `dereferenceable`, etc. to<br>

> > keep the<br>

> > > UB behavior can just add `used` as well.<br>

> > ><br>

> > > ________________________________________<br>

> > > From: Nuno Lopes <<a href="mailto:nuno.lopes@ist.utl.pt" target="_blank">nuno.lopes@ist.utl.pt</a>><br>

> > > Sent: Tuesday, February 18, 2020 12:29<br>

> > > To: Doerfert, Johannes<br>

> > > Cc: 'Juneyoung Lee'; 'llvm-dev'; 'Roman Lebedev'; <a href="mailto:llvm-dev@redking.me.uk" target="_blank">llvm-dev@redking.me.uk</a>;<br>

> > 'Eli<br>

> > > Friedman'; 'Philip Reames'; Finkel, Hal J.; 'Chung-Kil Hur'; 'John<br>

> > Regehr'<br>

> > > Subject: RE: [llvm-dev] The semantics of nonnull attribute<br>

> > ><br>

> > > Hi Johannes,<br>

> > ><br>

> > > >> Not sure the semantics of "used" you propose is sufficient. AFAIU the<br>

> > > >> proposal, "used" could only be used in cases where the function will<br>

> > > >> always trigger UB if poison is passed as argument.  The semantics of<br>

> > > >> attributes is usually the other way around, since function calls need<br>

> > > >> to have UB as strong as the worst behavior of the function. If a<br>

> > > >> function may for some reason trigger UB with a given set of arguments,<br>

> > > >> then the function call has to trigger UB as well.<br>

> > > ><br>

> > > >"Used", as other attributes are "best effort". We can drop them any<br>

> > time, so<br>

> > > having "insufficient information" is not a correctness problem but an<br>

> > > optimization problem. I mean, even if the function always causes UB if a<br>

> > `null`<br>

> > > argument is given we do not need to know/annotate/exploit that. In fact,<br>

> > we<br>

> > > can, and do, throw away that information, e.g., by removing the<br>

> > instruction<br>

> > > known to cause UB which effectively defines some semantic for the UB<br>

> > program<br>

> > > (which is then "w/o UB"). Long story short, I don't think "as strong as<br>

> > the worst<br>

> > > behavior" is plausible (nor possible TBH).<br>

> > ><br>

> > > I agree with the motivation, but not with the exact semantics. The<br>

> > reason is that<br>

> > > it seems that removing "used" wouldn't be possible. This is because a<br>

> > function<br>

> > > with "used" gives poison, and without gives UB. It should the other way<br>

> > around,<br>

> > > such that optimizers can freely remove "used".<br>

> > > To make it clear, what I understood was:<br>

> > >  - %v = call @fn(nonnull null) --  %v == poison<br>

> > >  - %v = call @fn(nonnull used null) --  UB<br>

> > ><br>

> > > Even if the function is just a load of the pointer argument.<br>

> > ><br>

> > > Nuno<br>

> > ><br>

> > ><br>

> > ><br>

> > > > Imagine something like:<br>

> > > > Call f(null, 1)<br>

> > > ><br>

> > > > int f(nonnull %ptr, int c) {<br>

> > > >   if (c)<br>

> > > >     deref %ptr<br>

> > > > }<br>

> > > ><br>

> > > > You can't use the "used" attribute, since UB is only conditional.<br>

> > ><br>

> > > Agreed, for the callee (f) you cannot annotate "used". You can place<br>

> > "used" on<br>

> > > the call site though*. And you can place it on the callee (f) if it was<br>

> > internal or<br>

> > > you copy it to make it internal. So the optimization potential is still<br>

> > there, even if<br>

> > > it is a little trickier.<br>

> > ><br>

> > > * FWIW, one-level of selective context sensitive propagaton is planned<br>

> > >   for the Attributor.<br>

> > ><br>

> > ><br>

> > > > we would say the return value of f(null, 1) is poison.  But if you<br>

> > > > inline the function you get UB. Inlining has to continue to be correct<br>

> > > > without exceptions.<br>

> > ><br>

> > > The attributes will just never represent the possible UB exhaustively.<br>

> > > I mean, inlining is still correct and the UB that it will expose,<br>

> > together with SCCP,<br>

> > > was there all along. With one-level of context sensitivity we could have<br>

> > even<br>

> > > known, without we just didn't know but that is fine too.<br>

> > ><br>

> > > > ...<br>

> > > [ moved to the front ]<br>

> > ><br>

> > ><br>

> > > I hope this makes some sense.<br>

> > ><br>

> > > Cheers,<br>

> > >   Johannes<br>

> > ><br>

> > > P.S.<br>

> > >   I think I'll write "used" deduction later so we can take a look at how<br>

> > >   this would look like.<br>

> > ><br>

> > ><br>

> > > > Thanks,<br>

> > > > Nuno<br>

> > > ><br>

> > > ><br>

> > > > -----Original Message-----<br>

> > > > From: Johannes Doerfert <<a href="mailto:jdoerfert@anl.gov" target="_blank">jdoerfert@anl.gov</a>><br>

> > > > Sent: 18 de fevereiro de 2020 02:35<br>

> > > > Subject: Re: [llvm-dev] The semantics of nonnull attribute<br>

> > > ><br>

> > > > Hi Juneyoung,<br>

> > > ><br>

> > > > thank you for starting this discussion again, the last one [0] did not<br>

> > > > really end in a solution but the issue remains and will only become<br>

> > > > worse, e.g., enable the Attributor and you might see all of the<br>

> > > > problems below exposed in a single pass [1-4].<br>

> > > ><br>

> > > > As I argued in [0], UB is problematic and I am strongly in favor of<br>

> > > > poison. In some sense I see UB as a very "strong" guarantee and poison<br>

> > > > as a fairly "week" one. We will have a hard time deriving the strong<br>

> > > > one and it might cause us to break things along the way, I mean your<br>

> > > > examples show that we already miscompile codes.<br>

> > > ><br>

> > > > I agree that poison is on its own not "strong enough" for certain<br>

> > > > optimization, as you pointed out below, but we could reasonably<br>

> > > > augment it with another attribute, let's call it `used`, that<br>

> > > > basically bridges the gap:<br>

> > > ><br>

> > > >   void @foo(nonnull);<br>

> > > >   void @bar(nonnull used);<br>

> > > ><br>

> > > >   foo(NULL)  // <- argument is poison inside of @foo<br>

> > > >   bar(NULL)  // <- UB at the call site of @bar<br>

> > > ><br>

> > > > A value is `used` if it would inevitably cause UB if it was poison at<br>

> > > > this program point. I hope this makes some sense if not I can explain<br>

> > > > in more detail.<br>

> > > ><br>

> > > > Btw. I think this also integrates well with other attribute problems<br>

> > > > we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in<br>

> > > > this example were "not used" (in some naive implementation of memcpy<br>

> > > > at least).<br>

> > > ><br>

> > > > Maybe I have overlooked a problem with this solution so let me know<br>

> > > > what you think.<br>

> > > ><br>

> > > > Cheers,<br>

> > > >   Johannes<br>

> > > ><br>

> > > ><br>

> > > > [0] <a href="https://reviews.llvm.org/D70749" rel="noreferrer" target="_blank">https://reviews.llvm.org/D70749</a><br>

> > > > [1] 1. below: <a href="https://godbolt.org/z/J4wFVw" rel="noreferrer" target="_blank">https://godbolt.org/z/J4wFVw</a> [2] 2. below:<br>

> > > > <a href="https://godbolt.org/z/TAkSC6" rel="noreferrer" target="_blank">https://godbolt.org/z/TAkSC6</a> [3] 3. below:<br>

> > > > <a href="https://godbolt.org/z/UqYi6S" rel="noreferrer" target="_blank">https://godbolt.org/z/UqYi6S</a> (doesn't work yet, will though) [4] 4.<br>

> > > > below: <a href="https://godbolt.org/z/Qkkc_E" rel="noreferrer" target="_blank">https://godbolt.org/z/Qkkc_E</a><br>

> > > >               <a href="https://godbolt.org/z/uXAfp_" rel="noreferrer" target="_blank">https://godbolt.org/z/uXAfp_</a><br>

> > > >               ...<br>

> > > ><br>

> > > > On 02/18, Juneyoung Lee via llvm-dev wrote:<br>

> > > > > Hello all,<br>

> > > > ><br>

> > > > > LangRef says it is undefined behavior to pass null to a nonnull<br>

> > > > > argument (`call f(nonnull null);`), but the semantics is too strong<br>

> > > > > for a few existing optimizations.<br>

> > > > > To support these, we can relax the semantics so `f(nonnull null)` is<br>

> > > > > equivalent to `f(poison)`, but (A) it again blocks another set of<br>

> > > > > optimizations, and (B) this makes the semantics of nonnull deviate<br>

> > > > > from other attributes like dereferenceable.<br>

> > > > > I found that there was a similar discussion about this issue in the<br>

> > > > > past as well, but seems it is not settled yet.<br>

> > > > > What should the semantics of nonnull be?<br>

> > > > > I listed a few optimizations that are relevant with this issue.<br>

> > > > ><br>

> > > > ><br>

> > > > > 1. Propagating nonnull attribute to callee's arg (<br>

> > > > > <a href="https://godbolt.org/z/-cVsVP" rel="noreferrer" target="_blank">https://godbolt.org/z/-cVsVP</a> )<br>

> > > > ><br>

> > > > > g(i8* ptr) {<br>

> > > > > f(nonnull ptr);<br>

> > > > > }<br>

> > > > > =><br>

> > > > > g(i8* nonnull ptr) {<br>

> > > > > f(nonnull ptr);<br>

> > > > > }<br>

> > > > ><br>

> > > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull<br>

> > > > > null) should have raised UB, so ptr shouldn't be null.<br>

> > > > > However, this transformation is incorrect if f(nonnull null) is<br>

> > > > > equivalent to f(poison).<br>

> > > > > If f was an empty function, f(nonnull null) never raises UB<br>

> > > > > regardless of ptr. So we can't guarantee ptr != null at other uses<br>

> > of ptr.<br>

> > > ><br>

> > > > For the given code snipped it is not even invalid to propagate nonnull<br>

> > > > in the poison case but your point is not wrong, if we have f(poison)<br>

> > > > we cannot derive anything from it if the argument might be unused.<br>

> > > ><br>

> > > ><br>

> > > > > 2. InstCombine (<a href="https://godbolt.org/z/HDQ7rD" rel="noreferrer" target="_blank">https://godbolt.org/z/HDQ7rD</a> ):<br>

> > > > ><br>

> > > > > %ptr_inb = gep inbounds %any_ptr, 1<br>

> > > > > f(%ptr_inb)<br>

> > > > > =><br>

> > > > > %ptr_inb = .. (identical)<br>

> > > > > f(nonnull %ptr_inb)<br>

> > > > ><br>

> > > > > This optimization is incorrect if `f(nonnull null)` is UB. The<br>

> > > > > reason is as follows.<br>

> > > > > If `gep inbounds %any_ptr, 1` yields poison, the source is<br>

> > > > > `f(poison)` whereas the optimized one is `f(nonnull poison)`.<br>

> > > > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB.<br>

> > > > > So, the transformation introduced UB.<br>

> > > > > This optimization is correct if both `f(nonnull null)` and<br>

> > > > > `f(nonnull poison)` are equivalent to `f(poison)`.<br>

> > > > ><br>

> > > > ><br>

> > > > > 3. <a href="https://reviews.llvm.org/D69477" rel="noreferrer" target="_blank">https://reviews.llvm.org/D69477</a><br>

> > > > ><br>

> > > > > f(nonnull ptr);<br>

> > > > > use(ptr);<br>

> > > > > =><br>

> > > > > llvm.assume(ptr != null);<br>

> > > > > use(ptr);<br>

> > > > > f(nonnull ptr);<br>

> > > > ><br>

> > > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null,<br>

> > > > > the added llvm.assume(ptr != null) raises UB whereas the source may<br>

> > > > > not raise UB at all. (e.g. assume that f() was an empty function) If<br>

> > > > > f(nonnull null) is UB, this is correct.<br>

> > > > ><br>

> > > > ><br>

> > > > > 4. Dead argument elimination (from <a href="https://reviews.llvm.org/D70749" rel="noreferrer" target="_blank">https://reviews.llvm.org/D70749</a> )<br>

> > > > ><br>

> > > > > f(nonnull ptr); // f's argument is dead => f(nonnull undef);<br>

> > > > ><br>

> > > > > This is incorrect if f(nonnull null) is UB. To make this correct,<br>

> > > > > nonnull should be dropped. This becomes harder to fix if nonnull was<br>

> > > > > attached at the signature of a function (not at the callee site).<br>

> > > > > It is correct if f(nonnull null) is f(poison).<br>

> > > > ><br>

> > > > > Actually the D70749's thread had an end-to-end miscompilation<br>

> > > > > example due to the interaction between this DAE and the optimization<br>

> > > > > 3 (insertion of llvm.assume).<br>

> > > > ><br>

> > > > > Thanks,<br>

> > > > > Juneyoung Lee<br>

> > > ><br>

> > ><br>

> > > --<br>

> > ><br>

> > > Johannes Doerfert<br>

> > > Researcher<br>

> > ><br>

> > > Argonne National Laboratory<br>

> > > Lemont, IL 60439, USA<br>

> > ><br>

> > > <a href="mailto:jdoerfert@anl.gov" target="_blank">jdoerfert@anl.gov</a><br>

> ><br>

> ><br>

> <br>

> -- <br>

> <br>

> Juneyoung Lee<br>

> Software Foundation Lab, Seoul National University<br>

<br>

> _______________________________________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

<br>

<br>

-- <br>

<br>

Johannes Doerfert<br>

Researcher<br>

<br>

Argonne National Laboratory<br>

Lemont, IL 60439, USA<br>

<br>

<a href="mailto:jdoerfert@anl.gov" target="_blank">jdoerfert@anl.gov</a><br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><br></div><font size="1">Juneyoung Lee</font><div><font size="1">Software Foundation Lab, Seoul National University</font></div></div></div></div>