[llvm-dev] RFC: Killing undef and spreading poison

Thu Jun 8 10:52:52 PDT 2017

> On Jun 8, 2017, at 10:33 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> Hi Peter,
> 
> On Thu, Jun 8, 2017 at 9:41 AM, Peter Lawrence
> <peterl95124 at sbcglobal.net> wrote:
>> 
>>> On Jun 7, 2017, at 2:23 PM, Nuno Lopes <nunoplopes at sapo.pt> wrote:
>>> 
>>> Since most add/sub operations compiled from C have the nsw attribute, we cannot simply restrict movement of these instructions.
>> 
>> Nuno,
>>          I’m not saying the operations can’t be moved,
>> I’m saying that once you move one the ‘nsw’ attribute no longer applies,
>> unless you can mathematically prove that it still does,
>> otherwise an “add nsw” has to be converted to plain “add”.
>> 
>> It is only by illegally retaining the ‘nsw’ after hoisting the multiply out of the ‘if’ statement
>> that subsequent transformations yield end-to-end-miscompilation in Sanjoy’s example.
> 
> That would be correct (and we do this for some constructs: for
> instance when we have !dereferenceable attached to a load instruction
> we will strip the !dereferenceable when hoisting it out of control
> flow).  

In other words, you are agreeing with me.

And once we’ve agreed on that, why do you insist on illegally hoisting the ‘nsw’
out of the if-statement? Adding “poison” is clearly not necessary: once
the ‘nsw’ is stripped off the multiply, the ‘sext’ can no longer be commuted.

In other words, “poison” isn’t cake; it is a band-aid over an illegal transformation,
and it has no benefit. Don’t make the illegal transformation, and ‘poison’ isn’t necessary.
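A minimal Python sketch of the ‘sext’ point, assuming Python ints stand in for LLVM's i32/i64 values (all function names here are hypothetical). It compares sext(a * b) with sext(a) * sext(b). With a plain, wrapping 32-bit multiply the two sides differ whenever the product overflows. They agree exactly when the product has no signed overflow, which is the guarantee ‘nsw’ asserts.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrap_i32(x: int) -> int:
    """Truncate x to 32 bits and reinterpret as signed (two's-complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x > INT32_MAX else x

def sext_of_mul(a: int, b: int) -> int:
    """sext(mul i32 a, b) to i64: wrap the product to 32 bits, then sign-extend.
    Sign-extending a signed value preserves it, so the Python int is unchanged."""
    return wrap_i32(a * b)

def mul_of_sext(a: int, b: int) -> int:
    """mul i64 (sext a), (sext b): the exact product of two i32 values always fits in i64."""
    return a * b

def no_signed_overflow(a: int, b: int) -> bool:
    """True when the i32 multiply does not overflow, i.e. when 'nsw' genuinely holds."""
    return INT32_MIN <= a * b <= INT32_MAX

# 65536 * 65536 = 2**32 wraps to 0 in 32 bits but not in 64 bits.
print(mul_of_sext(0x10000, 0x10000), sext_of_mul(0x10000, 0x10000))  # prints: 4294967296 0
```

In this model the commuted form is only equivalent when `no_signed_overflow(a, b)` holds. That is why the rewrite depends on the flag, whether the flag is kept (and overflow yields poison) or stripped (and the rewrite is disallowed).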

Peter Lawrence.

> However, with poison we want to have our cake and eat it too
> (perhaps eating is not the best analogy with poison :) ) -- we want to
> exploit, whenever correct, the fact that a certain operation does not
> overflow, even when hoisting it above control flow.  For
> instance, if we have:
> 
> if (cond) {
>  t = a +nsw b;
>  print(t);
> }
> 
> Now, once we hoist t out of the control block:
> 
> t = a +nsw b;
> if (cond) {
>  print(t);
> }
> 
> In the transformed program, t may sign overflow even when cond is false.  In LLVM IR
> (or at least in the semantics we'd like), this has no correctness
> implications -- t becomes "poison" (which is basically deferred
> undefined behavior), and the program is undefined only if the
> generated poison value is used in a "side effecting" manner.  Assuming
> that print is a "side effect", this means at print, we can assume t
> isn't poison (and thus a + b did not sign overflow).  This is a weaker
> model than C/C++, and the difficult parts are getting the poison
> propagation rules right and having a sound definition of a "side
> effect" (i.e. the points at which poison == deferred UB actually
> becomes UB).
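The deferred-UB reading can be sketched as a small Python model. Everything here is an illustrative assumption, not LLVM's implementation. `POISON` is a sentinel, and `add_nsw` returns it on 32-bit signed overflow. Poison operands propagate. `emit` is a hypothetical stand-in for print and is the only side effect, so it is the one place where poison turns into real UB. Under these rules the original and hoisted programs have the same observable behavior for every input.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

class UndefinedBehavior(Exception):
    """Raised when poison reaches a side-effecting use (deferred UB becomes real UB)."""

POISON = object()  # sentinel standing in for a poison value

def add_nsw(a, b):
    """Model of 'add nsw' on i32: poison propagates; signed overflow yields poison, not UB."""
    if a is POISON or b is POISON:
        return POISON
    r = a + b
    return r if INT32_MIN <= r <= INT32_MAX else POISON

def emit(out, v):
    """Hypothetical side effect (stands in for print): using poison here is UB."""
    if v is POISON:
        raise UndefinedBehavior("poison reached a side effect")
    out.append(v)

def original(cond, a, b):
    out = []
    if cond:
        t = add_nsw(a, b)
        emit(out, t)
    return out

def hoisted(cond, a, b):
    out = []
    t = add_nsw(a, b)  # now executes unconditionally and may be poison
    if cond:
        emit(out, t)
    return out

def run(program, *args):
    """Observable behavior: ('UB', None) if poison reached a side effect, else ('ok', output)."""
    try:
        return ("ok", program(*args))
    except UndefinedBehavior:
        return ("UB", None)
```

For example, `run(hoisted, False, INT32_MAX, 1)` returns `('ok', [])`: the add overflowed, but the poison was never used. If signed overflow were immediate UB instead, the same hoist would introduce UB on the `cond == False` path.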
> 
>> I think the LLVM community in general has misunderstood and misused ‘nsw’, don’t you agree now?
> 
> FYI, I think it is poor form to insinuate such things when you clearly
> haven't made an effort to dig back and understand all of the prior
> discussions we've had in this area (hint: we've discussed and
> explicitly decided to not implement the semantics you're suggesting).
> Of course, fresh ideas are always welcome, but I suggest you start by
> reading http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf
> and some of the mailing list discussions we've had in the past on this
> topic.
> 
> Thanks!
> -- Sanjoy