[llvm-dev] The undef story

Wed Jun 28 23:26:32 PDT 2017

Hi Peter,

I've tried to specifically address the technical points below.

On Wed, Jun 28, 2017 at 9:39 AM, Peter Lawrence
<peterl95124 at sbcglobal.net> wrote:
> That the current LangRef IR definition of "undef" is incorrect
> is not immediately obvious, it has gone unnoticed for a long
> time, and has become lodged in peoples minds as unquestionable.
> Instead folks have viewed it as insufficient, and piled on
> fixes in the form of "poison" and "freeze".
>
> Here are examples of why the current definition of "undef",
> which explicitly assumes that it is correct to copy-propagate
> "x = undef" everywhere "x" occurs, is incorrect:
>
>
> From 3.2 in "Taming Undefined Behavior in LLVM" [3]
>
> *** This shows that the current definition blocks a
> highly desirable and otherwise legal optimization. ***
>
> if (k != 0) {
>   while (cond) {
>     t = 1/k;
>     use(t);
>   }
> }
> -----> hoist loop invariant ----->
> if (k != 0) {
>     t = 1/k;
>   while (cond) {
>     use(t);
>   }
> }
>
> This becomes illegal when later "k" is replaced
> with "undef" because then "(undef != 0)" and
> "t = 1/undef" need not be consistent,and the
> program can incorrectly divide-by-zero. So this
> is an example of why copy-propagation of "undef"
> is undesirable.

This problem is fixed in semantics mentioned in the paper.

> From "Function Inlining and undef / poison question" [4]
>
> *** This shows that the current definition creates illegal
> translations. ***
>
> this function always executes statement S
> foo(a)
> {
>   if (a == a)
>     S;
> }
> but in llvm when foo is inlined and "a" is replaced
> with "undef" then nothing can be said about whether
> statement S is executed because "(undef == undef)"
> can go either way with the current definition. So this
> is an example of why copy-propagation of "undef" is
> illegal.

This too is fixed in the semantics mentioned in the paper.  This also
isn't new to us, it is covered in section 3.1 "Duplicate SSA Uses".

If you're trying to say that the new behavior (undefined behavior on
branch on poison) is "unintuitive", then that's off base.  You can
design a programming language that will never generate IR that
encounters such a scenario (for instance, a Java frontend), so an "end
user" will never have to deal with this.

In the other thread you vaguely said something about compiler IRs
"taking a stance on undefined behavior" on this matter.  That too is
off base.  If a C/C++ implementation wants to avoid unintuitive
"catch-fire" semantics they can generate IR in a way that the
situation you outline never happens.

> These problems cannot be fixed by adding "poison" or "freeze",
> it is "undef" itself that is inherently flawed.

undef *is* "flawed", which is why we're removing it.  Please refer to the paper.

> The LangRef's definition of “undef” needs to be rewritten so that “x =
> undef”
> simply means that this is where the live range of x starts, and that it is
> uninitialized. Another way to think of it is as an incoming argument
> register,
> which similarly has an unknown (ie "undef") but stable bit pattern.

This world you're proposing with:

 - An undef "instruction"
 - Undefined behavior on a overflowing nsw/nuw arithmetic (equivalent
to "strip nsw/nuw on hoisting")

is radically different from where LLVM is right now and the burden is
on you to show that this end result is realistic.  The biggest hurdle
of the new semantics was implementing it (many thanks to Juyeyoung Lee
who did most of the hard work) to show that it actually works and does
not regress performance.  You need to do the same thing for your
proposal to sit at the same table.

-- Sanjoy