[llvm-dev] The undef story

Wed Jun 28 09:39:26 PDT 2017

Part I.

The original LangRef appeared to be “nice and pretty”
and originally ‘undef’ did not seem to stick out.

Then  evidence came to light in the form of bug reports, and
in the form of Dan Gohman’s email “the nsw story”, that all
was not good for ‘undef’ [1,2].

A proposal was made based on that evidence.
A presentation was made at an llvm gathering.
A paper was written. The proposal has even been
implemented and tested.

The IR might not be so “nice and pretty” anymore,
but at least all the known bugs could be closed.
I think everyone agreed that the case could be closed
based on the evidence at hand.

However new evidence has come to light,
the function-inlining example for one,
which the proposal does not address.

This means the case must be re-opened.

Now some folks seem upset over the fact that the new
evidence implies that the original bugs had not been
fully root-caused and that a simpler solution exists.

But this is not a reason to be upset, rather it is a reason to
be pleased that the IR can be restored to its original “nice
and pretty” status,

and I think Dan Gohman would be pleased.

Part II.

That the current LangRef IR definition of "undef" is incorrect
is not immediately obvious, it has gone unnoticed for a long
time, and has become lodged in peoples minds as unquestionable.
Instead folks have viewed it as insufficient, and piled on
fixes in the form of "poison" and "freeze".

Here are examples of why the current definition of "undef",
which explicitly assumes that it is correct to copy-propagate
"x = undef" everywhere "x" occurs, is incorrect:

From 3.2 in "Taming Undefined Behavior in LLVM" [3]

*** This shows that the current definition blocks a
highly desirable and otherwise legal optimization. ***

if (k != 0) {
  while (cond) {
    t = 1/k;
    use(t);
  }
}
-----> hoist loop invariant ----->
if (k != 0) {
    t = 1/k;
  while (cond) {
    use(t);
  }
}

This becomes illegal when later "k" is replaced
with "undef" because then "(undef != 0)" and
"t = 1/undef" need not be consistent,and the
program can incorrectly divide-by-zero. So this
is an example of why copy-propagation of "undef"
is undesirable.

From "Function Inlining and undef / poison question" [4]

*** This shows that the current definition creates illegal
translations. ***

this function always executes statement S
foo(a)
{
  if (a == a)
    S;
}
but in llvm when foo is inlined and "a" is replaced
with "undef" then nothing can be said about whether
statement S is executed because "(undef == undef)"
can go either way with the current definition. So this
is an example of why copy-propagation of "undef" is
illegal.

These problems cannot be fixed by adding "poison" or "freeze",
it is "undef" itself that is inherently flawed.

Part III.

Here is the offending paragraph in the LangRef, from "Undefined Values" :

	This example points out that two ‘undef‘ operands are not necessarily
	the same. This can be surprising to people (and also matches C
	semantics) where they assume that “X^X” is always zero, even if X is
	undefined. This isn’t true for a number of reasons, but the short
	answer is that an ‘undef‘ “variable” can arbitrarily change its value
	over its “live range”. This is true because the variable doesn’t
	actually have a live range. Instead, the value is logically read from
	arbitrary registers that happen to be around when needed, so the value
	is not necessarily consistent over time.

The purported benefit of this definition comes down to a register allocation
argument in that you don't need a register for an uninitialized variable. 
But this benefit has never been shown, and if it ever does become important then
it should be handled in the register allocator, not in the IR.

While the evidence of the harm of this definition is overwhelming. It leads to
translations like “x ^ x” not necessarily being 0 that are non-sensical (no
matter what you think the C standard says), the inability to hoist a loop
invariant divide out of a loop, and the function-inlining example which is
patently illegal.

The LangRef's definition of “undef” needs to be rewritten so that “x = undef”
simply means that this is where the live range of x starts, and that it is
uninitialized. Another way to think of it is as an incoming argument register,
which similarly has an unknown (ie "undef") but stable bit pattern.

(in fact you'll know the compiler correctly implements "undef" when you
 can insert "x = undef" into a function's entry block for each incomming
 argument, and (without any inlining or other IPA of course) it still compiles
 AOK at the IR level (you'd have to strip them back out before codegen))

The LangRef's mathematical examples of "undef" all need to be deleted because
while technically correct thay all presume copy-propagation, but you cannot 
copy-propagate the “undef” symbol itself because doing so leads to absurdities
and mis-compiles.

Once the definition of "undef" is changed then there are no longer any
examples of optimizatitons that need "poison" instead of "undef", so
"poison" needs to be deleted, and "freeze" no longer needs to be considered.

Peter Lawrence.

References

[1. llvm-dev, Dan Gohman, Tue Nov 29 15:21:58 PST 2011 ]
[2. llvm-dev, Dan Gohman, Mon Dec 12 12:58:31 PST 2011 ]
[3. http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf ]
[4. llvm-dev, Peter Lawrence, Thu Jun 15 10:27:40 PDT 2017 ]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170628/b7f3636b/attachment.html>