PATCH: UndefInst and UnreachableValue (Constant)

Fri Apr 1 01:19:17 PDT 2016

This series of patches proposes two closely related extensions to LLVM's 
handling and optimization of undefined behaviour.

The first innovation is the "undef instruction" which produces a single 
value that is in range for its type, but does not define what that 
single value is. That is, it cannot be unsigned-less-than zero, nor can 
it be both not equal to X and not equal to Y where X is not equal to Y. 
That sort of thing.

This is useful for non-C-like languages where reading an allocated but 
uninitialized stack variable produces the real concrete value that was 
in memory -- a single value whose bits may be anything, but does not 
cause uncontrolled UB. We are given the freedom to decide what that 
value actually is, as long as we pick only one possibility. This is what 
the undef instruction provides.
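
To make the distinction concrete, here is a rough toy model in Python. 
It is not LLVM code: the class names, the 8-bit width and the "x ^ x" 
example are assumptions made purely for illustration. It contrasts an 
undef constant, where every use may see a different value, with an undef 
instruction, which picks one value and sticks with it.

```python
import random

BITS = 8  # toy model: every value is an unsigned 8-bit integer (assumption)


class UndefConstant:
    """Models the existing undef constant: each use may read a different value."""

    def __init__(self, rng):
        self.rng = rng

    def use(self):
        return self.rng.randrange(1 << BITS)


class UndefInst:
    """Models the proposed instruction: one arbitrary value, chosen once."""

    def __init__(self, rng):
        self.value = rng.randrange(1 << BITS)

    def use(self):
        return self.value


def xor_with_self(v):
    # Computes "x ^ x", reading the operand twice.
    return v.use() ^ v.use()


rng = random.Random(0)
inst_results = {xor_with_self(UndefInst(rng)) for _ in range(1000)}
const_results = {xor_with_self(UndefConstant(rng)) for _ in range(1000)}
```

In this model inst_results only ever contains 0, because both reads see 
the same value, while const_results contains many different values, 
because each read is free to differ.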

For some background, you may want to read Sanjoy's post about undef:
   http://www.playingwithpointers.com/problem-with-undef.html
where he talks about how every use of an UndefValue is an SSA 
definition (a problem that UndefInst fixes), and John Regehr's Friendly 
C work:
   http://blog.regehr.org/archives/1180

To preserve behaviour for C-like compilers, their pass pipelines should 
include a pass that replaces all uses of undef instructions with undef 
constants and deletes the now-dead undef instructions. This is 
integrated into instsimplify.
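
As a sketch of what that lowering does, here is a toy version in Python 
over a made-up IR of (name, opcode, operands) tuples. The "undefinst" 
opcode spelling and the tuple layout are hypothetical and are not what 
the patches use; the point is only the replace-uses-then-delete shape.

```python
UNDEF = "undef"  # stand-in for the undef constant


def lower_undef_insts(block):
    """Rewrite a toy block given as a list of (name, opcode, operands) tuples."""
    # Collect the names defined by (hypothetical) undef instructions.
    undef_names = {name for name, op, _ in block if op == "undefinst"}
    lowered = []
    for name, op, operands in block:
        if op == "undefinst":
            continue  # dead once every use has been rewritten
        # Replace every use of an undef instruction with the undef constant.
        new_operands = tuple(UNDEF if o in undef_names else o for o in operands)
        lowered.append((name, op, new_operands))
    return lowered


block = [
    ("%u", "undefinst", ()),
    ("%a", "add", ("%u", "1")),
    ("%b", "xor", ("%u", "%u")),
]
result = lower_undef_insts(block)
```

After the rewrite the two uses in %b are independent undef constants 
again, which is the existing C-like behaviour.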

The second innovation is the inverse of the first, adding 
UnreachableValue with semantics similar to the existing unreachable 
instruction, but as a constant. The intended primary user is the 
constant folder itself.

Similarly to undef, combining an unreachable constant with another value 
produces an unreachable constant. "unreachable + 1 == unreachable". You 
can safely put an unreachable on one side of a select as long as that 
side is not chosen (indeed, it indicates that side is never chosen). If 
an instruction evaluates to unreachable, the instruction has "no defined 
semantics" (i.e., behaves the same as an unreachable instruction). This 
also means that you can use unreachable constants in PHI nodes (there's 
an obvious optimization opportunity there) and as global variable 
initializers (no 'unreachable' semantic until the load instruction).
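
The folding rules above can be sketched with a toy constant folder in 
Python. This is a model and not the patch: the UNREACHABLE sentinel and 
the fold_* helpers are hypothetical names, and the PHI rule (ignore 
incoming values that can never arrive) is only my guess at the "obvious 
optimization".

```python
UNREACHABLE = object()  # hypothetical stand-in for the unreachable constant


def fold_add(a, b):
    # Any unreachable operand makes the whole result unreachable.
    if a is UNREACHABLE or b is UNREACHABLE:
        return UNREACHABLE
    return a + b


def fold_select(cond, if_true, if_false):
    # Only the chosen arm matters; an unreachable arm is harmless if not chosen.
    if cond is UNREACHABLE:
        return UNREACHABLE
    return if_true if cond else if_false


def fold_phi(incoming):
    # Assumed optimization: drop incoming values that can never arrive.
    live = [v for v in incoming if v is not UNREACHABLE]
    if not live:
        return UNREACHABLE
    # Fold only when every remaining value agrees; None means "cannot fold".
    return live[0] if all(v == live[0] for v in live) else None
```

For example, fold_add(UNREACHABLE, 1) gives back UNREACHABLE, and 
fold_select(True, 7, UNREACHABLE) gives 7 because the unreachable arm is 
never chosen.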

If an optimization shows that an instruction is equivalent to an 
unreachable constant it is best to transform the instruction into an 
unreachable, but if you simply RAUW+DCE the instruction then you've 
pushed the unreachable safely down the line, and the optimization is 
correct with a missed optimization opportunity.

One of the major benefits of framing unreachability as a value is that 
you can accidentally propagate control flow information through any data 
flow algorithm.

We should add UndefInst and UnreachableValue first, then audit calls to 
UndefValue::get to see whether the optimizer should create an undef 
instruction, an undef value or an unreachable constant.

The first patch in the series adds the unreachable constant: 
http://reviews.llvm.org/D18686 .

I tried implementing this as a ConstantExpr where the expr was 
Instruction::Unreachable but it didn't come out cleanly because the 
unreachable instruction is both a terminator and has void type. No other 
ConstantExpr is a terminator, or has void type. Making constant folding 
potentially change the type to void was very awkward as well.

Anyways, please let me know what you think!

Nick