PATCH: UndefInst and UnreachableValue (Constant)
Nick Lewycky via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 1 01:19:17 PDT 2016
This series of patches proposes two closely related extensions to LLVM's
handling and optimization of undefined behaviour.
The first innovation is the "undef instruction" which produces a single
value that is in range for its type, but does not define what that
single value is. That is, it cannot be unsigned-less-than zero, nor can
it be both equal to X and equal to Y where X is not equal to Y.
That sort of thing.
This is useful for non-C-like languages where reading an allocated but
uninitialized stack variable produces the real concrete value that was
in memory -- a single value whose bits may be anything, but does not
cause uncontrolled UB. We are given the freedom to decide what that
value actually is, as long as we pick only one possibility. This is what
the undef instruction provides.
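
To make the "only one possibility" point concrete, here is a toy C++ model
(not LLVM code; UndefConstant, UndefInst, scripted and equalsBoth are names
invented for this sketch). It assumes an adversarial chooser that may return
a different in-range value each time it is asked. An undef constant consults
the chooser at every use, while the undef instruction consults it once and
then sticks with that value, so it can never compare equal to two different
constants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical helper: a chooser that replays a fixed script of values,
// standing in for an optimizer that is free to pick any value when asked.
std::function<std::uint8_t()> scripted(std::vector<std::uint8_t> script) {
    assert(!script.empty());
    auto index = std::make_shared<std::size_t>(0);
    return [script, index]() {
        return script[(*index)++ % script.size()];
    };
}

// Models the existing undef constant: every use is a fresh choice.
struct UndefConstant {
    std::function<std::uint8_t()> pick;
    std::uint8_t use() const { return pick(); }
};

// Models the proposed undef instruction: one choice, made once,
// and observed by every later use.
struct UndefInst {
    std::uint8_t value;
    explicit UndefInst(const std::function<std::uint8_t()>& pick)
        : value(pick()) {}
    std::uint8_t use() const { return value; }
};

// Evaluates (v == x) && (v == y) with two separate uses of v.
template <typename V>
bool equalsBoth(const V& v, std::uint8_t x, std::uint8_t y) {
    bool first = (v.use() == x);
    bool second = (v.use() == y);
    return first && second;
}
```

With a chooser scripted to answer 3 and then 7, equalsBoth(c, 3, 7) can hold
for the undef constant model, but not for the undef instruction model, which
is pinned to the first answer.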
For some background, you may want to read Sanjoy's post about undef:
http://www.playingwithpointers.com/problem-with-undef.html
where he talks about how every use of an UndefValue value is an SSA
definition (a problem that UndefInst fixes), and John Regehr's Friendly
C work:
http://blog.regehr.org/archives/1180
To preserve behaviour for C-like compilers, their pass pipelines should
include a pass that replaces all uses of undef instructions with undef
constants and deletes the now-dead undef instructions. This is
integrated into instsimplify.
The second innovation is the inverse of the first, adding
UnreachableValue with semantics similar to the existing unreachable
instruction, but as a constant. The intended primary user is the
constant folder itself.
Similarly to undef, combining an unreachable constant with another value
produces an unreachable constant. "unreachable + 1 == unreachable". You
can safely put an unreachable on one side of a select as long as that
side is not chosen (indeed, it indicates that side is never chosen). If
an instruction evaluates to unreachable, the instruction has "no defined
semantics" (i.e., behaves the same as an unreachable instruction). This
also means that you can use unreachable constants in PHI nodes (there's
an obvious optimization opportunity there) and as global variable
initializers (no 'unreachable' semantic until the load instruction).
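
The folding rules above can be sketched with a toy constant folder in C++.
This is only an illustration under my reading of the rules, not the code in
the patch: Const, known, unreachableConst, valueOf, foldAdd and foldSelect
are all hypothetical names. Treating a select whose condition is unreachable
as unreachable is an assumption that follows from "combining an unreachable
constant with another value produces an unreachable constant".

```cpp
#include <cassert>
#include <cstdint>

// Toy constant: either a known integer or the proposed unreachable constant.
struct Const {
    bool unreachable;
    std::uint64_t value;  // meaningful only when !unreachable
};

Const known(std::uint64_t v) { return Const{false, v}; }
Const unreachableConst() { return Const{true, 0}; }

// Reads the integer out of a constant that is known not to be unreachable.
std::uint64_t valueOf(const Const& c) {
    assert(!c.unreachable);
    return c.value;
}

// "unreachable + 1 == unreachable": any unreachable operand wins.
// Unsigned arithmetic is used so the addition wraps instead of overflowing.
Const foldAdd(const Const& a, const Const& b) {
    if (a.unreachable || b.unreachable) return unreachableConst();
    return known(a.value + b.value);
}

// An unreachable arm is harmless as long as it is not the chosen arm;
// if it is chosen, the select itself evaluates to unreachable.
Const foldSelect(const Const& cond, const Const& ifTrue, const Const& ifFalse) {
    if (cond.unreachable) return unreachableConst();  // assumption, see above
    return cond.value != 0 ? ifTrue : ifFalse;
}
```

In this model, foldSelect(known(1), known(10), unreachableConst()) folds to
10, while flipping the condition to known(0) yields the unreachable constant.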
If an optimization shows that an instruction is equivalent to an
unreachable constant it is best to transform the instruction into an
unreachable, but if you simply RAUW+DCE the instruction then you've
pushed the unreachable safely down the line, and the optimization is
correct, just with a missed optimization opportunity.
One of the major benefits of framing unreachability as a value is that
you can accidentally propagate control flow information through any data
flow algorithm.
We should add UndefInst and UnreachableValue first, then audit calls to
UndefValue::get to see whether the optimizer should create an undef
instruction, an undef value or an unreachable constant.
The first patch in the series adds the unreachable constant:
http://reviews.llvm.org/D18686 .
I tried implementing this as a ConstantExpr where the expr was
Instruction::Unreachable but it didn't come out cleanly because the
unreachable instruction is both a terminator and has void type. No other
ConstantExpr is a terminator, or has void type. Making constant folding
potentially change the type to void was very awkward as well.
Anyways, please let me know what you think!
Nick