[llvm-dev] RFC: Killing undef and spreading poison

Nuno Lopes via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 18 05:06:31 PDT 2016


Hi,

Over the past few years we've been trying to kill poison somehow. There have
been a few proposals, but they've all failed to pass the bar and/or to
gather significant support (my own proposals included).
We (David, Gil, John, Juneyoung, Sanjoy, Youngju, Yoonseung, and myself)
have a new proposal to kill undef instead and replace it with poison + a new
'freeze' instruction.  We believe this proposal simplifies things at the IR
level, allows us to fix long-standing bugs in LLVM, and is roughly
performance-neutral for now (and can enable further optimizations in the
future).

Sorry for the longish email.  We will give a talk about this proposal at the
LLVM dev meeting in November, so hopefully we'll be able to discuss this
further in person.

We would appreciate any comments, feedback, suggestions, etc.  If you have
questions let me know as well (not everything below is super detailed, so
please ask where things are not explicit enough).
(I've CC'ed a few people that have been involved in discussions re semantics
of LLVM IR in the past year or so.  Apologies in advance if I forgot someone
-- which probably I did.  I've also CC'ed some CodeGen folks, from which we
would appreciate input on how this proposal fits within the current and the
future pipeline)

Thanks,
Nuno


---------------------------------------------------------


Motivation for undef & poison
=============================
There were a few motivations behind the creation of undef and poison:
 1) There was a desire to exploit certain undefined behaviors without
hurting optimization opportunities.  This led to the creation of weaker
forms of undefined behavior (UB), while leaving "full" UB to operations that
can trap the CPU, like division by zero.  Weaker UB operations include, for
example, signed integer overflow, say 'a + b'.  If signed overflow was full
UB, then we couldn't speculatively execute these operations without proving
that they could not overflow, otherwise the compiler would be introducing UB
in a perfectly fine program. E.g., the compiler couldn't hoist signed
additions out of loops.  In weaker forms of UB, the triggering of full UB is
delayed so that the compiler can speculatively execute these operations and
there's only full UB if the result is somehow "consumed".

 2) Undef was created to represent uninitialized memory reads. For example,
it's very useful in conjunction with PHI nodes:
int a;
if (foo)
  a_0 = 42;
a_1 = phi(a_0, undef)

By using undef, we don't have to materialize some arbitrary constant on a
branch where a variable is not initialized.

3) And then people realized that undef wasn't enough.  For example, we'd
also like to be able to optimize expressions like "((a + 1) >s a)" to
"true", which isn't possible with "undef" as defined in (1), since there's
no value for `%x` that makes `%x > INT_MAX` true.  This was important for
loop analyses and certain InstCombine patterns, and so we needed a stronger
version of UB which was still not full UB. And poison was born. For example,
on 64-bit platforms, poison is very handy for widening induction variables
to 64 bits.



Problems with undef & poison
============================
The interactions between undef and poison are particularly complicated and
they inhibit innocent-looking optimizations.  For example, the following is
wrong:
%v = select %c, %x, undef
  =>
%v = %x

If %x is poison and %c = false, then we've just replaced undef with poison
which is bad, since poison is stronger than undef.  So, this is an example
of something we want to do but we can't at the moment (even if we still do
it anyway, risking miscompilations!)



Goal
====
Our goal was to propose minimal changes to the IR semantics such that 1) we
could keep most of the optimizations we do today, 2) would require minimal
code changes, 3) compilation time, run time, and memory consumption would
stay roughly the same, and 4) we could fix all the longstanding bugs due to
undef/poison semantics and interactions.
We believe this proposal fulfills the goals, and can even enable future
optimizations.



Proposal
========
 1) Kill undef
 2) Create a new poison value (representable in IR, inheriting from
llvm::Constant)
 3) Create a new instruction, '%y = freeze %x', that stops propagation of
poison.  If the input is poison, then it returns an arbitrary, but fixed,
value. (like old undef, but each use gets the *same* value), otherwise it
just returns its input value.
 4) All operations over poison return poison.
For example:
and %x, poison   ->  poison  ; just like before
and 0, poison    ->  poison  ; just like before

%y = freeze poison
%z = and %y, 1  --- 000..0x   ; just like old undef
%w = xor %y, %y                    ; 0 -- not undef: both uses of %y have
the same value

Instruction-specific semantics:
 - br poison -> UB
 - select poison, %a, %b -> poison   (some InstCombine patterns will need
freeze, but they are wrong right now already!)
 - if %c is not poison, select %c, %a, %b is poison  if %c = 1 and %a is
poison, or %c = 0 and %b is poison  (see discussion below)
 - bitcast between types of different bitwidth can return poison if when
concatenating adjacent values one of these values is poison. For example:
%x = bitcast <3 x i2>  <2, poison, 2> to <2 x i3>
  ->
%x = <poison, poison>

%x = bitcast <6 x i2>  <2, poison, 2, 2, 2, 2> to <4 x i3>
  ->
%x = <poison, poison, 5, 2>



Purpose of Freeze
=================
Poison is propagated aggressively throughout. However, there are cases where
this breaks certain optimizations, and therefore freeze is used to
selectively stop poison from being propagated.

A use of freeze is to enable speculative execution.  For example, loop
switching does the following transformation:
while (C) {
  if (C2) {
   A
  } else {
   B
  }
}
=>
if (C2) {
   while (C)
      A
} else {
    while (C)
       B
}

Here we are evaluating C2 before C.  If the original loop never executed
then we had never evaluated C2, while now we do.  So we need to make sure
there's no UB for branching on C2.  Freeze ensures that so we would actually
have 'if (freeze(C2))' instead.
Note that having branch on poison not trigger UB has its own problems.  We
believe this is a good tradeoff.


Another use is, for example, to implement bit-fields:
a.x = 2
becomes:
%v = load %a
%v2 = freeze %v  ; %v could be uninitialized data (poison)
%v3 = ... bitmasking...
store %a, %v3



Performance
===========
We measured compile time, running time, memory consumption, and object/IR
size of a few benchmarks (-O3 vs -O3 w/ freeze).  They should give a rough
picture of the overhead involved.
Benchmarks were run on 2 machines (server1 and server2), both x86_64 running
Ubuntu 16.04.

Summary:
 - Memory consumption: there's generally a 1% increase, with the max of 8%
in oggenc
 - Running time on SPEC: mostly in the noise (about 1% up or down)
 - Compile time: mostly in the noise (about 1% up or down, but usually no
difference)
 - LNT shows 0.5% average slowdown, but with wider swings both ways. We are
aware of a few things that would need tweaking to recover perf like loop
unrolling (likely because of SCEV not knowing what freeze is).

You can see the raw data we have collected in the links below:

Memory consumption:
https://docs.google.com/spreadsheets/d/1ycJaMPLzh_4YV7RQmVaLR-vHzd7ZlYiV2iES
G-lBRbk/edit#gid=0
SPEC running time:
https://docs.google.com/spreadsheets/d/1tAwj-Q5raI4rYD7EEg-tJd-Ex533fy9DezIV
rbfeIns/edit#gid=0
Compilation time:
https://docs.google.com/spreadsheets/d/1_xj6o_ANGcR8xD5Y9rN5VjWbsJaS-GeYKzeW
WAR9O2c/edit#gid=0
Statistics on number of Freeze Instructions (up to 5% in total in gcc):
https://docs.google.com/spreadsheets/d/1mbOpduooEetIR5i9Db07GHbu72a6mLUmkRIF
1SmZ-6Q/edit#gid=0
LNT raw data:
https://github.com/aqjune/freezescript/tree/master/resultcsv-mailinglist/LNT

Server1: Intel Core i7 CPU 870 @ 2.93GHz, 8 GBs RAM
Server2: Intel Core i5-6600 CPU @ 3.30GHz, 8 GBs RAM



Implementation
==============
A prototype is available at:
 - https://github.com/snu-sf/llvm-freeze/tree/x86jmp
 - https://github.com/snu-sf/clang-freeze

This implementation includes:
 - Make clang emit freeze for bit-field codegen
 - Add freeze instruction to IR, plus a few instcombine patterns to simply
freeze(freeze(x)), for example
 - Fix a few instcombine select patterns to introduce freeze when needed
(these were wrong with current semantics already)
 - Add a freeze node to selection DAG and add codegen support.  For codegen,
it uses CopyFromReg+CopyToReg to lower the freeze node, and so it assumes
that from that point on LLVM will not propagate undef anymore (to ensure
that 'freeze undef' always returns the same value for all uses, which is not
what happens with 'undef' on its own). This needs further discussion.
 - Fix loop unswitch to freeze condition
 - Change GVN's load widening to load as vector of bits+bitcast (gvload
branch only; this was not included in the tests above)

(this implementation still uses undef value, but that should be replaced
with poison)



Deployment
==========
A proposal for incremental deployment of the proposed changed:
 1) Add freeze instruction + CodeGen support
 2) Change clang to start emitting freeze for bit-field reads
 3) Add auto-upgrade to convert undef to 'freeze poison' (undef and 'freeze
poison' with one use are equivalent)
 4) Fix InstCombine, Loop unswitching, etc to use freeze
 5) Replace references to undef in the code with either poison or 'freeze
poison'
 7) Kill undef
 8) Investigate any perf regressions
 9) Run John's LLVM IR fuzzer with Alive to find leftover bugs

Regarding perf, at least PR30615 needs fixing to enable efficient load
widening at IR level (which seems it's still undecided whether GVN will
continue doing it or not).



Discussion on select
====================
Select is a tricky instruction to define semantics for.  There are multiple
views of select's purpose that are contradictory:
 1) Select should be equivalent to arithmetic (e.g., allow  'select %c,
true, %x' -> 'or %c, %x' and arithmetic -> select)
 2) br + phi -> select should be allowed (simplifyCFG)
 3) Select -> br + phi should be allowed

To have 1), we need to make select poison if either of its arguments is
poison. This disallows 2), since there we need to have select being poison
only if the dynamically-chosen value is poison.
2) and 3) are orthogonal and do not conflict, though.

3) is hard because it requires select to be stronger than branching
(UB-wise), meaning that we would need select to be UB if the condition was
poison. This blocks a bunch of instcombine patterns, like:
Pre: C < 0
%r = udiv %a, C
  =>
%c = icmp ult %a, C
%r = select %c, 0, 1

If %a is poison, then we would be introducing UB. Adding freeze(%c) would
fix the problem, though.


Since we really want to have 2) since that's simplifyCFG, we propose to make
'select %c, %a, %b' poison if any of the following holds:
 - %c is poison
 - %c = true and %a is poison
 - %c = false and %b is poison

This disallows 3) and some transformations of 1). Since 3) is only performed
in CodeGenPrepare, we can safely ignore it (no aggressive optimizations are
run afterwards). For 1), we will have to restrict InstCombine patterns for
the cases when we are sure a given variable is non-poisonous (which is only
when it came from a 'freeze' instruction or it's the result of a
non-poison-producing instruction).
This semantics also allows arithmetic -> select, but not select ->
arithmetic (without use of freeze).



Acknowledgements
================
I would personally like to thank David, John, and Sanjoy for embarking on
this trip over a year ago and spending a lot of time on this project; Gil
and his students (Juneyoung, Yoonseung, and Youngju) for prototyping and
testing several ideas (and fixing them!); and everybody else that has
contributed in the form of off-line and on-line discussions.



Future work
===========
This proposal only handles integers.  It's a good step forward, but we are
still missing pointers and floats; we are aware of problems (err,
possibilities for improvement) there.  We will work on these next. They are
orthogonal problems, so what we are proposing today won't require further
changes (hopefully).



More information about the llvm-dev mailing list