[LLVMdev] The nsw story

Dan Gohman gohman at apple.com
Tue Nov 29 15:21:58 PST 2011


The old and popular tradition of "int" being the default all-purpose type
in C hit a snag with the arrival of 64-bit general-purpose architectures,
when, for reasons of memory usage and compatibility, "int" was fixed
at 32 bits. This meant it was no longer the same size as a pointer, the
size of an array index, or the widest native integer type. (Sequence of
events simplified for brevity.)

The C standard tries to help with a bunch of typedefs: int_fast32_t,
int_least32_t, intptr_t, int32_t, size_t, ptrdiff_t, and more. With all
these great typedefs, one could imagine that there's no reason anyone
should ever use plain "int" again. Unfortunately, juggling all these
typedefs can be impractical, for several reasons.

C doesn't have function overloading, so whenever one talks to the standard
library (e.g. to call "abs"), one must know what the underlying types are
(e.g. which of "abs", "labs", or "llabs" is needed?). printf also has
a problem, and although the C standard actually tries to help there,
"%" PRIdLEAST32 and "%" PRIdFAST32 have yet to seriously challenge the
mindshare of "%d". And there are other issues.

"int" remains widely popular, even though it isn't quite the close fit
to hardware that it used to be. Consider this simple testcase, which
is representative of a broader issue, containing an innocent use of int:

  void my_copy(char *dst, const char *src, int n) {
    for (int i = 0; i != n; ++i)
      dst[i] = src[i];
  }

On LP64 platforms, this code has implicit sign-extensions to convert
the 32-bit int "i" to 64 bits in the array index expressions. Sign
extensions are relatively fast on most machines, but there's also
the cost of delaying the computations of the addresses for the memory
references, which is significant. And small loops turn small problems
into big problems. The result is around a 12% slowdown on the machine
I happen to be typing this on.

(I'm ignoring loop vectorization and memcpy pattern matching here,
to keep this example simple without losing too much generality.)
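
To make the cost concrete, here's roughly what the loop looks like in
LLVM IR on an LP64 target (a simplified sketch in the typed-pointer
syntax of the day, with invented value names and some details omitted):

  loop:
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %idx = sext i32 %i to i64                      ; the implicit sign extension
    %src.addr = getelementptr i8* %src, i64 %idx   ; both addresses wait on it
    %dst.addr = getelementptr i8* %dst, i64 %idx
    %byte = load i8* %src.addr
    store i8 %byte, i8* %dst.addr
    %i.next = add i32 %i, 1
    %done = icmp eq i32 %i.next, %n
    br i1 %done, label %exit, label %loop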

However, C compilers have a way to fix this problem, by repurposing
an ancient hack. Long ago, the C standard was designed to accommodate
machines that didn't use two's complement representations for signed
integers. Different representations lead to different behaviors on
overflow, and the C committee decided to make signed integer arithmetic
portable by declaring that overflow in a signed integer type gets
undefined behavior. Since then, two's complement has largely taken over,
but the C standard still has that rule.

Today, compilers use this rule to promote 32-bit "int" variables to 64
bits to eliminate these troublesome sign extensions. This is considered
valid because any time this would change the behavior of the program,
it would be an overflow in the narrower type, which the rule says is
undefined behavior.

Now consider LLVM. In LLVM, "int" is lowered to "i32" (on LP64 targets),
and signedness is a property of operators, rather than types. For many
years, LLVM was unable to promote "int" variables, because it lacked
the necessary information. Then, I was assigned to fix this. I added
a flag named "nsw", which stands for No Signed Wrap, to be attached to
add instructions and others. It's not a very satisfying name, but it's
sufficiently unique and easy to find in LangRef. The purpose of the nsw
flag is to indicate instructions which are known to have no overflow.
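
Returning to the earlier sketch: the front end may mark the increment
"add nsw i32 %i, 1", since signed overflow of "i" would be undefined
behavior in the C source. That is precisely the information the
promotion needs, and the loop can be rewritten with a 64-bit induction
variable (again a sketch; "%n.wide" is assumed to be a sign extension
of "%n" computed once in the loop preheader):

  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %src.addr = getelementptr i8* %src, i64 %i     ; no sign extension needed
    %dst.addr = getelementptr i8* %dst, i64 %i
    %byte = load i8* %src.addr
    store i8 %byte, i8* %dst.addr
    %i.next = add nsw i64 %i, 1
    %done = icmp eq i64 %i.next, %n.wide
    br i1 %done, label %exit, label %loop

The rewrite is sound only because nsw promises that the original
32-bit increment never wraps.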

The nsw flag is simple on the surface, but it has complexities. One
is that nsw add is not an associative operation (unlike mathematical
integers, two's complement wrapping signed integers, and unsigned
integers, which are all associative). With nsw, ((a + b) + c) is not
equivalent to (a + (b + c)), because (b + c) might overflow even when
((a + b) + c) does not. For example, with a = -1, b = INT_MAX, and
c = 1, the left-associated form stays in range at every step, while
(b + c) wraps. This is inconvenient, but it's manageable.

A much bigger complication is that an add with undefined behavior on
overflow is an instruction that potentially has side effects. Side effects
mean the compiler can't freely move such instructions around.  This is
jarring, because add instructions are the kinds of things that otherwise
can be easily moved around -- they are cheap and safe. An example of this
is short-circuit elimination -- turning && into &. If the right operand
of the && contains an add, it will need to be speculatively executed,
and that isn't safe if it might have side effects.
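
For instance, "c && a + b < n" is emitted as control flow, with the
add guarded by the left operand (a sketch, invented names):

  entry:
    br i1 %c, label %rhs, label %merge
  rhs:
    %sum = add nsw i32 %a, %b          ; executed only when %c is true
    %lt = icmp slt i32 %sum, %n
    br label %merge
  merge:
    %result = phi i1 [ false, %entry ], [ %lt, %rhs ]

Eliminating the short circuit hoists the add into the entry block:

  entry:
    %sum = add nsw i32 %a, %b          ; now executed unconditionally
    %lt = icmp slt i32 %sum, %n
    %result = and i1 %c, %lt

If overflow in an nsw add invoked immediate undefined behavior, this
hoist would be invalid: the transformed program could overflow on
executions where the original never evaluated the add.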

The first solution for this was to define nsw add to return undef on
overflow, instead of invoking immediate undefined behavior. This way,
overflow can happen and it doesn't cause any harm unless the value
is actually used. With the code motion problem fixed, I proceeded to
implement nsw using this approach, and a sign-extension elimination
optimization on top of it.

However, I later realized that undef isn't actually expressive enough
for this purpose. Undef can produce any value, but it will still be some
specific value of its type (with fluctuation, but that's not important
here). When an integer add is promoted to a wider type, an expression in
the narrower type which would overflow is converted to an expression in
the wider type which returns a value which can't be expressed in the
narrower type. There is no possible value in the narrower type that
undef could take on which would produce the same behavior.
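
Concretely, sign-extension elimination wants to rewrite (invented
names again):

    %t = add nsw i32 %a, %b
    %w = sext i32 %t to i64

into:

    %a.wide = sext i32 %a to i64
    %b.wide = sext i32 %b to i64
    %w = add nsw i64 %a.wide, %b.wide

Now suppose %a is 2147483647 and %b is 1. In the original, %w is the
sign extension of some 32-bit value, so it lies within the range of
i32; in the rewritten code, %w is 2147483648, which no sign-extended
i32 can produce. Defining the overflowing add to return undef
therefore isn't enough to justify the rewrite.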

When I asked Chris what to do about this problem, he didn't understand
why I was worrying about it. Admittedly, it is a fairly obscure problem.
But at the time, I was of the opinion that we shouldn't sweep known
semantic problems under the rug. We don't have formal semantics, and we
don't do correctness proofs, but it still seemed that we should respond
when we do notice inconsistencies. I wanted to get to the bottom of it.

I set out to find some way to say that an nsw add which overflows doesn't
invoke immediate undefined behavior, but invokes undefined behavior only
when its result is used. It can't be just any use though, because I didn't
want to rule out the possibility of hoisting chains of instructions,
like a+b+c+d+e. LLVM doesn't usually speculate this aggressively today,
but there are plausible scenarios where it might want to in the future.

This led me to think about making nsw overflow return some kind
of poisoned value, like undef, only more powerful. The poison would
propagate down the def-use graph until it reaches an instruction that
is capable of triggering the undefined behavior.
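
For example (invented names; the store stands in for any instruction
with externally visible effects):

    %t1 = add nsw i32 %x, %y      ; poison if this add overflows
    %t2 = add nsw i32 %t1, %z     ; a use of poison: %t2 is poison too
    store i32 %t2, i32* %p        ; only here is the deferred undefined
                                  ; behavior finally invoked

Hoisting the whole chain of adds past a branch remains legal, because
the adds merely propagate the poison; harm is done only if the store
itself executes.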

I called the poison value "trap", because of some superficial
similarity to C's "trap value" concept, although I now believe this to
be mistaken. It really is quite different. Also, it's unrelated to the
llvm.trap intrinsic. It should probably be renamed.

The presence of this poison value means that undefined behavior has been
invoked on a potentially speculative code path. The consequences of the
undefined behavior are deferred until the poisoned result is actually
used on some non-speculative path.

This concept seemed to fit all the requirements. It allows for
arbitrary speculation, it's sufficient for sign-extension elimination,
and it's useful for a handful of other neat tricks as well. I wrote up
a description of this concept, and it's been in LangRef ever since. It
sticks out though, because it got pretty big and complex, especially
in view of its relative obscurity. Realistically speaking, it's
probably not fully watertight yet. But I'm not aware of any fundamental
flaws.

Perhaps the best way to understand the complexity is to imagine writing
a tool to detect this kind of undefined behavior. Detecting signed
overflow in C source is pretty easy: just insert checking code at each
signed operation. But as we've just seen, LLVM IR has different rules
from C. The undefined behavior of nsw doesn't matter until it's
actually observed.

So initially, one might imagine detecting trouble by adding a bit
to every Value to track whether it is a poison value. That's pretty
heavy-weight, but it's doable. It gets worse though, because poison
values can also be transmitted through memory. So one would also
have to add a flag to bytes of memory which are poisoned. But it gets
even worse, because poison values can be used as branch conditions,
so one would have to do advanced CFG analysis to prove whether branch
decisions based on poisoned values actually matter. And it gets worse
yet, because the control structure of the program may be dynamic, so
what one would really need to do is something like non-deterministic
execution. This can easily take exponential amounts of time and memory.

A natural reaction to this problem is to think that LLVM IR is so nice
and pretty that naturally there must be a nice and pretty solution. Here
are some alternatives that have been considered:

 - Go back to using undef for overflow. There were no known real-world
   bugs with this. It's just inconsistent.

 - Define add nsw as a fully side-effecting operation, and accept the
   limits on code motion that this implies. However, as LLVM starts doing
   profile-guided optimizations, and starts thinking about more diverse
   architectures, code speculation will likely become more important.

 - Define add nsw as a fully side-effecting operation, and teach
   optimization passes to strip nsw when moving code past control
   boundaries. This is seen as suboptimal because it prevents subsequent
   passes from making use of the nsw information. And, it's extra work
   for optimization passes.

 - Instead of trying to define dependence in LangRef, just say that if
   changing the value returned from an overflowing add nsw would
   affect the observable behavior of the program, then the behavior of
   the program is undefined. This would reduce the amount of text in
   LangRef, but it would be a weaker definition, and it would require
   sign-extension optimizers and others to do significantly more work
   to establish that their transformations are safe.

 - Give up on nsw and have compilers emit warnings when they are unable
   to perform some optimization due to their inability to exclude the
   possibility of overflow. Obviously the warnings would not be on by
default, nor in -Wall, nor probably even in -Wextra, so -Werror users need
   not revolt. Many people are often faced with code that they cannot
   modify for any number of reasons, and would not be able to make use
   of such warnings. It's an interesting tradeoff, but it's unpopular.

Unfortunately, none of these are completely nice and pretty. Because of
this, and because most people don't care, nsw with all its conceptual
complexity has survived.

Dan