[LLVMdev] nsw is still logically inconsistent

Dan Gohman gohman at apple.com
Mon Dec 12 12:58:31 PST 2011

The recent discussion of nsw led me to go back and review the
current definition of nsw in greater depth, and it turns out that
even with all the theoretical effort, it's still logically

First, a warning: The scenario below is artificial. This is just
a demonstration. Also, ignore the fact that instcombine would zap
everything. Depending on that would be an implicit pass dependency,
which is against the rules.

Ok, consider this LLVM IR code fragment:

  br i1 %overflow_check, label %no_overflow, label %end

  %t0 = add nsw i32 %a, %b
  %t1 = sext i32 %t0 to i64
  %t2 = ashr i64 %t1, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5

Assume label %no_overflow has no other predecessors. And assume
adding %a and %b can sometimes produce overflow, but only when
%overflow_check is false.

This code has no undefined behavior. It's a bit subtle, but in
particular, note that it's not possible for the udiv to divide by
zero, which would be undefined behavior. The sext produces a
value where the high 33 bits are all identical, the ashr produces
a value which is either all ones or all zeros, the add produces
either 0 or 1, and the icmp ilt is always 1.

We first perform a speculation transformation, hoisting all of the
code above the %overflow_check branch:

  %t0 = add nsw i32 %a, %b
  %t1 = sext i32 %t0 to i64
  %t2 = ashr i64 %t1, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5
  br i1 %overflow_check, label %no_overflow, label %end


Was this valid?

If nsw overflow is immediate undefined behavior, this transformation
would break the program, because the overflow is no longer guarded
by %overflow_check. But a premise of this exercise is that we want
to be able to speculate add nsw instructions. For now, let's assume
that there's a way to define nsw which permits this, with some kind
of deferred undefined behavior semantics.

It's obviously valid to hoist the sext, ashr, add, and icmp that
follow, because those instructions never have side effects.

The udiv is less obvious, but an analysis of the icmp, add, ashr,
and sext above it can show that the denominator of the udiv will
never be zero, so the udiv will never have side effects. Note that
this analysis doesn't actually need to look at the original add nsw
instruction. Climbing up to the sext is enough.

Next, we perform a promotion transformation, converting the add nsw
from i32 to i64:

  %s0 = sext i32 %a to i64
  %s1 = sext i32 %b to i64
  %t0 = add nsw i64 %s0, %s1
  %t2 = ashr i64 %t0, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5
  br i1 %overflow_check, label %no_overflow, label %end


Was this valid?

Any time the new i64 add would produce a different value than the
original sext would have, it would be a case where the 32-bit add
had an overflow. The nsw says that the program would have undefined
behavior in that case if the result is used, so this should be ok.

Unfortunately, the final code is actually broken. If adding %a
and %b as i32 values would overflow, adding them as i64 values
would produce a value for which bit 31 is not equal to bit 32, so
the subsequent ashr would produce a value which is not -1 or 0, the
subsequent add would produce a value which is not 0 or 1, and the
icmp ult would return 0, and the udiv would have undefined
behavior. And it's unconditional.

Consequently, the current definition of nsw is still inconsistent.


More information about the llvm-dev mailing list