[llvm-dev] Improving SCEV's behavior around IR level no-wrap flags

Fri Sep 23 19:36:20 PDT 2016

Hi Christof,

Christof Douma wrote:
 > As far as I understand this, the reasoning is that in SCEV the flags
 > must be interpreted as UB/not-allowed to happen because of SCEV
 > expressions can represent instructions in different contexts (e.g. in
 > different execution paths). In the IR these flags mean poison. This is
 > a semantical mismatch that gives a big proof burden on SCEV.
 >
 > To give an example (test/Analysis/ScalarEvolution/nsw.ll - addnsw):
 >
 > 01: entry:
 > 02:  %tmp = add i32 %a, %b
 > 03:  %cmp = icmp sgt i32 %tmp, 0
 > 04:  br i1 %cmp, label %greater, label %exit
 > 05: greater:
 > 06:   %tmp2 = add nsw i32 %a, %b
 > 07:  br label %exit
 > 08: exit:
 > 09:  %result = phi i32 [ %a, %entry ], [ %tmp2, %greater ]
 > 10:  ret i32 %result
 >

 > In the  following example line 2  and 6 are mapped to  a single SCEV
 > expressions: "(%a  + %b)”.  The SCEV  version of  ‘NSW' does  not hold
 > because at line 2 the instruction might actually wrap around.
 >
 > Up till now I had not realised that sharing of SCEV expressions was
 > the only reason for SCEV to have this different semantics.

Yeah, that's how poison is intended to work.  The branch adds nothing,
you have the same problem with

 > 01:  %tmp = add i32 %a, %b
 > 02:  %cmp = icmp sgt i32 %tmp, 0
 > 03:  %tmp2 = add nsw i32 %a, %b

However, lack of context sensitivity does mean that in

   %t0 = add i32 %a, 1
   if (%a s< 100)
     %t1 = add i32 %a, 1

SCEV cannot mark the expression for %t1 as nsw, even though that
addition won't sign-overflow; since, as you said, both %t0 and %t1
will be mapped to the same SCEV expression, in the current scheme as
well as in the new scheme I'm proposing.

In the new scheme, there is a solution for cases like the above: teach
instcombine or LVI etc. to transform that IR to

   %t0 = add i32 %a, 1
   if (%a s< 100)
     %t1 = add nsw i32 %a, 1

and then let the normal logic (that will be newly added in SCEV) kick
in.

 > I can’t help to ask. Why not define a wrapping nsw instruction as
 > UB, instead of “delayed UB” aka poison? I believe we have the notion
 > of poison solely to ease the movement of instructions. In my example
 > above line 6 can be moved to line 2 due to poison. If the wrapping was
 > UB such a move is still possible but would require dropping the nsw
 > flag, nothing more. If this information loss is considered a problem,
 > the use of llvm.assume can counter that.
 >
 > Has somebody ever looked at not having poison at all? What are the
 > downsides of that?

That's just the way things are at the moment.

We _could_ change that (and make overflowing nsw/nuw operations
undefined behavior) but since that is a very deep change to LLVM IR
(basically arithmetic will be sometimes non-speculatable), it has a very
high bar for acceptance.

-- Sanjoy