[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution

Sanjoy Das sanjoy at playingwithpointers.com
Sat Jun 27 00:16:20 PDT 2015


Hi Bjarke,

Your proposal sounds reasonable and I can't immediately spot any
fundamental issues.  However, there is yet another way to skin this
cat that you may want to consider:

First of all, going by the "poison causes UB only when observed", SCEV
does not do the right thing currently:


declare void @use(i64)

define void @f() {
 entry:
  br label %loop

 loop:
  %idx = phi i32 [ %idx.inc, %loop ], [ 0, %entry ]
  %idx2 = phi i32 [ %idx2.inc, %loop ], [ 0, %entry ]

  %idx.inc = add nsw i32 %idx, 1
  %idx2.inc = add i32 %idx2, 1
  %idx2.sext = sext i32 %idx2 to i64
  call void @use(i64 %idx2.sext)
  br label %loop
}


opt -analyze -scalar-evolution ==>

Classifying expressions for: @f
  %idx = phi i32 [ %idx.inc, %loop ], [ 0, %entry ]
  -->  {0,+,1}<nuw><nsw><%loop> U: [0,-2147483648) S: [0,-2147483648)
         Exits: <<Unknown>>
  %idx2 = phi i32 [ %idx2.inc, %loop ], [ 0, %entry ]
  -->  {0,+,1}<nuw><nsw><%loop> U: [0,-2147483648) S: [0,-2147483648)
         Exits: <<Unknown>>
  %idx.inc = add nsw i32 %idx, 1
  -->  {1,+,1}<nuw><nsw><%loop> U: [1,-2147483648) S: [1,-2147483648)
         Exits: <<Unknown>>
  %idx2.inc = add i32 %idx2, 1
  -->  {1,+,1}<nuw><nsw><%loop> U: [1,-2147483648) S: [1,-2147483648)
         Exits: <<Unknown>>
  %idx2.sext = sext i32 %idx2 to i64
  -->  {0,+,1}<nuw><nsw><%loop> U: [0,-9223372036854775808) S:
[0,-9223372036854775808)         Exits: <<Unknown>>


SCEV concludes that %idx2 is nsw, even though there is no side
effecting, observable use of %idx.


One way to get what you want and also fix this existing bug(?) is to
make the no wrap flags part of a SCEV expression's identity (i.e. make
{A,+,B}<nuw> a different expression from {A,+,B}).  If you start with
the precondition that every <nuw> [0] came from a <nuw> [1] present in
the IR, then you don't really have to worry about control dependence
-- you can always optimize under the assumption that the SCEV
expression does not unsign-overflow; if such an optimization changes
program behavior then that program has UB since it was data dependent
on poison [2].  We could still use the analysis you suggest and
exploit poison-caused-UB to optimize wrapping add recs to non-wrapping
add recs, but that's a separate optimization that may not be needed in
IR generated from C/C++ frontends (since interesting arithmetic will
already be marked <nuw>).

All of this is based on what I understand the _intent_ of poison to
be.  I'll leave it to David to comment on what the future holds for
nuw, nsw and poison in LLVM's specification.

-- Sanjoy

[0]: identical reasoning applies to <nsw>, leaving out <nsw> to avoid
clutter

[1]: it can also come from SCEV *proving* that the expression does not
unsign-overflow, for instance by looking at trip counts

[2]: in other words, it is always OK to replace zext(a +nuw b) with
zext(a) + zext(b).  If the program behavior changes observably by this
substitution then the program had UB.

On Fri, Jun 26, 2015 at 4:01 PM, Bjarke Roune <broune at google.com> wrote:
> *** Summary
> I'd like to propose (and implement) functionality in LLVM to determine when
> a poison value from an instruction is guaranteed to produce undefined
> behavior. I want to use that to improve handling of nsw, inbounds etc. flags
> in scalar evolution and LSR. I imagine that there would be other uses for
> it. I'd like feedback on this idea before I proceed with it.
>
>
> *** Details
> Poison values do produce undefined behavior if the poison becomes externally
> observable. A load or store to a poison address value is externally
> observable and I'd like to use that in a simple analysis pass to derive
> guarantees that certain overflows would produce undefined behavior, not just
> poison.
>
> Scalar evolution (and hence LSR) cannot currently make much use of the nsw
> and similar flags on instructions. That is because two instructions can map
> to the same scev even if one instruction has the nsw flag and the other one
> does not. If we applied the nsw flag to the scev, the scev for the
> instruction without the nsw flag would then incorrectly have the nsw flag.
>
> Scalar evolution would be able to use the nsw flag from an instruction for
> recurrences when the loop header dominates the entire loop, the instruction
> with nsw post-dominates the loop header and undefined behavior is guaranteed
> on wrap via the poison value analysis pass that I'd like to write.
>
> What do you think? Do we already have something similar to this?
>
> Bjarke
>
>
>
> *** PS: What got me thinking about this:
> My immediate motivation is that I'd like LSR to be able to create induction
> variables for expressions like &ptr[i + offset] where i and offset are 32
> bit integers, ptr is a loop-invariant 64 bit pointer, i is an induction
> variable and offset is loop-invariant. For that to happen, scalar evolution
> needs to propagate the nsw flag from i + offset to the scev so that it can
> transform
>
>   ((4 * (sext i32 {%offset,+,1}<nw><%loop> to i64))<nsw> + %ptr)<nsw>
>
> to
>
>   {((4 * (sext i32 %offset to i64)) + %ptr),+,4}<nsw><%loop>
>
> Currently the inner <nsw> is actually <nw>, which blocks the transformation
> (the outer two nsw's shouldn't currently be there either, as it's the same
> issue for inbounds on GEP: see llvm bug 23527)



More information about the llvm-dev mailing list