[PATCH] D40645: [SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs

Thu Aug 16 23:31:59 PDT 2018

Hi Roman,

Thanks for finding this issue. I think you are right, and the fact that we may end up assigining NoWrap flags after the SCEV was created is extremely annoying by many means. I think that the underlying problems against which it was made could've been fixed differently. We've put a lot of efforts to avoid creation of huge SCEVs.

As far as I'm aware, we only set NoWraps flag lately for AddRecs and never for Adds/Muls. If it is so, it still makes sense to do this to save some compile time for adds/muls. Please feel free to either prohibit this for AddRecs or just revert this patch with adding a TODO that it still is a valid thing for add/muls and checking in your example.

Thanks,
Max


From: rtereshin at apple.com [mailto:rtereshin at apple.com]
Sent: Friday, August 17, 2018 1:22 PM
To: Maxim Kazantsev <max.kazantsev at azul.com>
Cc: sanjoy at playingwithpointers.com; listmail at philipreames.com; javed.absar at arm.com; llvm-commits <llvm-commits at lists.llvm.org>; junbuml at codeaurora.org; wmi at google.com; marcello.maggioni at gmail.com; Justin Bogner <jbogner at apple.com>
Subject: Re: [PATCH] D40645: [SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs

Hi all,

I just hit a real-world scenario where the problem I've described in the previous comment actually occurs and causes 2 identical SCEV expressions to have different order of operands and therefore compare not equal, which in its turn prevents LoadStoreVectorizer from vectorizing a pair of consecutive loads.
With the patch reverted the issue goes away.

Please run

$ opt -load-store-vectorizer reproducer.ll -S -o -

to see the difference in vectorization.

If the following patch is also applied:

diff --git a/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp b/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
index e2ee87b2086..74be4d4c87b 100644
--- a/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ b/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -445,6 +445,12 @@ bool Vectorizer::lookThroughComplexAddresses(Value *PtrA, Value *PtrB,
   const SCEV *OffsetSCEVB = SE.getSCEV(OpB);
   const SCEV *C = SE.getConstant(IdxDiff.trunc(BitWidth));
   const SCEV *X = SE.getAddExpr(OffsetSCEVA, C);
+  if (OpA->getName() == "preheader.address0.idx" &&
+      OpB->getName() == "common.address.idx") {
+    OffsetSCEVA->dump();
+    X->dump();
+    OffsetSCEVB->dump();
+  }
   return X == OffsetSCEVB;
 }

you will see the following output as well:

(((56 + (zext i2 (trunc i32 %tmp2 to i2) to i32) + (4 * (zext i1 (trunc i32 (%tmp2 /u 4) to i1) to i32))<nuw><nsw>) * %tmp) + (4 * (zext i1 (trunc i32 (%tmp2 /u 2) to i1) to i32))<nuw><nsw>)
(1 + ((56 + (zext i2 (trunc i32 %tmp2 to i2) to i32) + (4 * (zext i1 (trunc i32 (%tmp2 /u 4) to i1) to i32))<nuw><nsw>) * %tmp) + (4 * (zext i1 (trunc i32 (%tmp2 /u 2) to i1) to i32))<nuw><nsw>)
(1 + (4 * (zext i1 (trunc i32 (%tmp2 /u 2) to i1) to i32))<nuw><nsw> + ((56 + (zext i2 (trunc i32 %tmp2 to i2) to i32) + (4 * (zext i1 (trunc i32 (%tmp2 /u 4) to i1) to i32))<nuw><nsw>) * %tmp))

As you can see, second and third SCEVAddExpr's are the same but sorted differently.

On a larger example (the source of the reproducer.ll attached, which is a bugpoint) I have seen even weirder behavior: adding a constant to an existing SCEV changes the order of the existing terms,
for instance, getAddExpr(1, ((A * B) + (C * D))) returns (1 + (C * D) + (A * B)).




Thanks,
Roman



target triple = "x86_64--"

@global_value0 = external constant i32
@global_value1 = external constant i32
@other_value = external global float
@a = external global float
@b = external global float
@c = external global float
@d = external global float
@plus1 = external global i32
@cnd = external global i8

; Function Attrs: nounwind
define void @main() local_unnamed_addr #0 {
entry:
  %tmp = load i32, i32* @global_value0, !range !0
  %tmp2 = load i32, i32* @global_value1
  %and.i.i = and i32 %tmp2, 2
  %add.nuw.nsw.i.i = add nuw nsw i32 %and.i.i, 0
  %mul.i.i = shl nuw nsw i32 %add.nuw.nsw.i.i, 1
  %and6.i.i = and i32 %tmp2, 3
  %and9.i.i = and i32 %tmp2, 4
  %add.nuw.nsw10.i.i = add nuw nsw i32 %and6.i.i, %and9.i.i
  %conv3.i42.i = add nuw nsw i32 %mul.i.i, 1
  %reass.add346.7 = add nuw nsw i32 %add.nuw.nsw10.i.i, 56
  %reass.mul347.7 = mul nuw nsw i32 %tmp, %reass.add346.7
  %add7.i.7 = add nuw nsw i32 %reass.mul347.7, 0
  %preheader.address0.idx = add nuw nsw i32 %add7.i.7, %mul.i.i
  %preheader.address0.idx.zext = zext i32 %preheader.address0.idx to i64
  %preheader.load0.address = getelementptr inbounds float, float* @other_value, i64 %preheader.address0.idx.zext
  %preheader.load0. = load float, float* %preheader.load0.address, align 4, !tbaa !1
  %common.address.idx = add nuw nsw i32 %add7.i.7, %conv3.i42.i
  %preheader.header.common.address.idx.zext = zext i32 %common.address.idx to i64
  %preheader.load1.address = getelementptr inbounds float, float* @other_value, i64 %preheader.header.common.address.idx.zext
  %preheader.load1. = load float, float* %preheader.load1.address, align 4, !tbaa !1
  br label %for.body23

for.body23:                                       ; preds = %for.body23, %entry
  %loop.header.load0.address = getelementptr inbounds float, float* @other_value, i64 %preheader.header.common.address.idx.zext
  %loop.header.load0. = load float, float* %loop.header.load0.address, align 4, !tbaa !1
  %reass.mul343.7 = mul nuw nsw i32 %reass.add346.7, 72
  %add7.i286.7.7 = add nuw nsw i32 %reass.mul343.7, 56
  %add9.i288.7.7 = add nuw nsw i32 %add7.i286.7.7, %mul.i.i
  %loop.header.address1.idx = add nuw nsw i32 %add9.i288.7.7, 1
  %loop.header.address1.idx.zext = zext i32 %loop.header.address1.idx to i64
  %loop.header.load1.address = getelementptr inbounds float, float* @other_value, i64 %loop.header.address1.idx.zext
  %loop.header.load1. = load float, float* %loop.header.load1.address, align 4, !tbaa !1
  store float %preheader.load0., float* @a, align 4, !tbaa !1
  store float %preheader.load1., float* @b, align 4, !tbaa !1
  store float %loop.header.load0., float* @c, align 4, !tbaa !1
  store float %loop.header.load1., float* @d, align 4, !tbaa !1
  %loaded.cnd = load i8, i8* @cnd
  %condition = trunc i8 %loaded.cnd to i1
  br i1 %condition, label %for.body23, label %exit

exit:
  ret void
}

attributes #0 = { nounwind }

!0 = !{i32 0, i32 65536}
!1 = !{!2, !2, i64 0}
!2 = !{!"float", !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C++ TBAA"}


> On Aug 10, 2018, at 5:13 PM, Roman Tereshin via Phabricator <reviews at reviews.llvm.org<mailto:reviews at reviews.llvm.org>> wrote:
>
> rtereshin added inline comments.
> Herald added a subscriber: javed.absar.
>
>
> ================
> Comment at: llvm/trunk/lib/Analysis/ScalarEvolution.cpp:699
> +    if (LA->getNoWrapFlags() != RA->getNoWrapFlags())
> +      return (int)LA->getNoWrapFlags() - (int)RA->getNoWrapFlags();
> +
> ----------------
> Hi everyone,
>
> Unfortunately, wrapping flags are not a part of SCEV's identity (they do not participate in computing a hash value or in equality comparisons) and in fact they could be assigned after the fact w/o rebuilding a SCEV.
>
> At least, last time I checked it was that way. Grep for const_cast's to see quite a few of example, AFAIR all for AddRec's.
>
> So, if 2 expressions get built in 2 slightly different ways: one with flags set in the beginning, the other with the flags attached later on, we may end up with 2 expressions which are exactly the same but have their operands swapped in one of the commutative N-ary expressions, and at least one of them will have "sorted by complexity" invariant broken.
>
> 2 identical SCEV's won't compare equal by pointer comparison as they are supposed to.
>
> I didn't verify this with an experiment yet, please correct me if I'm wrong.
>
> Thanks,
> Roman
>
>
> Repository:
>  rL LLVM
>
> https://reviews.llvm.org/D40645
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180817/903a6d74/attachment.html>