[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

Matthew Curtis mcurtis at codeaurora.org
Mon Dec 10 14:13:12 PST 2012


Hello all,

I wanted to get some feedback on this patch for ScalarEvolution.

It addresses a performance problem I am seeing for simple benchmark.

Starting with this C code:

01: signed char foo(void)
02: {
03:   const int count = 8000;
04:   signed char result = 0;
05:   int j;
06:
07:   for (j = 0; j < count; ++j) {
08:     result += (result_t)(3);
09:   }
10:
11:   return result;
12: }

I end up with this IR feeding into Induction Variable Simplification:

01: define signext i8 @foo() nounwind readnone {
02: entry:
03:   br label %for.body
04:
05: for.body:                                         ; preds = %entry, 
%for.body
06:   %j.04 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
07:   %result.03 = phi i32 [ 0, %entry ], [ %add, %for.body ]
08:   %conv2 = and i32 %result.03, 255
09:   %add = add nsw i32 %conv2, 3
10:   %inc = add nsw i32 %j.04, 1
11:   %cmp = icmp slt i32 %inc, 8000
12:   br i1 %cmp, label %for.body, label %for.end
13:
14: for.end:                                          ; preds = %for.body
15:   %conv1 = trunc i32 %add to i8
16:   ret i8 %conv1
17: }

Unfortunately, the 'and' on line 8 prevents Scalar Evolution from
being able to create an expression for '%add' that it knows how to
evaluate.

The patch detects this pattern in createNodeForPHI and creates an
equivalent expression that can be evaluated.

Note that SCEV translates the 'and' into
   ZeroExtend(Truncate(%result.03, i8), i32)

So in terms of SCEV expressions, we essentially have
   %add[n] = Add(ZeroExtend(Truncate(%add[n-1], i8), i32), 3)

(BTW, I'm no scholar here, just a layman, so my terminology is
probably all wrong).

The patch basically moves the ZeroExtend and Truncate outside of the
recurrence, though it must take into account that for iteration n, the
ZeroExtend and Truncate apply to the value at iteration n-1.

For a constant step this is accomplished by subtracting one step from
the recurrence, performing the Truncate and ZeroExtend, and then
adding the step to the result. In other words:

   %add[n] = Add(
               ZeroExtend(
                 Truncate(
                   Minus(AddRec(Start=0,Step=3)[n], 3),
                   i8),
                 i32),
               3)

It's a little more complicated when the Step is another recurrence,
but essentially the same.

Thoughts?

Matthew C.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

-------------- next part --------------
>From e9ca53595ee7a274308c823d14cccd5a8598814d Mon Sep 17 00:00:00 2001
From: Matthew Curtis <mcurtis at codeaurora.org>
Date: Mon, 3 Dec 2012 17:25:20 -0600
Subject: [PATCH] Teach ScalarEvolution to handle IV=add(zext(trunc(IV)),
 Accum)

Code generated for 8-bit and 16-bit induction variables commonly
produces this pattern (on hexagon), where the add is performed as a
32-bit operation, but is preceeded by a truncate and zero-extend (or
equivalent bitwise and).

This change teaches ScalarEvolution to recognize this pattern and
create the right SCEVExpr's so that the induction variable values can
be computed at compile-time.
---
 include/llvm/Analysis/ScalarEvolution.h            |    4 +
 lib/Analysis/ScalarEvolution.cpp                   |   69 +++++++++--
 .../ScalarEvolution/2012-11-30-AddOfAnd.ll         |  125 ++++++++++++++++++++
 3 files changed, 188 insertions(+), 10 deletions(-)
 create mode 100644 test/Analysis/ScalarEvolution/2012-11-30-AddOfAnd.ll

diff --git a/include/llvm/Analysis/ScalarEvolution.h b/include/llvm/Analysis/ScalarEvolution.h
index 235adca..79e545d 100644
--- a/include/llvm/Analysis/ScalarEvolution.h
+++ b/include/llvm/Analysis/ScalarEvolution.h
@@ -654,6 +654,10 @@ namespace llvm {
     const SCEV *getMinusSCEV(const SCEV *LHS, const SCEV *RHS,
                              SCEV::NoWrapFlags Flags = SCEV::FlagAnyWrap);
 
+    /// Return a SCEV that truncates 'V' to 'Ty' *and* then zero extends that
+    /// value back to 'V's original type.
+    const SCEV *getTruncateAndZeroExtend(const SCEV *V, Type *Ty);
+
     /// getTruncateOrZeroExtend - Return a SCEV corresponding to a conversion
     /// of the input value to the specified type.  If the type must be
     /// extended, it is zero extended.
diff --git a/lib/Analysis/ScalarEvolution.cpp b/lib/Analysis/ScalarEvolution.cpp
index 7bc848a..5ce4fd2 100644
--- a/lib/Analysis/ScalarEvolution.cpp
+++ b/lib/Analysis/ScalarEvolution.cpp
@@ -2770,6 +2770,17 @@ const SCEV *ScalarEvolution::getMinusSCEV(const SCEV *LHS, const SCEV *RHS,
   return getAddExpr(LHS, getNegativeSCEV(RHS), Flags);
 }
 
+const SCEV *
+ScalarEvolution::getTruncateAndZeroExtend(const SCEV *V, Type *Ty) {
+  Type *SrcTy = V->getType();
+  assert((SrcTy->isIntegerTy() || SrcTy->isPointerTy()) &&
+         (Ty->isIntegerTy() || Ty->isPointerTy()) &&
+         "Cannot truncate and zero extend with non-integer arguments!");
+  const SCEV *trunc = getTruncateExpr(V, Ty);
+  const SCEV *zext = getZeroExtendExpr(trunc, V->getType());
+  return zext;
+}
+
 /// getTruncateOrZeroExtend - Return a SCEV corresponding to a conversion of the
 /// input value to the specified type.  If the type must be extended, it is zero
 /// extended.
@@ -3029,13 +3040,22 @@ const SCEV *ScalarEvolution::createNodeForPHI(PHINode *PN) {
         if (const SCEVAddExpr *Add = dyn_cast<SCEVAddExpr>(BEValue)) {
           // If there is a single occurrence of the symbolic value, replace it
           // with a recurrence.
+          const SCEVZeroExtendExpr *zext = 0;
+          const SCEVTruncateExpr *trunc = 0;
           unsigned FoundIndex = Add->getNumOperands();
-          for (unsigned i = 0, e = Add->getNumOperands(); i != e; ++i)
-            if (Add->getOperand(i) == SymbolicName)
-              if (FoundIndex == e) {
-                FoundIndex = i;
-                break;
-              }
+          for (unsigned i = 0, e = Add->getNumOperands(); i != e; ++i) {
+            const SCEV *Op = Add->getOperand(i);
+            if (Op == SymbolicName) {
+              FoundIndex = i;
+              break;
+            }
+            if ((zext = dyn_cast<SCEVZeroExtendExpr>(Op)) &&
+                (trunc = dyn_cast<SCEVTruncateExpr>(zext->getOperand())) &&
+                (trunc->getOperand() == SymbolicName)) {
+              FoundIndex = i;
+              break;
+            }
+          }
 
           if (FoundIndex != Add->getNumOperands()) {
             // Create an add with everything but the specified operand.
@@ -3044,10 +3064,10 @@ const SCEV *ScalarEvolution::createNodeForPHI(PHINode *PN) {
               if (i != FoundIndex)
                 Ops.push_back(Add->getOperand(i));
             const SCEV *Accum = getAddExpr(Ops);
-
+            bool AccumIsLoopInvariant = isLoopInvariant(Accum, L);
             // This is not a valid addrec if the step amount is varying each
             // loop iteration, but is not itself an addrec in this loop.
-            if (isLoopInvariant(Accum, L) ||
+            if (AccumIsLoopInvariant ||
                 (isa<SCEVAddRecExpr>(Accum) &&
                  cast<SCEVAddRecExpr>(Accum)->getLoop() == L)) {
               SCEV::NoWrapFlags Flags = SCEV::FlagAnyWrap;
@@ -3071,11 +3091,40 @@ const SCEV *ScalarEvolution::createNodeForPHI(PHINode *PN) {
               }
 
               const SCEV *StartVal = getSCEV(StartValueV);
-              const SCEV *PHISCEV = getAddRecExpr(StartVal, Accum, L, Flags);
+              const SCEV *PHISCEV;
+              if (zext && trunc) {
+                // We have an induction variable 'iv' of the form:
+                //   iv=add(zext(trunc(iv)), Accum)
+                // The basic strategy is to turn this into
+                //   iv'=add(iv',Accum)
+                //   iv=zext(trunc(iv'-Accum))+Accum
+                if (AccumIsLoopInvariant) {
+                  // Accum is a simple constant, just adjust the start value.
+                  PHISCEV = getMinusSCEV(StartVal, Accum);
+                  PHISCEV = getAddRecExpr(PHISCEV, Accum, L, Flags);
+                } else if (const SCEVAddRecExpr *AccumAddRec =
+                           dyn_cast<SCEVAddRecExpr>(Accum)) {
+                  // Accum is another recurrence, apply some recurrence algebra
+                  //       iv': {StartVal,+,Accum}
+                  //     Accum: {a0,+,a1}
+                  // iv'-accum: {StartVal-a0,+,Accum-a1}
+                  const SCEV *SV =
+                    getMinusSCEV(StartVal, AccumAddRec->getOperand(0));
+                  const SCEV *Ac =
+                    getMinusSCEV(Accum, AccumAddRec->getStepRecurrence(*this));
+                  PHISCEV = getAddRecExpr(SV, Ac, L, Flags);
+                } else {
+                  assert("Unexpected type of Accum");
+                }
+                PHISCEV = getTruncateAndZeroExtend(PHISCEV, trunc->getType());
+                PHISCEV = getAddExpr(PHISCEV, Accum);
+              } else {
+                PHISCEV = getAddRecExpr(StartVal, Accum, L, Flags);
+              }
 
               // Since the no-wrap flags are on the increment, they apply to the
               // post-incremented value as well.
-              if (isLoopInvariant(Accum, L))
+              if (AccumIsLoopInvariant)
                 (void)getAddRecExpr(getAddExpr(StartVal, Accum),
                                     Accum, L, Flags);
 
diff --git a/test/Analysis/ScalarEvolution/2012-11-30-AddOfAnd.ll b/test/Analysis/ScalarEvolution/2012-11-30-AddOfAnd.ll
new file mode 100644
index 0000000..2e8ebd4
--- /dev/null
+++ b/test/Analysis/ScalarEvolution/2012-11-30-AddOfAnd.ll
@@ -0,0 +1,125 @@
+; RUN: opt < %s -S -scalar-evolution -analyze | FileCheck %s
+;
+; Verify that ScalarEvolution handles induction variables of the form
+;   iv=add(and(iv,(2^n)-1), Accum)
+;
+
+; CHECK: @foo_6
+; CHECK: %add =
+; CHECK-NEXT: Exits: 64
+define i32 @foo_6() nounwind readnone {
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %entry, %for.body
+  %storemerge2 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %conv = and i32 %0, 63
+  %add = add nsw i32 %conv, 3
+  %inc = add nsw i32 %storemerge2, 1
+  %cmp = icmp slt i32 %inc, 8000
+  br i1 %cmp, label %for.body, label %for.end
+
+for.end:                                          ; preds = %for.body
+  %fold = add i32 %0, 3
+  %conv2 = and i32 %fold, 63
+  ret i32 %conv2
+}
+
+; CHECK: @foo_8
+; CHECK: %add =
+; CHECK-NEXT: Exits: 192
+define i32 @foo_8() nounwind readnone {
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %entry, %for.body
+  %storemerge2 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %conv = and i32 %0, 255
+  %add = add nsw i32 %conv, 3
+  %inc = add nsw i32 %storemerge2, 1
+  %cmp = icmp slt i32 %inc, 8000
+  br i1 %cmp, label %for.body, label %for.end
+
+for.end:                                          ; preds = %for.body
+  %fold = add i32 %0, 3
+  %conv2 = and i32 %fold, 255
+  ret i32 %conv2
+}
+
+; CHECK: @foo_11
+; CHECK: %add =
+; CHECK-NEXT: Exits: 1472
+define i32 @foo_11() nounwind readnone {
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %entry, %for.body
+  %storemerge2 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %conv = and i32 %0, 2047
+  %add = add nsw i32 %conv, 3
+  %inc = add nsw i32 %storemerge2, 1
+  %cmp = icmp slt i32 %inc, 8000
+  br i1 %cmp, label %for.body, label %for.end
+
+for.end:                                          ; preds = %for.body
+  %fold = add i32 %0, 3
+  %conv2 = and i32 %fold, 2047
+  ret i32 %conv2
+}
+
+
+; CHECK: @foo_16
+; CHECK: %add =
+; CHECK-NEXT: Exits: 24000
+define i32 @foo_16() nounwind readnone {
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %entry, %for.body
+  %storemerge2 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %conv = and i32 %0, 65535
+  %add = add nsw i32 %conv, 3
+  %inc = add nsw i32 %storemerge2, 1
+  %cmp = icmp slt i32 %inc, 8000
+  br i1 %cmp, label %for.body, label %for.end
+
+for.end:                                          ; preds = %for.body
+  %fold = add i32 %0, 3
+  %conv2 = and i32 %fold, 65535
+  ret i32 %conv2
+}
+
+; CHECK: @kernel
+; CHECK: %add =
+; CHECK-NEXT: Exits: (9 + (zext i8 (36 + (trunc i32 %0 to i8)) to i32))
+define signext i8 @kernel() nounwind readnone {
+entry:
+  br label %for.cond1.preheader
+
+for.cond1.preheader:                              ; preds = %entry, %for.inc7
+  %storemerge7 = phi i32 [ 0, %entry ], [ %inc8, %for.inc7 ]
+  %0 = phi i32 [ 0, %entry ], [ %add, %for.inc7 ]
+  br label %for.body3
+
+for.body3:                                        ; preds = %for.cond1.preheader, %for.body3
+  %storemerge15 = phi i32 [ 0, %for.cond1.preheader ], [ %inc, %for.body3 ]
+  %1 = phi i32 [ %0, %for.cond1.preheader ], [ %add, %for.body3 ]
+  %conv2 = and i32 %1, 255
+  %add = add nsw i32 %conv2, %storemerge15
+  %conv6 = trunc i32 %add to i8
+  %inc = add nsw i32 %storemerge15, 1
+  %cmp2 = icmp slt i32 %inc, 10
+  br i1 %cmp2, label %for.body3, label %for.inc7
+
+for.inc7:                                         ; preds = %for.body3
+  %inc8 = add nsw i32 %storemerge7, 1
+  %cmp = icmp slt i32 %inc8, 10
+  br i1 %cmp, label %for.cond1.preheader, label %for.end9
+
+for.end9:                                         ; preds = %for.inc7
+  ret i8 %conv6
+}
-- 
1.7.8.3



More information about the llvm-dev mailing list