[PATCH] D45271: [LV] Introduce TTI::getMinimumVF

Thu Apr 5 08:11:48 PDT 2018

kparzysz added inline comments.

================
Comment at: test/Transforms/LoopVectorize/Hexagon/minimum-vf.ll:5
+; Check that TTI::getMinimumVF works.
+; CHECK: LV: Overriding calculated MaxVF(9) with target's minimum: 64
+
----------------
hsaito wrote:
> kparzysz wrote:
> > hsaito wrote:
> > > Please add a comment saying that hard coded 9 comes from the constant trip count, in case someone has to maintain the test case and/or validation later.
> > > 
> > I have reduced this testcase.  The MaxVF of 9 actually came from register usage calculation.  It was a coincidence that the iteration count was 9 as well.
> Can't go something as simple as the equivalent of this? If that's the case why?
> for(i=0;i<9;i++){
>    a[i]+=1; // int add
>    b[i]+=1; // char add
> }
> 
I've tried this testcase, let's call it `sl.c`:

```
int a[9];
char b[9];

void foo() {
  for (unsigned i = 0; i < 9; ++i) {
    a[i]++;
    b[i]++;
  }
}
```

Output from `clang -target hexagon -O2 -mhvx -mllvm -hexagon-autohvx -S sl.c -fno-unroll-loops -mllvm -debug-only=loop-vectorize`:

```
LV: Checking a loop in "foo" from sl.c
LV: Interleaving disabled by the pass manager
LV: Loop hints: force=? width=0 unroll=1
LV: Found a loop: for.body
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: Found a loop with a very small trip count. This loop is worth vectorizing only if no scalar iteration overheads are incurred.
LV: Found trip count: 9
LV: The Smallest and Widest types: 8 / 32 bits.
LV: The Widest register safe to use is: 512 bits.
LV(REG): Calculating max register usage:
LV: Found uniform instruction:   %exitcond = icmp eq i32 %inc3, 9
LV: Found uniform instruction:   %arrayidx = getelementptr inbounds [9 x i32], [9 x i32]* @a, i32 0, i32 %i.08
LV: Found uniform instruction:   %arrayidx1 = getelementptr inbounds [9 x i8], [9 x i8]* @b, i32 0, i32 %i.08
LV: Found uniform instruction:   %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found uniform instruction:   %inc3 = add nuw nsw i32 %i.08, 1
LV: Found scalar instruction:   %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found scalar instruction:   %inc3 = add nuw nsw i32 %i.08, 1
LV: Found uniform instruction:   %exitcond = icmp eq i32 %inc3, 9
LV: Found uniform instruction:   %arrayidx = getelementptr inbounds [9 x i32], [9 x i32]* @a, i32 0, i32 %i.08
LV: Found uniform instruction:   %arrayidx1 = getelementptr inbounds [9 x i8], [9 x i8]* @b, i32 0, i32 %i.08
LV: Found uniform instruction:   %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found uniform instruction:   %inc3 = add nuw nsw i32 %i.08, 1
LV: Found scalar instruction:   %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found scalar instruction:   %inc3 = add nuw nsw i32 %i.08, 1
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 3
LV(REG): At #5 Interval # 1
LV(REG): At #6 Interval # 2
LV(REG): At #7 Interval # 3
LV(REG): At #9 Interval # 1
LV(REG): At #10 Interval # 1
LV(REG): VF = 32
LV(REG): Found max usage: 2
LV(REG): Found invariant usage: 0
LV(REG): VF = 64
LV(REG): Found max usage: 4
LV(REG): Found invariant usage: 0
LV: Aborting. A tail loop is required with -Os/-Oz.
LV: Vectorization is possible but not beneficial.
LV: Interleaving is not beneficial.
```

The VF calculated above is 64, which it what we would have wanted.  On the other hand, with the testcase from this patch, the relevant output looks like this:

```
...
LV(REG): At #106 Interval # 2
LV(REG): At #108 Interval # 1
LV(REG): At #109 Interval # 1
LV(REG): VF = 18
LV(REG): Found max usage: 36
LV(REG): Found invariant usage: 16
LV: Overriding calculated MaxVF(9) with target's minimum: 64
```

Repository:
  rL LLVM

https://reviews.llvm.org/D45271