[PATCH] D45271: [LV] Introduce TTI::getMinimumVF
Krzysztof Parzyszek via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 5 08:11:48 PDT 2018
kparzysz added inline comments.
================
Comment at: test/Transforms/LoopVectorize/Hexagon/minimum-vf.ll:5
+; Check that TTI::getMinimumVF works.
+; CHECK: LV: Overriding calculated MaxVF(9) with target's minimum: 64
+
----------------
hsaito wrote:
> kparzysz wrote:
> > hsaito wrote:
> > > Please add a comment saying that hard coded 9 comes from the constant trip count, in case someone has to maintain the test case and/or validation later.
> > >
> > I have reduced this testcase. The MaxVF of 9 actually came from register usage calculation. It was a coincidence that the iteration count was 9 as well.
> Can't we use something as simple as the equivalent of this? If we can't, why not?
> for(i=0;i<9;i++){
> a[i]+=1; // int add
> b[i]+=1; // char add
> }
>
I've tried this testcase, let's call it `sl.c`:
```
int a[9];
char b[9];
void foo() {
  for (unsigned i = 0; i < 9; ++i) {
    a[i]++;
    b[i]++;
  }
}
Output from `clang -target hexagon -O2 -mhvx -mllvm -hexagon-autohvx -S sl.c -fno-unroll-loops -mllvm -debug-only=loop-vectorize`:
```
LV: Checking a loop in "foo" from sl.c
LV: Interleaving disabled by the pass manager
LV: Loop hints: force=? width=0 unroll=1
LV: Found a loop: for.body
LV: Found an induction variable.
LV: We can vectorize this loop!
LV: Found a loop with a very small trip count. This loop is worth vectorizing only if no scalar iteration overheads are incurred.
LV: Found trip count: 9
LV: The Smallest and Widest types: 8 / 32 bits.
LV: The Widest register safe to use is: 512 bits.
LV(REG): Calculating max register usage:
LV: Found uniform instruction: %exitcond = icmp eq i32 %inc3, 9
LV: Found uniform instruction: %arrayidx = getelementptr inbounds [9 x i32], [9 x i32]* @a, i32 0, i32 %i.08
LV: Found uniform instruction: %arrayidx1 = getelementptr inbounds [9 x i8], [9 x i8]* @b, i32 0, i32 %i.08
LV: Found uniform instruction: %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found uniform instruction: %inc3 = add nuw nsw i32 %i.08, 1
LV: Found scalar instruction: %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found scalar instruction: %inc3 = add nuw nsw i32 %i.08, 1
LV: Found uniform instruction: %exitcond = icmp eq i32 %inc3, 9
LV: Found uniform instruction: %arrayidx = getelementptr inbounds [9 x i32], [9 x i32]* @a, i32 0, i32 %i.08
LV: Found uniform instruction: %arrayidx1 = getelementptr inbounds [9 x i8], [9 x i8]* @b, i32 0, i32 %i.08
LV: Found uniform instruction: %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found uniform instruction: %inc3 = add nuw nsw i32 %i.08, 1
LV: Found scalar instruction: %i.08 = phi i32 [ 0, %entry ], [ %inc3, %for.body ]
LV: Found scalar instruction: %inc3 = add nuw nsw i32 %i.08, 1
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 3
LV(REG): At #5 Interval # 1
LV(REG): At #6 Interval # 2
LV(REG): At #7 Interval # 3
LV(REG): At #9 Interval # 1
LV(REG): At #10 Interval # 1
LV(REG): VF = 32
LV(REG): Found max usage: 2
LV(REG): Found invariant usage: 0
LV(REG): VF = 64
LV(REG): Found max usage: 4
LV(REG): Found invariant usage: 0
LV: Aborting. A tail loop is required with -Os/-Oz.
LV: Vectorization is possible but not beneficial.
LV: Interleaving is not beneficial.
```
The VF calculated above is 64, which is what we would have wanted. On the other hand, with the testcase from this patch, the relevant output looks like this:
```
...
LV(REG): At #106 Interval # 2
LV(REG): At #108 Interval # 1
LV(REG): At #109 Interval # 1
LV(REG): VF = 18
LV(REG): Found max usage: 36
LV(REG): Found invariant usage: 16
LV: Overriding calculated MaxVF(9) with target's minimum: 64
```
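For readers following the two logs, here is a rough sketch in C of the kind of clamping the last debug line implies. It is not the code from the patch: `target_minimum_vf` and `apply_minimum_vf` are hypothetical names, the 512-bit register width is taken from the first log, and the rule "minimum VF = register width / element width" is an assumption made only for illustration.
```c
#include <stdio.h>

/* Hypothetical stand-in for a target hook such as TTI::getMinimumVF.
 * Assumption: the target wants at least one full 512-bit vector of the
 * given element type. Returns 0 when it has no preference. */
static unsigned target_minimum_vf(unsigned elem_bits) {
    const unsigned vector_bits = 512; /* "Widest register safe to use" in the log */
    return elem_bits != 0 ? vector_bits / elem_bits : 0;
}

/* Raise a calculated MaxVF to the target's minimum when it falls below it;
 * otherwise return the calculated value unchanged. */
static unsigned apply_minimum_vf(unsigned max_vf, unsigned smallest_type_bits) {
    unsigned min_vf = target_minimum_vf(smallest_type_bits);
    if (min_vf != 0 && max_vf < min_vf) {
        printf("LV: Overriding calculated MaxVF(%u) with target's minimum: %u\n",
               max_vf, min_vf);
        return min_vf;
    }
    return max_vf;
}
```
Under these assumptions, and assuming an 8-bit smallest type, `apply_minimum_vf(9, 8)` yields 64 and prints the override message, while `apply_minimum_vf(64, 8)` leaves the value at 64 and prints nothing, which matches the `sl.c` case where no override was needed.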
Repository:
rL LLVM
https://reviews.llvm.org/D45271