[PATCH] D54468: [LoadStoreVectorizer] Fix infinite loop in reorder.
Roman Tereshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 12 19:21:47 PST 2019
rtereshin added inline comments.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:497
+ i--;
+ continue;
+ }
----------------
Hi Bevin,
Thanks for looking into this, apologies it took me so long to notice there is a review on me.
I don't think this is a reliable fix. It appears that it will only work if `SimplifyInstruction` is capable of simplifying-away all the "false" data dependencies between memory operations `Vectorizer::isConsecutiveAccess` is able to prove consecutive. I don't think there is anything (or should be) that guarantees such parity.
To illustrate the problem, let me start with a minimized version of the test you're adding in this patch, for which the `SimplifyInstruction`-based fix does work:
```lang=llvm
target triple = "aarch64"
; Function Attrs: noinline nounwind
define i32 @test([2 x i32]* %array) #0 {
entry:
%arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 0, i32 1
%t = load i32, i32* %arrayidx, align 4
%rem = urem i32 %t, 1
%arrayidx2 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %rem, i32 0
%v = load i32, i32* %arrayidx2, align 4
%r = add i32 %v, %t
ret i32 %r
}
attributes #0 = { noinline nounwind }
```
The false dependency is created by `%rem = urem i32 %t, 1` and it's trivial enough for `SimplifyInstruction` to constant fold, eliminating the dependency.
Here is the same test with the expression made more complex every so slightly:
```lang=llvm
target triple = "aarch64"
; Function Attrs: noinline nounwind
define i32 @test([2 x i32]* %array, i32 %idx) #0 {
entry:
%ptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %idx, i32 1
%t = load i32, i32* %ptr1, align 4
%t2 = mul i32 %t, 2
%idx.p.2t = add i32 %idx, %t2
%idx.p.t = sub i32 %idx.p.2t, %t
%idx.another = sub i32 %idx.p.t, %t
%ptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %idx.another, i32 0
%v = load i32, i32* %ptr0, align 4
%r = add i32 %v, %t
ret i32 %r
}
attributes #0 = { noinline nounwind }
```
The expression for `i32 %idx.another` is roughly `(((2 * %t) + %idx) - %t) - %t` and it's trivial enough for ScalarEvolution alone to see through:
```
%ptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %idx, i32 1
--> (4 + (8 * (sext i32 %idx to i64))<nsw> + %array) U: full-set S: full-set
%t = load i32, i32* %ptr1, align 4
--> %t U: full-set S: full-set
%t2 = mul i32 %t, 2
--> (2 * %t) U: [0,-1) S: [-2147483648,2147483647)
%idx.p.2t = add i32 %idx, %t2
--> ((2 * %t) + %idx) U: full-set S: full-set
%idx.p.t = sub i32 %idx.p.2t, %t
--> (%idx + %t) U: full-set S: full-set
%idx.another = sub i32 %idx.p.t, %t
--> %idx U: full-set S: full-set
%ptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %array, i32 %idx.another, i32 0
--> ((8 * (sext i32 %idx to i64))<nsw> + %array)<nsw> U: full-set S: full-set
```
but it isn't trivial enough for `SimplifyInstruction` to eliminate the false dependency: LSV hangs on this example with and w/o this patch both.
If we pursue roughly the same approach, we could use ScalarEvolution Expander instead: if we re-materialize the SCEV of the pointer being used when we issue the vectorized version of the load or store (and keep `reorder` method as it currently is TOT), we could probably avoid the problem more reliably just because `Vectorizer::isConsecutiveAccess` is largely based on ScalarEvolution and there is more parity between what they can and can not do (note above the SCEV for `%ptr0`: it has the false dependency on `%t` fully eliminated. If re-materialized from SCEV, it will be just fine).
However, it won't do entirely either. `Vectorizer::isConsecutiveAccess` does more than just applying SE. Most notably, it can see through `select`s, even nested ones. Here's a test-case, which is mostly derived from the previous one:
```lang=llvm
target triple = "aarch64"
; Function Attrs: noinline nounwind
define i32 @test(i32* %array, i32 %idx, i1 %cond, i1 %cond2) #0 {
entry:
%idx.p.7 = add nsw i32 %idx, 7
%idx.p.7.sext = sext i32 %idx.p.7 to i64
%ptr1.dummy = getelementptr inbounds i32, i32* %array, i64 21
%ptr1.true = getelementptr inbounds i32, i32* %array, i64 1
%ptr1.false = getelementptr inbounds i32, i32* %array, i64 %idx.p.7.sext
%ptr1.tmp = select i1 %cond, i32* %ptr1.true, i32* %ptr1.false
%ptr1 = select i1 %cond2, i32* %ptr1.tmp, i32* %ptr1.dummy
%t = load i32, i32* %ptr1, align 4
%t2 = mul i32 %t, 2
%idx.p.2t = add i32 %idx, %t2
%idx.p.t = sub i32 %idx.p.2t, %t
%idx.another = sub i32 %idx.p.t, %t
%idx.p.6 = add i32 %idx.another, 6
%idx.p.6.sext = sext i32 %idx.p.6 to i64
%ptr0.dummy = getelementptr inbounds i32, i32* %array, i64 20
%ptr0.true = getelementptr inbounds i32, i32* %array, i64 0
%ptr0.false = getelementptr inbounds i32, i32* %array, i64 %idx.p.6.sext
%ptr0.tmp = select i1 %cond, i32* %ptr0.true, i32* %ptr0.false
%ptr0 = select i1 %cond2, i32* %ptr0.tmp, i32* %ptr0.dummy
%v = load i32, i32* %ptr0, align 4
%r = add i32 %v, %t
ret i32 %r
}
attributes #0 = { noinline nounwind }
```
As before, it hangs LSV both w/ and w/o this patch applied. What's more, however, in this case a top-level SCEV of the `%ptr0` is nowhere near being free of the false dependency on `%t`:
```
%idx.p.7 = add nsw i32 %idx, 7
--> (7 + %idx) U: full-set S: full-set
%idx.p.7.sext = sext i32 %idx.p.7 to i64
--> (sext i32 (7 + %idx) to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)
%ptr1.dummy = getelementptr inbounds i32, i32* %array, i64 21
--> (84 + %array)<nsw> U: full-set S: full-set
%ptr1.true = getelementptr inbounds i32, i32* %array, i64 1
--> (4 + %array)<nsw> U: full-set S: full-set
%ptr1.false = getelementptr inbounds i32, i32* %array, i64 %idx.p.7.sext
--> ((4 * (sext i32 (7 + %idx) to i64))<nsw> + %array)<nsw> U: full-set S: full-set
%ptr1.tmp = select i1 %cond, i32* %ptr1.true, i32* %ptr1.false
--> %ptr1.tmp U: full-set S: full-set
%ptr1 = select i1 %cond2, i32* %ptr1.tmp, i32* %ptr1.dummy
--> %ptr1 U: full-set S: full-set
%t = load i32, i32* %ptr1, align 4
--> %t U: full-set S: full-set
%t2 = mul i32 %t, 2
--> (2 * %t) U: [0,-1) S: [-2147483648,2147483647)
%idx.p.2t = add i32 %idx, %t2
--> ((2 * %t) + %idx) U: full-set S: full-set
%idx.p.t = sub i32 %idx.p.2t, %t
--> (%idx + %t) U: full-set S: full-set
%idx.another = sub i32 %idx.p.t, %t
--> %idx U: full-set S: full-set
%idx.p.6 = add i32 %idx.another, 6
--> (6 + %idx) U: full-set S: full-set
%idx.p.6.sext = sext i32 %idx.p.6 to i64
--> (sext i32 (6 + %idx) to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)
%ptr0.dummy = getelementptr inbounds i32, i32* %array, i64 20
--> (80 + %array)<nsw> U: full-set S: full-set
%ptr0.true = getelementptr inbounds i32, i32* %array, i64 0
--> %array U: full-set S: full-set
%ptr0.false = getelementptr inbounds i32, i32* %array, i64 %idx.p.6.sext
--> ((4 * (sext i32 (6 + %idx) to i64))<nsw> + %array)<nsw> U: full-set S: full-set
%ptr0.tmp = select i1 %cond, i32* %ptr0.true, i32* %ptr0.false
--> %ptr0.tmp U: full-set S: full-set
%ptr0 = select i1 %cond2, i32* %ptr0.tmp, i32* %ptr0.dummy
```
So rematerializing just the top-level SCEV with the Expander won't help here either.
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D54468/new/
https://reviews.llvm.org/D54468
More information about the llvm-commits
mailing list