[llvm-dev] Imprecise information by SCEV after loop unrolling and Vectorization.
Mani, Suresh via llvm-dev
llvm-dev at lists.llvm.org
Fri Aug 24 03:58:44 PDT 2018
Hi,
I have a simple test case,
// 1.c
#include <stdio.h>
int b[1000];
main() {
int a[1000];
int sum = 0;
for (int i = 0; i < 1000; i++) {
a[i] = i + 10;
}
for (int i = 0; i < 1000; i += 1) {
sum = sum + a[i];
}
printf("sum = %ld\n", sum);
}
Compilation: clang -m64 -fuse-ld=lld -flto -S -emit-llvm -o 1.ll 1.c
SCEV DUMP: opt -S -loop-accesses -mem2reg -analyze -indvars -loop-simplify -scalar-evolution 1.ll
Due to unrolling and Vectorization, Scev determines that some of the access are outside the range [ 0 - 999 ] --------------- [ 0 - 1020 ] as highlighted below in red, not sure if this issue is a known issue in Scev ?
Printing analysis 'Loop Access Analysis' for function 'main':
vector.body33:
Memory dependences are safe
Dependences:
Run-time memory checks:
Grouped accesses:
Store to invariant address was not found in loop.
SCEV assumptions:
Expressions re-written:
vector.body:
Report: loop control flow is not understood by analyzer
Dependences:
Run-time memory checks:
Grouped accesses:
Store to invariant address was not found in loop.
SCEV assumptions:
Expressions re-written:
Printing analysis 'Promote Memory to Register' for function 'main':
Pass::print not implemented for pass: 'Promote Memory to Register'!
Printing analysis 'Induction Variable Simplification':
Pass::print not implemented for pass: 'Induction Variable Simplification'!
Printing analysis 'Induction Variable Simplification':
Pass::print not implemented for pass: 'Induction Variable Simplification'!
Printing analysis 'Canonicalize natural loops' for function 'main':
Pass::print not implemented for pass: 'Canonicalize natural loops'!
Printing analysis 'Scalar Evolution Analysis' for function 'main':
Classifying expressions for: @main
%a = alloca [1000 x i32], align 16
--> %a U: [0,-15) S: [-9223372036854775808,9223372036854775793)
%0 = bitcast [1000 x i32]* %a to i8*
--> %a U: [0,-15) S: [-9223372036854775808,9223372036854775793)
%index = phi i64 [ 0, %entry ], [ %index.next.3, %vector.body.1 ]
--> {0,+,32}<nuw><nsw><%vector.body> U: [0,993) S: [0,993) Exits: 992 LoopDispositions: { %vector.body: Computable }
%1 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index
--> {%a,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body: Computable }
%4 = bitcast i32* %1 to <4 x i32>*
--> {%a,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body: Computable }
%5 = getelementptr inbounds i32, i32* %1, i64 4
--> {(16 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body: Computable }
%6 = bitcast i32* %5 to <4 x i32>*
--> {(16 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body: Computable }
%index.next = or i64 %index, 8
--> {8,+,32}<nuw><nsw><%vector.body> U: [8,1001) S: [8,1001) Exits: 1000 LoopDispositions: { %vector.body: Computable }
%index37 = phi i64 [ %index.next38.4, %vector.body33 ], [ 0, %vector.body33.preheader ]
--> {0,+,40}<nuw><nsw><%vector.body33> U: [0,961) S: [0,961) Exits: 960 LoopDispositions: { %vector.body33: Computable }
%8 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index37
--> {%a,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3840 + %a) LoopDispositions: { %vector.body33: Computable }
%9 = bitcast i32* %8 to <4 x i32>*
--> {%a,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3840 + %a) LoopDispositions: { %vector.body33: Computable }
%10 = getelementptr inbounds i32, i32* %8, i64 4
--> {(16 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3856 + %a) LoopDispositions: { %vector.body33: Computable }
%11 = bitcast i32* %10 to <4 x i32>*
--> {(16 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3856 + %a) LoopDispositions: { %vector.body33: Computable }
%index.next38 = add nuw nsw i64 %index37, 8
--> {8,+,40}<nuw><nsw><%vector.body33> U: [8,969) S: [8,969) Exits: 968 LoopDispositions: { %vector.body33: Computable }
%14 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38
--> {(32 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3872 + %a) LoopDispositions: { %vector.body33: Computable }
%15 = bitcast i32* %14 to <4 x i32>*
--> {(32 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3872 + %a) LoopDispositions: { %vector.body33: Computable }
%16 = getelementptr inbounds i32, i32* %14, i64 4
--> {(48 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3888 + %a) LoopDispositions: { %vector.body33: Computable }
%17 = bitcast i32* %16 to <4 x i32>*
--> {(48 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3888 + %a) LoopDispositions: { %vector.body33: Computable }
%index.next38.1 = add nuw nsw i64 %index37, 16
--> {16,+,40}<nuw><nsw><%vector.body33> U: [16,977) S: [16,977) Exits: 976 LoopDispositions: { %vector.body33: Computable }
%20 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38.1
--> {(64 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3904 + %a) LoopDispositions: { %vector.body33: Computable }
%21 = bitcast i32* %20 to <4 x i32>*
--> {(64 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3904 + %a) LoopDispositions: { %vector.body33: Computable }
%22 = getelementptr inbounds i32, i32* %20, i64 4
--> {(80 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3920 + %a) LoopDispositions: { %vector.body33: Computable }
%23 = bitcast i32* %22 to <4 x i32>*
--> {(80 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3920 + %a) LoopDispositions: { %vector.body33: Computable }
%index.next38.2 = add nuw nsw i64 %index37, 24
--> {24,+,40}<nuw><nsw><%vector.body33> U: [24,985) S: [24,985) Exits: 984 LoopDispositions: { %vector.body33: Computable }
%26 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38.2
--> {(96 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3936 + %a) LoopDispositions: { %vector.body33: Computable }
%27 = bitcast i32* %26 to <4 x i32>*
--> {(96 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3936 + %a) LoopDispositions: { %vector.body33: Computable }
%28 = getelementptr inbounds i32, i32* %26, i64 4
--> {(112 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3952 + %a) LoopDispositions: { %vector.body33: Computable }
%29 = bitcast i32* %28 to <4 x i32>*
--> {(112 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3952 + %a) LoopDispositions: { %vector.body33: Computable }
%index.next38.3 = add nuw nsw i64 %index37, 32
--> {32,+,40}<nuw><nsw><%vector.body33> U: [32,993) S: [32,993) Exits: 992 LoopDispositions: { %vector.body33: Computable }
%32 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38.3
--> {(128 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body33: Computable }
%33 = bitcast i32* %32 to <4 x i32>*
--> {(128 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body33: Computable }
%34 = getelementptr inbounds i32, i32* %32, i64 4
--> {(144 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body33: Computable }
%35 = bitcast i32* %34 to <4 x i32>*
--> {(144 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body33: Computable }
%index.next38.4 = add nuw nsw i64 %index37, 40
--> {40,+,40}<nuw><nsw><%vector.body33> U: [40,1001) S: [40,1001) Exits: 1000 LoopDispositions: { %vector.body33: Computable }
%39 = extractelement <4 x i32> %bin.rdx46, i32 0
--> %39 U: full-set S: full-set
%call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %39)
--> %call U: full-set S: full-set
%40 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next
--> {(32 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4000 + %a) LoopDispositions: { %vector.body: Computable }
%43 = bitcast i32* %40 to <4 x i32>*
--> {(32 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4000 + %a) LoopDispositions: { %vector.body: Computable }
%44 = getelementptr inbounds i32, i32* %40, i64 4
--> {(48 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4016 + %a) LoopDispositions: { %vector.body: Computable } <-------------------------------------------OUTSIDE RANGE 4016/4 = 1004
%45 = bitcast i32* %44 to <4 x i32>*
--> {(48 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4016 + %a) LoopDispositions: { %vector.body: Computable }
%index.next.1 = or i64 %index, 16
--> {16,+,32}<nuw><nsw><%vector.body> U: [16,1009) S: [16,1009) Exits: 1008 LoopDispositions: { %vector.body: Computable }
%46 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next.1
--> {(64 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4032 + %a) LoopDispositions: { %vector.body: Computable } <-------------------------------------------OUTSIDE RANGE 4032/4 = 1008
%49 = bitcast i32* %46 to <4 x i32>*
--> {(64 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4032 + %a) LoopDispositions: { %vector.body: Computable }
%50 = getelementptr inbounds i32, i32* %46, i64 4
--> {(80 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4048 + %a) LoopDispositions: { %vector.body: Computable }
%51 = bitcast i32* %50 to <4 x i32>*
--> {(80 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4048 + %a) LoopDispositions: { %vector.body: Computable }
%index.next.2 = or i64 %index, 24
--> {24,+,32}<nuw><nsw><%vector.body> U: [24,1017) S: [24,1017) Exits: 1016 LoopDispositions: { %vector.body: Computable }
%52 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next.2
--> {(96 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4064 + %a) LoopDispositions: { %vector.body: Computable }
%55 = bitcast i32* %52 to <4 x i32>*
--> {(96 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4064 + %a) LoopDispositions: { %vector.body: Computable }
%56 = getelementptr inbounds i32, i32* %52, i64 4
--> {(112 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4080 + %a) LoopDispositions: { %vector.body: Computable }
%57 = bitcast i32* %56 to <4 x i32>*
--> {(112 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4080 + %a) LoopDispositions: { %vector.body: Computable } <-------------------------------------------OUTSIDE RANGE 4080/4 = 1020
%index.next.3 = add nuw nsw i64 %index, 32
--> {32,+,32}<nuw><nsw><%vector.body> U: [32,1025) S: [32,1025) Exits: 1024 LoopDispositions: { %vector.body: Computable }
Determining loop execution counts for: @main
Loop %vector.body33: backedge-taken count is 24
Loop %vector.body33: max backedge-taken count is 24
Loop %vector.body33: Predicated backedge-taken count is 24
Predicates:
Loop %vector.body33: Trip multiple is 25
Loop %vector.body: backedge-taken count is 31
Loop %vector.body: max backedge-taken count is 31
Loop %vector.body: Predicated backedge-taken count is 31
Predicates:
Loop %vector.body: Trip multiple is 32
$
Thanks
M Suresh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180824/66210244/attachment-0001.html>
More information about the llvm-dev
mailing list