[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
Sebastian Pop
spop at codeaurora.org
Thu Jan 26 13:36:56 PST 2012
On Thu, Jan 26, 2012 at 3:19 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote:
>> On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> > Thanks! Did you compile with any non-default flags other than -mllvm
>> > -vectorize?
>>
>> I used -O3 and -vectorize, no other non-default flags.
>
> If I run clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c
> then I get no vectorization at all (the output is identical to that
> without the -vectorize). What target triple is your clang targeting?
>
Target: arm-none-linux-gnueabi
> If I include -mllvm -debug-only=bb-vectorize then the relevant output
> is:
> BBV: fusing loop #1 for entry in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.body in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.end in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.cond7.preheader in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.body10 in main...
> BBV: found 16 instructions with candidate pairs
> BBV: found 62 pair connections.
> BBV: selected 0 pairs.
> BBV: done!
> BBV: fusing loop #1 for for.inc45 in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.end47 in main...
> BBV: found 3 instructions with candidate pairs
> BBV: found 0 pair connections.
> BBV: done!
>
> -Hal
>
Here is my output:
clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c -mllvm -debug-only=bb-vectorize
BBV: fusing loop #1 for entry in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.body in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.end in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.cond7.preheader in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.body10 in main...
BBV: found 22 instructions with candidate pairs
BBV: found 82 pair connections.
BBV: selected pairs in the best tree for: %0 = load i8* %r.063, align 1, !tbaa !0
BBV: selected pair: %mul23 = mul nsw i32 %conv14, 234 <-> %mul35 = mul nsw i32 %conv15, 543
BBV: selected pair: %0 = load i8* %r.063, align 1, !tbaa !0 <-> %1 = load i8* %incdec.ptr11, align 1, !tbaa !0
BBV: selected pair: %conv14 = zext i8 %0 to i32 <-> %conv15 = zext i8 %1 to i32
BBV: selected pair: %add26 = add i32 %mul25, %mul23 <-> %add36 = add i32 %mul35, %mul33
BBV: selected pair: %mul = mul nsw i32 %conv14, 123 <-> %mul16 = mul nsw i32 %conv15, 321
BBV: selected pair: %conv30 = trunc i32 %add29 to i8 <-> %conv40 = trunc i32 %add39 to i8
BBV: selected pair: %mul25 = mul nsw i32 %conv15, 432 <-> %mul33 = mul nsw i32 %conv14, 345
BBV: selected pair: %add29 = add i32 %add26, %mul28 <-> %add39 = add i32 %add36, %mul38
BBV: selected pair: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 <-> store i8 %conv40, i8* %incdec.ptr31, align 1, !tbaa !0
BBV: selected pairs in the best tree for: %conv14 = zext i8 %0 to i32
BBV: selected pair: %mul23 = mul nsw i32 %conv14, 234 <-> %mul35 = mul nsw i32 %conv15, 543
BBV: selected pair: %conv14 = zext i8 %0 to i32 <-> %conv15 = zext i8 %1 to i32
BBV: selected pair: %mul = mul nsw i32 %conv14, 123 <-> %mul16 = mul nsw i32 %conv15, 321
BBV: selected pair: %add26 = add i32 %mul25, %mul23 <-> %add36 = add i32 %mul35, %mul33
BBV: selected pair: %conv30 = trunc i32 %add29 to i8 <-> %conv40 = trunc i32 %add39 to i8
BBV: selected pair: %mul25 = mul nsw i32 %conv15, 432 <-> %mul33 = mul nsw i32 %conv14, 345
BBV: selected pair: %add29 = add i32 %add26, %mul28 <-> %add39 = add i32 %add36, %mul38
BBV: selected pair: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 <-> store i8 %conv40, i8* %incdec.ptr31, align 1, !tbaa !0
BBV: selected 9 pairs.
BBV: initial:
for.body10: ; preds = %for.body10, %for.cond7.preheader
%w.065 = phi i8* [ %call1, %for.cond7.preheader ], [ %incdec.ptr41, %for.body10 ]
%i.164 = phi i32 [ 0, %for.cond7.preheader ], [ %inc43, %for.body10 ]
%r.063 = phi i8* [ %call, %for.cond7.preheader ], [ %incdec.ptr13, %for.body10 ]
%incdec.ptr11 = getelementptr inbounds i8* %r.063, i32 1
%0 = load i8* %r.063, align 1, !tbaa !0
%incdec.ptr12 = getelementptr inbounds i8* %r.063, i32 2
%1 = load i8* %incdec.ptr11, align 1, !tbaa !0
%incdec.ptr13 = getelementptr inbounds i8* %r.063, i32 3
%2 = load i8* %incdec.ptr12, align 1, !tbaa !0
%conv14 = zext i8 %0 to i32
%mul = mul nsw i32 %conv14, 123
%conv15 = zext i8 %1 to i32
%mul16 = mul nsw i32 %conv15, 321
%conv17 = zext i8 %2 to i32
%mul18 = mul nsw i32 %conv17, 567
%add = add i32 %mul16, %mul
%add19 = add i32 %add, %mul18
%conv20 = trunc i32 %add19 to i8
%incdec.ptr21 = getelementptr inbounds i8* %w.065, i32 1
store i8 %conv20, i8* %w.065, align 1, !tbaa !0
%mul23 = mul nsw i32 %conv14, 234
%mul25 = mul nsw i32 %conv15, 432
%mul28 = mul nsw i32 %conv17, 987
%add26 = add i32 %mul25, %mul23
%add29 = add i32 %add26, %mul28
%conv30 = trunc i32 %add29 to i8
%incdec.ptr31 = getelementptr inbounds i8* %w.065, i32 2
store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0
%mul33 = mul nsw i32 %conv14, 345
%mul35 = mul nsw i32 %conv15, 543
%mul38 = mul nsw i32 %conv17, 789
%add36 = add i32 %mul35, %mul33
%add39 = add i32 %add36, %mul38
%conv40 = trunc i32 %add39 to i8
%incdec.ptr41 = getelementptr inbounds i8* %w.065, i32 3
store i8 %conv40, i8* %incdec.ptr31, align 1, !tbaa !0
%inc43 = add nsw i32 %i.164, 1
%exitcond = icmp eq i32 %inc43, 10000
br i1 %exitcond, label %for.inc45, label %for.body10
BBV: fusing: %0 = load i8* %r.063, align 1, !tbaa !0 <-> %1 = load i8* %incdec.ptr11, align 1, !tbaa !0
BBV: fusing: %conv14 = zext i8 %2 to i32 <-> %conv15 = zext i8 %3 to i32
BBV: moving: %mul = mul nsw i32 %5, 123 to after %conv14.v.r2 = extractelement <2 x i32> %conv14, i32 1
BBV: fusing: %mul = mul nsw i32 %conv14.v.r1, 123 <-> %mul16 = mul nsw i32 %conv14.v.r2, 321
BBV: fusing: %mul23 = mul nsw i32 %conv14.v.r1, 234 <-> %mul35 = mul nsw i32 %conv14.v.r2, 543
BBV: moving: %add26 = add i32 %mul25, %5 to after %mul23.v.r2 = extractelement <2 x i32> %mul23, i32 1
BBV: moving: %add29 = add i32 %add26, %mul28 to after %add26 = add i32 %mul25, %5
BBV: moving: %conv30 = trunc i32 %add29 to i8 to after %add29 = add i32 %add26, %mul28
BBV: moving: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 to after %conv30 = trunc i32 %add29 to i8
BBV: fusing: %mul25 = mul nsw i32 %conv14.v.r2, 432 <-> %mul33 = mul nsw i32 %conv14.v.r1, 345
BBV: fusing: %add26 = add i32 %mul25.v.r1, %mul23.v.r1 <-> %add36 = add i32 %mul23.v.r2, %mul25.v.r2
BBV: moving: %add29 = add i32 %5, %mul28 to after %add26.v.r2 = extractelement <2 x i32> %add26, i32 1
BBV: moving: %conv30 = trunc i32 %add29 to i8 to after %add29 = add i32 %5, %mul28
BBV: moving: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 to after %conv30 = trunc i32 %add29 to i8
BBV: fusing: %add29 = add i32 %add26.v.r1, %mul28 <-> %add39 = add i32 %add26.v.r2, %mul38
BBV: moving: %conv30 = trunc i32 %5 to i8 to after %add29.v.r2 = extractelement <2 x i32> %add29, i32 1
BBV: moving: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 to after %conv30 = trunc i32 %5 to i8
BBV: fusing: %conv30 = trunc i32 %add29.v.r1 to i8 <-> %conv40 = trunc i32 %add29.v.r2 to i8
BBV: moving: store i8 %5, i8* %incdec.ptr21, align 1, !tbaa !0 to after %conv30.v.r2 = extractelement <2 x i8> %conv30, i32 1
BBV: fusing: store i8 %conv30.v.r1, i8* %incdec.ptr21, align 1, !tbaa !0 <-> store i8 %conv30.v.r2, i8* %incdec.ptr31, align 1, !tbaa !0
BBV: final:
for.body10: ; preds = %for.body10, %for.cond7.preheader
%w.065 = phi i8* [ %call1, %for.cond7.preheader ], [ %incdec.ptr41, %for.body10 ]
%i.164 = phi i32 [ 0, %for.cond7.preheader ], [ %inc43, %for.body10 ]
%r.063 = phi i8* [ %call, %for.cond7.preheader ], [ %incdec.ptr13, %for.body10 ]
%incdec.ptr11 = getelementptr inbounds i8* %r.063, i32 1
%0 = bitcast i8* %r.063 to <2 x i8>*
%incdec.ptr12 = getelementptr inbounds i8* %r.063, i32 2
%1 = load <2 x i8>* %0, align 1, !tbaa !0
%2 = extractelement <2 x i8> %1, i32 0
%3 = extractelement <2 x i8> %1, i32 1
%incdec.ptr13 = getelementptr inbounds i8* %r.063, i32 3
%4 = load i8* %incdec.ptr12, align 1, !tbaa !0
%conv14 = zext <2 x i8> %1 to <2 x i32>
%conv14.v.r1 = extractelement <2 x i32> %conv14, i32 0
%conv14.v.r2 = extractelement <2 x i32> %conv14, i32 1
%mul.v.i1.1 = insertelement <2 x i32> undef, i32 123, i32 0
%mul.v.i1.2 = insertelement <2 x i32> %mul.v.i1.1, i32 321, i32 1
%mul = mul nsw <2 x i32> %conv14, %mul.v.i1.2
%mul.v.r1 = extractelement <2 x i32> %mul, i32 0
%mul.v.r2 = extractelement <2 x i32> %mul, i32 1
%conv17 = zext i8 %4 to i32
%mul18 = mul nsw i32 %conv17, 567
%add = add i32 %mul.v.r2, %mul.v.r1
%add19 = add i32 %add, %mul18
%conv20 = trunc i32 %add19 to i8
%incdec.ptr21 = getelementptr inbounds i8* %w.065, i32 1
store i8 %conv20, i8* %w.065, align 1, !tbaa !0
%mul23.v.i1.1 = insertelement <2 x i32> undef, i32 234, i32 0
%mul25.v.i1.1 = insertelement <2 x i32> undef, i32 432, i32 0
%mul28 = mul nsw i32 %conv17, 987
%incdec.ptr31 = getelementptr inbounds i8* %w.065, i32 2
%mul25.v.i1.2 = insertelement <2 x i32> %mul25.v.i1.1, i32 345, i32 1
%mul25.v.i0 = shufflevector <2 x i32> %conv14, <2 x i32> undef, <2 x i32> <i32 1, i32 0>
%mul25 = mul nsw <2 x i32> %mul25.v.i0, %mul25.v.i1.2
%mul25.v.r1 = extractelement <2 x i32> %mul25, i32 0
%mul25.v.r2 = extractelement <2 x i32> %mul25, i32 1
%mul23.v.i1.2 = insertelement <2 x i32> %mul23.v.i1.1, i32 543, i32 1
%mul23 = mul nsw <2 x i32> %conv14, %mul23.v.i1.2
%mul23.v.r1 = extractelement <2 x i32> %mul23, i32 0
%mul23.v.r2 = extractelement <2 x i32> %mul23, i32 1
%mul38 = mul nsw i32 %conv17, 789
%add26.v.i1 = shufflevector <2 x i32> %mul23, <2 x i32> %mul25, <2 x i32> <i32 0, i32 3>
%add26.v.i0 = shufflevector <2 x i32> %mul25, <2 x i32> %mul23, <2 x i32> <i32 0, i32 3>
%add26 = add <2 x i32> %add26.v.i0, %add26.v.i1
%add26.v.r1 = extractelement <2 x i32> %add26, i32 0
%add26.v.r2 = extractelement <2 x i32> %add26, i32 1
%add29.v.i1.1 = insertelement <2 x i32> undef, i32 %mul28, i32 0
%add29.v.i1.2 = insertelement <2 x i32> %add29.v.i1.1, i32 %mul38, i32 1
%add29 = add <2 x i32> %add26, %add29.v.i1.2
%add29.v.r1 = extractelement <2 x i32> %add29, i32 0
%add29.v.r2 = extractelement <2 x i32> %add29, i32 1
%conv30 = trunc <2 x i32> %add29 to <2 x i8>
%conv30.v.r1 = extractelement <2 x i8> %conv30, i32 0
%conv30.v.r2 = extractelement <2 x i8> %conv30, i32 1
%5 = bitcast i8* %incdec.ptr21 to <2 x i8>*
%incdec.ptr41 = getelementptr inbounds i8* %w.065, i32 3
store <2 x i8> %conv30, <2 x i8>* %5, align 1, !tbaa !0
%inc43 = add nsw i32 %i.164, 1
%exitcond = icmp eq i32 %inc43, 10000
br i1 %exitcond, label %for.inc45, label %for.body10
BBV: fusing loop #2 for for.body10 in main...
BBV: found 27 instructions with candidate pairs
BBV: found 33 pair connections.
BBV: selected 0 pairs.
BBV: done!
BBV: fusing loop #1 for for.inc45 in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.end47 in main...
BBV: found 5 instructions with candidate pairs
BBV: found 2 pair connections.
BBV: selected 0 pairs.
BBV: done!
See also the attached test.ll (if that helps).
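
For readers without the attachment: test.c itself is not shown in this thread, so the following is only a sketch of a loop that would plausibly lower to the "BBV: initial" IR for for.body10 above. The function name mix3 and its signature are hypothetical; the coefficients and the three-loads/three-stores shape are read off the IR, everything else is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of the for.body10 loop body, inferred
 * from the "BBV: initial" dump. The name mix3 and the parameters are
 * illustrative only; the real test.c may differ. */
void mix3(const uint8_t *r, uint8_t *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Three consecutive byte loads (%0, %1, %2 in the IR). */
        uint8_t a = *r++;
        uint8_t b = *r++;
        uint8_t c = *r++;

        /* Each operand is widened to int (zext), multiplied by a
         * constant, summed, and truncated back to a byte (trunc). */
        *w++ = (uint8_t)(a * 123 + b * 321 + c * 567);
        *w++ = (uint8_t)(a * 234 + b * 432 + c * 987);
        *w++ = (uint8_t)(a * 345 + b * 543 + c * 789);
    }
}
```

In the log above, the pairs that get fused are the loads of the first two input bytes and the stores to the second and third output bytes, along with the zext/mul/add/trunc chains that connect them; the third input byte and the first output byte stay scalar.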
Sebastian
--
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.ll
Type: application/octet-stream
Size: 5983 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120126/f91d93eb/attachment.obj>