[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Sebastian Pop spop at codeaurora.org
Thu Jan 26 13:36:56 PST 2012


On Thu, Jan 26, 2012 at 3:19 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote:
>> On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> > Thanks! Did you compile with any non-default flags other than -mllvm
>> > -vectorize?
>>
>> I used -O3 and -vectorize, no other non-default flags.
>
> If I run clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c
> then I get no vectorization at all (the output is identical to that
> without the -vectorize). What target triple is your clang targeting?
>

Target: arm-none-linux-gnueabi

> If I include -mllvm -debug-only=bb-vectorize then the relevant output
> is:
> BBV: fusing loop #1 for entry in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.body in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.end in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.cond7.preheader in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.body10 in main...
> BBV: found 16 instructions with candidate pairs
> BBV: found 62 pair connections.
> BBV: selected 0 pairs.
> BBV: done!
> BBV: fusing loop #1 for for.inc45 in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.end47 in main...
> BBV: found 3 instructions with candidate pairs
> BBV: found 0 pair connections.
> BBV: done!
>
>  -Hal
>

Here is my output:

clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c -mllvm -debug-only=bb-vectorize
BBV: fusing loop #1 for entry in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.body in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.end in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.cond7.preheader in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.body10 in main...
BBV: found 22 instructions with candidate pairs
BBV: found 82 pair connections.
BBV: selected pairs in the best tree for:   %0 = load i8* %r.063, align 1, !tbaa !0
BBV: selected pair:   %mul23 = mul nsw i32 %conv14, 234 <->   %mul35 = mul nsw i32 %conv15, 543
BBV: selected pair:   %0 = load i8* %r.063, align 1, !tbaa !0 <->   %1 = load i8* %incdec.ptr11, align 1, !tbaa !0
BBV: selected pair:   %conv14 = zext i8 %0 to i32 <->   %conv15 = zext i8 %1 to i32
BBV: selected pair:   %add26 = add i32 %mul25, %mul23 <->   %add36 = add i32 %mul35, %mul33
BBV: selected pair:   %mul = mul nsw i32 %conv14, 123 <->   %mul16 = mul nsw i32 %conv15, 321
BBV: selected pair:   %conv30 = trunc i32 %add29 to i8 <->   %conv40 = trunc i32 %add39 to i8
BBV: selected pair:   %mul25 = mul nsw i32 %conv15, 432 <->   %mul33 = mul nsw i32 %conv14, 345
BBV: selected pair:   %add29 = add i32 %add26, %mul28 <->   %add39 = add i32 %add36, %mul38
BBV: selected pair:   store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 <->   store i8 %conv40, i8* %incdec.ptr31, align 1, !tbaa !0
BBV: selected pairs in the best tree for:   %conv14 = zext i8 %0 to i32
BBV: selected pair:   %mul23 = mul nsw i32 %conv14, 234 <->   %mul35 = mul nsw i32 %conv15, 543
BBV: selected pair:   %conv14 = zext i8 %0 to i32 <->   %conv15 = zext i8 %1 to i32
BBV: selected pair:   %mul = mul nsw i32 %conv14, 123 <->   %mul16 = mul nsw i32 %conv15, 321
BBV: selected pair:   %add26 = add i32 %mul25, %mul23 <->   %add36 = add i32 %mul35, %mul33
BBV: selected pair:   %conv30 = trunc i32 %add29 to i8 <->   %conv40 = trunc i32 %add39 to i8
BBV: selected pair:   %mul25 = mul nsw i32 %conv15, 432 <->   %mul33 = mul nsw i32 %conv14, 345
BBV: selected pair:   %add29 = add i32 %add26, %mul28 <->   %add39 = add i32 %add36, %mul38
BBV: selected pair:   store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 <->   store i8 %conv40, i8* %incdec.ptr31, align 1, !tbaa !0
BBV: selected 9 pairs.
BBV: initial:

for.body10:                                       ; preds = %for.body10, %for.cond7.preheader
  %w.065 = phi i8* [ %call1, %for.cond7.preheader ], [ %incdec.ptr41, %for.body10 ]
  %i.164 = phi i32 [ 0, %for.cond7.preheader ], [ %inc43, %for.body10 ]
  %r.063 = phi i8* [ %call, %for.cond7.preheader ], [ %incdec.ptr13, %for.body10 ]
  %incdec.ptr11 = getelementptr inbounds i8* %r.063, i32 1
  %0 = load i8* %r.063, align 1, !tbaa !0
  %incdec.ptr12 = getelementptr inbounds i8* %r.063, i32 2
  %1 = load i8* %incdec.ptr11, align 1, !tbaa !0
  %incdec.ptr13 = getelementptr inbounds i8* %r.063, i32 3
  %2 = load i8* %incdec.ptr12, align 1, !tbaa !0
  %conv14 = zext i8 %0 to i32
  %mul = mul nsw i32 %conv14, 123
  %conv15 = zext i8 %1 to i32
  %mul16 = mul nsw i32 %conv15, 321
  %conv17 = zext i8 %2 to i32
  %mul18 = mul nsw i32 %conv17, 567
  %add = add i32 %mul16, %mul
  %add19 = add i32 %add, %mul18
  %conv20 = trunc i32 %add19 to i8
  %incdec.ptr21 = getelementptr inbounds i8* %w.065, i32 1
  store i8 %conv20, i8* %w.065, align 1, !tbaa !0
  %mul23 = mul nsw i32 %conv14, 234
  %mul25 = mul nsw i32 %conv15, 432
  %mul28 = mul nsw i32 %conv17, 987
  %add26 = add i32 %mul25, %mul23
  %add29 = add i32 %add26, %mul28
  %conv30 = trunc i32 %add29 to i8
  %incdec.ptr31 = getelementptr inbounds i8* %w.065, i32 2
  store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0
  %mul33 = mul nsw i32 %conv14, 345
  %mul35 = mul nsw i32 %conv15, 543
  %mul38 = mul nsw i32 %conv17, 789
  %add36 = add i32 %mul35, %mul33
  %add39 = add i32 %add36, %mul38
  %conv40 = trunc i32 %add39 to i8
  %incdec.ptr41 = getelementptr inbounds i8* %w.065, i32 3
  store i8 %conv40, i8* %incdec.ptr31, align 1, !tbaa !0
  %inc43 = add nsw i32 %i.164, 1
  %exitcond = icmp eq i32 %inc43, 10000
  br i1 %exitcond, label %for.inc45, label %for.body10
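
For readers without the attachment, the "BBV: initial" block above appears to correspond to a loop of roughly the following shape. This is a sketch reconstructed from the IR only, not the actual test.c; the function name kernel, the parameter names, and the trip-count argument n are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of for.body10: read three bytes, write three
   bytes, each output a weighted sum truncated to 8 bits. The coefficients
   are the constants that appear in the IR. */
void kernel(const uint8_t *r, uint8_t *w, size_t n) {
    assert(r != NULL && w != NULL);
    for (size_t i = 0; i < n; i++) {
        int a = *r++;  /* %conv14 */
        int b = *r++;  /* %conv15 */
        int c = *r++;  /* %conv17 */
        *w++ = (uint8_t)(a * 123 + b * 321 + c * 567);  /* %conv20 */
        *w++ = (uint8_t)(a * 234 + b * 432 + c * 987);  /* %conv30 */
        *w++ = (uint8_t)(a * 345 + b * 543 + c * 789);  /* %conv40 */
    }
}
```

The second and third outputs have the same expression shape and are stored to adjacent addresses, which is consistent with the pairs selected above (%conv30 <-> %conv40 and the two stores).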

BBV: fusing:   %0 = load i8* %r.063, align 1, !tbaa !0 <->   %1 = load i8* %incdec.ptr11, align 1, !tbaa !0
BBV: fusing:   %conv14 = zext i8 %2 to i32 <->   %conv15 = zext i8 %3 to i32
BBV: moving:   %mul = mul nsw i32 %5, 123 to after   %conv14.v.r2 = extractelement <2 x i32> %conv14, i32 1
BBV: fusing:   %mul = mul nsw i32 %conv14.v.r1, 123 <->   %mul16 = mul nsw i32 %conv14.v.r2, 321
BBV: fusing:   %mul23 = mul nsw i32 %conv14.v.r1, 234 <->   %mul35 = mul nsw i32 %conv14.v.r2, 543
BBV: moving:   %add26 = add i32 %mul25, %5 to after   %mul23.v.r2 = extractelement <2 x i32> %mul23, i32 1
BBV: moving:   %add29 = add i32 %add26, %mul28 to after   %add26 = add i32 %mul25, %5
BBV: moving:   %conv30 = trunc i32 %add29 to i8 to after   %add29 = add i32 %add26, %mul28
BBV: moving:   store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 to after   %conv30 = trunc i32 %add29 to i8
BBV: fusing:   %mul25 = mul nsw i32 %conv14.v.r2, 432 <->   %mul33 = mul nsw i32 %conv14.v.r1, 345
BBV: fusing:   %add26 = add i32 %mul25.v.r1, %mul23.v.r1 <->   %add36 = add i32 %mul23.v.r2, %mul25.v.r2
BBV: moving:   %add29 = add i32 %5, %mul28 to after   %add26.v.r2 = extractelement <2 x i32> %add26, i32 1
BBV: moving:   %conv30 = trunc i32 %add29 to i8 to after   %add29 = add i32 %5, %mul28
BBV: moving:   store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 to after   %conv30 = trunc i32 %add29 to i8
BBV: fusing:   %add29 = add i32 %add26.v.r1, %mul28 <->   %add39 = add i32 %add26.v.r2, %mul38
BBV: moving:   %conv30 = trunc i32 %5 to i8 to after   %add29.v.r2 = extractelement <2 x i32> %add29, i32 1
BBV: moving:   store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 to after   %conv30 = trunc i32 %5 to i8
BBV: fusing:   %conv30 = trunc i32 %add29.v.r1 to i8 <->   %conv40 = trunc i32 %add29.v.r2 to i8
BBV: moving:   store i8 %5, i8* %incdec.ptr21, align 1, !tbaa !0 to after   %conv30.v.r2 = extractelement <2 x i8> %conv30, i32 1
BBV: fusing:   store i8 %conv30.v.r1, i8* %incdec.ptr21, align 1, !tbaa !0 <->   store i8 %conv30.v.r2, i8* %incdec.ptr31, align 1, !tbaa !0
BBV: final:

for.body10:                                       ; preds = %for.body10, %for.cond7.preheader
  %w.065 = phi i8* [ %call1, %for.cond7.preheader ], [ %incdec.ptr41, %for.body10 ]
  %i.164 = phi i32 [ 0, %for.cond7.preheader ], [ %inc43, %for.body10 ]
  %r.063 = phi i8* [ %call, %for.cond7.preheader ], [ %incdec.ptr13, %for.body10 ]
  %incdec.ptr11 = getelementptr inbounds i8* %r.063, i32 1
  %0 = bitcast i8* %r.063 to <2 x i8>*
  %incdec.ptr12 = getelementptr inbounds i8* %r.063, i32 2
  %1 = load <2 x i8>* %0, align 1, !tbaa !0
  %2 = extractelement <2 x i8> %1, i32 0
  %3 = extractelement <2 x i8> %1, i32 1
  %incdec.ptr13 = getelementptr inbounds i8* %r.063, i32 3
  %4 = load i8* %incdec.ptr12, align 1, !tbaa !0
  %conv14 = zext <2 x i8> %1 to <2 x i32>
  %conv14.v.r1 = extractelement <2 x i32> %conv14, i32 0
  %conv14.v.r2 = extractelement <2 x i32> %conv14, i32 1
  %mul.v.i1.1 = insertelement <2 x i32> undef, i32 123, i32 0
  %mul.v.i1.2 = insertelement <2 x i32> %mul.v.i1.1, i32 321, i32 1
  %mul = mul nsw <2 x i32> %conv14, %mul.v.i1.2
  %mul.v.r1 = extractelement <2 x i32> %mul, i32 0
  %mul.v.r2 = extractelement <2 x i32> %mul, i32 1
  %conv17 = zext i8 %4 to i32
  %mul18 = mul nsw i32 %conv17, 567
  %add = add i32 %mul.v.r2, %mul.v.r1
  %add19 = add i32 %add, %mul18
  %conv20 = trunc i32 %add19 to i8
  %incdec.ptr21 = getelementptr inbounds i8* %w.065, i32 1
  store i8 %conv20, i8* %w.065, align 1, !tbaa !0
  %mul23.v.i1.1 = insertelement <2 x i32> undef, i32 234, i32 0
  %mul25.v.i1.1 = insertelement <2 x i32> undef, i32 432, i32 0
  %mul28 = mul nsw i32 %conv17, 987
  %incdec.ptr31 = getelementptr inbounds i8* %w.065, i32 2
  %mul25.v.i1.2 = insertelement <2 x i32> %mul25.v.i1.1, i32 345, i32 1
  %mul25.v.i0 = shufflevector <2 x i32> %conv14, <2 x i32> undef, <2 x i32> <i32 1, i32 0>
  %mul25 = mul nsw <2 x i32> %mul25.v.i0, %mul25.v.i1.2
  %mul25.v.r1 = extractelement <2 x i32> %mul25, i32 0
  %mul25.v.r2 = extractelement <2 x i32> %mul25, i32 1
  %mul23.v.i1.2 = insertelement <2 x i32> %mul23.v.i1.1, i32 543, i32 1
  %mul23 = mul nsw <2 x i32> %conv14, %mul23.v.i1.2
  %mul23.v.r1 = extractelement <2 x i32> %mul23, i32 0
  %mul23.v.r2 = extractelement <2 x i32> %mul23, i32 1
  %mul38 = mul nsw i32 %conv17, 789
  %add26.v.i1 = shufflevector <2 x i32> %mul23, <2 x i32> %mul25, <2 x i32> <i32 0, i32 3>
  %add26.v.i0 = shufflevector <2 x i32> %mul25, <2 x i32> %mul23, <2 x i32> <i32 0, i32 3>
  %add26 = add <2 x i32> %add26.v.i0, %add26.v.i1
  %add26.v.r1 = extractelement <2 x i32> %add26, i32 0
  %add26.v.r2 = extractelement <2 x i32> %add26, i32 1
  %add29.v.i1.1 = insertelement <2 x i32> undef, i32 %mul28, i32 0
  %add29.v.i1.2 = insertelement <2 x i32> %add29.v.i1.1, i32 %mul38, i32 1
  %add29 = add <2 x i32> %add26, %add29.v.i1.2
  %add29.v.r1 = extractelement <2 x i32> %add29, i32 0
  %add29.v.r2 = extractelement <2 x i32> %add29, i32 1
  %conv30 = trunc <2 x i32> %add29 to <2 x i8>
  %conv30.v.r1 = extractelement <2 x i8> %conv30, i32 0
  %conv30.v.r2 = extractelement <2 x i8> %conv30, i32 1
  %5 = bitcast i8* %incdec.ptr21 to <2 x i8>*
  %incdec.ptr41 = getelementptr inbounds i8* %w.065, i32 3
  store <2 x i8> %conv30, <2 x i8>* %5, align 1, !tbaa !0
  %inc43 = add nsw i32 %i.164, 1
  %exitcond = icmp eq i32 %inc43, 10000
  br i1 %exitcond, label %for.inc45, label %for.body10
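
To make the lane assignment in the "BBV: final" block easier to follow, here is a plain-C model of it, with each <2 x i32> value written as a two-element array. It is only an illustrative sketch: the function name kernel_paired, its parameters, and the trip-count argument n are assumptions, and the arrays stand in for vector registers rather than showing how the target would actually lower the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the fused IR; array index = vector lane. */
void kernel_paired(const uint8_t *r, uint8_t *w, size_t n) {
    assert(r != NULL && w != NULL);
    for (size_t i = 0; i < n; i++, r += 3, w += 3) {
        int ab[2] = { r[0], r[1] };  /* %conv14: zext of the paired <2 x i8> load */
        int c = r[2];                /* %conv17: third input stays scalar */

        int mul[2]   = { ab[0] * 123, ab[1] * 321 };  /* %mul */
        int mul23[2] = { ab[0] * 234, ab[1] * 543 };  /* %mul23 */
        /* %mul25 uses %conv14 with its lanes swapped (shufflevector <1, 0>). */
        int mul25[2] = { ab[1] * 432, ab[0] * 345 };

        /* First output: lanes of %mul are extracted and summed as scalars. */
        w[0] = (uint8_t)(mul[1] + mul[0] + c * 567);

        /* %add26.v.i0 = <mul25[0], mul23[1]>, %add26.v.i1 = <mul23[0], mul25[1]> */
        int add26[2] = { mul25[0] + mul23[0], mul23[1] + mul25[1] };
        /* %add29 adds the vector built from the scalars %mul28 and %mul38. */
        int add29[2] = { add26[0] + c * 987, add26[1] + c * 789 };

        /* %conv30 is truncated to <2 x i8> and written with a single store. */
        w[1] = (uint8_t)add29[0];
        w[2] = (uint8_t)add29[1];
    }
}
```

Written this way, the extractelement, insertelement, and shufflevector instructions in the IR show up as the explicit lane shuffling needed to feed the paired multiplies and adds.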

BBV: fusing loop #2 for for.body10 in main...
BBV: found 27 instructions with candidate pairs
BBV: found 33 pair connections.
BBV: selected 0 pairs.
BBV: done!
BBV: fusing loop #1 for for.inc45 in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.end47 in main...
BBV: found 5 instructions with candidate pairs
BBV: found 2 pair connections.
BBV: selected 0 pairs.
BBV: done!

See also the attached test.ll (if that helps).

Sebastian
--
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.ll
Type: application/octet-stream
Size: 5983 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120126/f91d93eb/attachment.obj>

