[llvm-bugs] [Bug 30713] New: Vectorized loops do not get proper handling by LoopStrenghtReduction
via llvm-bugs
llvm-bugs at lists.llvm.org
Sun Oct 16 07:07:45 PDT 2016
https://llvm.org/bugs/show_bug.cgi?id=30713
Bug ID: 30713
Summary: Vectorized loops do not get proper handling by LoopStrenghtReduction
Product: new-bugs
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: paulsson at linux.vnet.ibm.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
I find that many loops that get vectorized are not handled properly by the
LSR pass.
I first thought this was caused by the vectorized address computations, but
even when I (on my experimental branch) forced the vectorizer to scalarize
all address computations, the LSR pass did nothing.
The vectorization factor was 2.
What I see is that IVUsers does not recognize the loads as IV users in the
vectorized loop. It turns out that the vectorized loop's 'sext' instruction is
!isInteresting(), because its SCEV:
(sext i32 {(1 + (2 * %mm.651771))<nuw><nsw>,+,4}<%vector.body3549> to i64)
is not an SCEVAddRecExpr.
In the scalar loop, however, the sext is interesting, with this SCEVAddRecExpr:
{(sext i32 (-1 + (2 * %kk1108.11765.in.ph)) to i64),+,-2}<nsw><%for.body1153>
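As a rough illustration of why the position of the sext matters (a sketch of the arithmetic only, not of how ScalarEvolution or IVUsers is implemented): extending a 32-bit recurrence after the add is only the same as a 64-bit recurrence of the extended start value if the 32-bit add never wraps. The helper names below are made up for this example. Note that in the vectorized SCEV above the <nuw><nsw> flags are attached to the start expression rather than to the recurrence itself, whereas the scalar recurrence carries <nsw>; whether that is what blocks the rewrite here is an assumption, not something confirmed in this report.

```python
def sext32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed i32 and return it as a Python int.

    This models the value that 'sext i32 ... to i64' would produce.
    """
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sext_of_addrec(start: int, step: int, i: int) -> int:
    """Model (sext i32 {start,+,step} to i64): add in 32 bits (may wrap), then extend."""
    return sext32(start + step * i)


def addrec_of_sext(start: int, step: int, i: int) -> int:
    """Model {(sext start),+,(sext step)} in i64: extend first, then add.

    Python ints are unbounded; for the small values used here that matches
    64-bit arithmetic without wrapping.
    """
    return sext32(start) + sext32(step) * i


def forms_agree(start: int, step: int, trip_count: int) -> bool:
    """True if both forms give the same value on every iteration."""
    return all(
        sext_of_addrec(start, step, i) == addrec_of_sext(start, step, i)
        for i in range(trip_count)
    )
```

For small values the two forms agree on every iteration, e.g. `forms_agree(1, 4, 1000)`. Near the top of the i32 range they diverge as soon as the 32-bit add wraps, e.g. `forms_agree(2**31 - 6, 4, 4)` is false, which is why the rewrite needs a no-wrap guarantee on the recurrence.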
The uses of the PHINode follow patterns like these:
phi -> add -> shl -> mul -> sext -> gep -> bitcast -> load
phi -> sub -> add -> add -> shl -> sext -> gep -> bitcast -> load
I would like to ask for any input on whether IVUsers / LSR is supposed to work
well with vectorized loops, and whether anyone has encountered this before.
/Jonas
My example looks like this:
Loop before vectorize pass:
for.body1153: ; preds = %for.body1153, %for.body1153.preheader
%kk1108.11765.in = phi i32 [ %kk1108.11765, %for.body1153 ], [ %kstart1109.11769, %for.body1153.preheader ]
%mm.661764 = phi i32 [ %inc1171, %for.body1153 ], [ %mm.651771, %for.body1153.preheader ]
%ii1106.11763 = phi i32 [ %inc1169, %for.body1153 ], [ %add1150, %for.body1153.preheader ]
%kk1108.11765 = add nsw i32 %kk1108.11765.in, -1
%mul1154 = shl nsw i32 %kk1108.11765, 1
%idxprom1155 = sext i32 %mul1154 to i64
%arrayidx1156 = getelementptr inbounds double, double* %call113, i64 %idxprom1155
%387 = bitcast double* %arrayidx1156 to i64*
%388 = load i64, i64* %387, align 8, !tbaa !19
%mul1157 = shl nsw i32 %mm.661764, 1
%idxprom1158 = sext i32 %mul1157 to i64
%arrayidx1159 = getelementptr inbounds double, double* %dvec, i64 %idxprom1158
%389 = bitcast double* %arrayidx1159 to i64*
store i64 %388, i64* %389, align 8, !tbaa !19
%add1161 = or i32 %mul1154, 1
%idxprom1162 = sext i32 %add1161 to i64
%arrayidx1163 = getelementptr inbounds double, double* %call113, i64 %idxprom1162
%390 = bitcast double* %arrayidx1163 to i64*
%391 = load i64, i64* %390, align 8, !tbaa !19
%add1165 = or i32 %mul1157, 1
%idxprom1166 = sext i32 %add1165 to i64
%arrayidx1167 = getelementptr inbounds double, double* %dvec, i64 %idxprom1166
%392 = bitcast double* %arrayidx1167 to i64*
store i64 %391, i64* %392, align 8, !tbaa !19
%inc1169 = add nuw nsw i32 %ii1106.11763, 1
%inc1171 = add nsw i32 %mm.661764, 1
%exitcond2440 = icmp eq i32 %inc1169, %3
br i1 %exitcond2440, label %for.end1172.loopexit, label %for.body1153
Loop after vectorize pass (with scalarized address computations to try to help
LSR):
vector.body3549: ; preds = %vector.body3549, %vector.ph3582
%index3583 = phi i32 [ 0, %vector.ph3582 ], [ %index.next3584, %vector.body3549 ]
%offset.idx3592 = sub i32 %kstart1109.11769, %index3583
%broadcast.splatinsert3593 = insertelement <2 x i32> undef, i32 %offset.idx3592, i32 0
%broadcast.splat3594 = shufflevector <2 x i32> %broadcast.splatinsert3593, <2 x i32> undef, <2 x i32> zeroinitializer
%induction3595 = add <2 x i32> %broadcast.splat3594, <i32 0, i32 -1>
%418 = add i32 %offset.idx3592, 0
%419 = add i32 %offset.idx3592, -1
%offset.idx3596 = add i32 %mm.651771, %index3583
%broadcast.splatinsert3597 = insertelement <2 x i32> undef, i32 %offset.idx3596, i32 0
%broadcast.splat3598 = shufflevector <2 x i32> %broadcast.splatinsert3597, <2 x i32> undef, <2 x i32> zeroinitializer
%induction3599 = add <2 x i32> %broadcast.splat3598, <i32 0, i32 1>
%420 = add i32 %offset.idx3596, 0
%offset.idx3600 = add i32 %add1150, %index3583
%broadcast.splatinsert3601 = insertelement <2 x i32> undef, i32 %offset.idx3600, i32 0
%broadcast.splat3602 = shufflevector <2 x i32> %broadcast.splatinsert3601, <2 x i32> undef, <2 x i32> zeroinitializer
%induction3603 = add <2 x i32> %broadcast.splat3602, <i32 0, i32 1>
%421 = add i32 %offset.idx3600, 0
%422 = add nsw i32 %418, -1
%423 = add nsw i32 %419, -1
%424 = shl nsw i32 %422, 1
%425 = shl nsw i32 %423, 1
%426 = sext i32 %424 to i64
%427 = sext i32 %425 to i64
%428 = getelementptr inbounds double, double* %call113, i64 %426
%429 = getelementptr inbounds double, double* %call113, i64 %427
%430 = bitcast double* %428 to i64*
%431 = bitcast double* %429 to i64*
%432 = load i64, i64* %430, align 8, !tbaa !19, !alias.scope !21
%433 = load i64, i64* %431, align 8, !tbaa !19, !alias.scope !21
%434 = insertelement <2 x i64> undef, i64 %432, i32 0
%435 = insertelement <2 x i64> %434, i64 %433, i32 1
%436 = shl nsw i32 %420, 1
%437 = sext i32 %436 to i64
%438 = getelementptr inbounds double, double* %dvec, i64 %437
%439 = bitcast double* %438 to i64*
%440 = or i32 %424, 1
%441 = or i32 %425, 1
%442 = sext i32 %440 to i64
%443 = sext i32 %441 to i64
%444 = getelementptr inbounds double, double* %call113, i64 %442
%445 = getelementptr inbounds double, double* %call113, i64 %443
%446 = bitcast double* %444 to i64*
%447 = bitcast double* %445 to i64*
%448 = load i64, i64* %446, align 8, !tbaa !19, !alias.scope !24
%449 = load i64, i64* %447, align 8, !tbaa !19, !alias.scope !24
%450 = insertelement <2 x i64> undef, i64 %448, i32 0
%451 = insertelement <2 x i64> %450, i64 %449, i32 1
%452 = or i32 %436, 1
%453 = sext i32 %452 to i64
%454 = getelementptr inbounds double, double* %dvec, i64 %453
%455 = bitcast double* %454 to i64*
%456 = getelementptr i64, i64* %455, i32 -1
%457 = bitcast i64* %456 to <4 x i64>*
%458 = shufflevector <2 x i64> %435, <2 x i64> %451, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%interleaved.vec3604 = shufflevector <4 x i64> %458, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
store <4 x i64> %interleaved.vec3604, <4 x i64>* %457, align 8, !tbaa !19, !alias.scope !26, !noalias !28
%459 = add nuw nsw i32 %421, 1
%460 = add nsw i32 %420, 1
%461 = icmp eq i32 %459, %3
%index.next3584 = add i32 %index3583, 2
%462 = icmp eq i32 %index.next3584, %n.vec3555
br i1 %462, label %middle.block3550, label %vector.body3549, !llvm.loop !29
Final vectorized MachineLoop :
Final Header:
vector.body3549: ; preds = %vector.body3549, %vector.body3549.preheader
%lsr.iv467 = phi i32 [ %lsr.iv.next468, %vector.body3549 ], [ %lsr.iv465, %vector.body3549.preheader ]
%lsr.iv461 = phi i32 [ %lsr.iv.next462, %vector.body3549 ], [ %749, %vector.body3549.preheader ]
%lsr.iv459 = phi i32 [ %lsr.iv.next460, %vector.body3549 ], [ %727, %vector.body3549.preheader ]
%750 = add i32 %lsr.iv467, -1
%751 = add i32 %lsr.iv467, -3
%752 = sext i32 %750 to i64
%753 = sext i32 %751 to i64
%754 = getelementptr inbounds double, double* %call113, i64 %752
%755 = getelementptr inbounds double, double* %call113, i64 %753
%756 = bitcast double* %754 to i64*
%757 = bitcast double* %755 to i64*
%758 = load i64, i64* %756, align 8, !tbaa !19, !alias.scope !93
%759 = load i64, i64* %757, align 8, !tbaa !19, !alias.scope !93
%760 = insertelement <2 x i64> undef, i64 %758, i32 0
%761 = insertelement <2 x i64> %760, i64 %759, i32 1
%762 = add i32 %lsr.iv467, -2
%763 = sext i32 %lsr.iv467 to i64
%764 = sext i32 %762 to i64
%765 = getelementptr inbounds double, double* %call113, i64 %763
%766 = getelementptr inbounds double, double* %call113, i64 %764
%767 = bitcast double* %765 to i64*
%768 = bitcast double* %766 to i64*
%769 = load i64, i64* %767, align 8, !tbaa !19, !alias.scope !96
%770 = load i64, i64* %768, align 8, !tbaa !19, !alias.scope !96
%771 = insertelement <2 x i64> undef, i64 %769, i32 0
%772 = insertelement <2 x i64> %771, i64 %770, i32 1
%773 = sext i32 %lsr.iv461 to i64
%774 = getelementptr inbounds double, double* %dvec, i64 %773
%775 = getelementptr double, double* %774, i64 -1
%776 = bitcast double* %775 to <4 x i64>*
%interleaved.vec3604 = shufflevector <2 x i64> %761, <2 x i64> %772, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
store <4 x i64> %interleaved.vec3604, <4 x i64>* %776, align 8, !tbaa !19, !alias.scope !98, !noalias !100
%lsr.iv.next460 = add i32 %lsr.iv459, -2
%lsr.iv.next462 = add i32 %lsr.iv461, 4
%lsr.iv.next468 = add i32 %lsr.iv467, -4
%777 = icmp eq i32 %lsr.iv.next460, 0
br i1 %777, label %middle.block3550, label %vector.body3549, !llvm.loop !101
Final scalar MachineLoop :
Final Header:
for.body1153: ; preds = %for.body1153, %for.body1153.preheader4255
%lsr.iv483 = phi i32 [ %lsr.iv.next484, %for.body1153 ], [ %785, %for.body1153.preheader4255 ]
%lsr.iv479 = phi double* [ %scevgep480, %for.body1153 ], [ %scevgep478, %for.body1153.preheader4255 ]
%lsr.iv474 = phi double* [ %scevgep475, %for.body1153 ], [ %scevgep473, %for.body1153.preheader4255 ]
%lsr.iv470 = phi double* [ %scevgep471, %for.body1153 ], [ %scevgep469, %for.body1153.preheader4255 ]
%lsr.iv479481 = bitcast double* %lsr.iv479 to i64*
%lsr.iv474476 = bitcast double* %lsr.iv474 to i64*
%lsr.iv470472 = bitcast double* %lsr.iv470 to i64*
%786 = load i64, i64* %lsr.iv474476, align 8, !tbaa !19
%scevgep482 = getelementptr i64, i64* %lsr.iv479481, i64 -1
store i64 %786, i64* %scevgep482, align 8, !tbaa !19
%787 = load i64, i64* %lsr.iv470472, align 8, !tbaa !19
store i64 %787, i64* %lsr.iv479481, align 8, !tbaa !19
%scevgep471 = getelementptr double, double* %lsr.iv470, i64 -2
%scevgep475 = getelementptr double, double* %lsr.iv474, i64 -2
%scevgep480 = getelementptr double, double* %lsr.iv479, i64 2
%lsr.iv.next484 = add i32 %lsr.iv483, -1
%exitcond2440 = icmp eq i32 %lsr.iv.next484, 0
br i1 %exitcond2440, label %for.end1172.loopexit, label %for.body1153, !llvm.loop !102
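To make the difference between the two final loops concrete, here is a small sketch of what the pointer induction variables in the scalar loop buy. It models only the first load from %call113 (the original index is 2 * kk with kk decremented each iteration, and the element size is 8 bytes for double), and it assumes the 32-bit index never wraps, as the nsw flags in the original loop promise. The function names and the example base address are hypothetical and chosen for this illustration only.

```python
ELEM_SIZE = 8  # bytes per double


def sext32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed i32 (models sext i32 to i64)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def addresses_by_index(base: int, kstart: int, trip_count: int) -> list:
    """Recompute the address from the i32 index on every iteration.

    Mirrors the unreduced form: add -1, shl 1, sext, getelementptr.
    """
    addrs = []
    kk = kstart
    for _ in range(trip_count):
        kk -= 1                      # %kk = add nsw i32 %kk.in, -1
        idx = sext32(kk << 1)        # shl nsw ..., 1 followed by sext to i64
        addrs.append(base + ELEM_SIZE * idx)
    return addrs


def addresses_by_pointer_iv(base: int, kstart: int, trip_count: int) -> list:
    """Strength-reduced form: one pointer IV stepped by a constant.

    The start address is computed once (in the preheader); each iteration
    then only applies 'getelementptr double, ptr, i64 -2', i.e. -16 bytes.
    """
    ptr = base + ELEM_SIZE * sext32((kstart - 1) << 1)
    addrs = []
    for _ in range(trip_count):
        addrs.append(ptr)
        ptr -= 2 * ELEM_SIZE
    return addrs
```

Both functions produce the same address stream, for example with `addresses_by_index(4096, 10, 5)` and `addresses_by_pointer_iv(4096, 10, 5)`, but the second does no extend or shift inside the loop. That is the shape of the scalar loop above, while the vectorized loop still recomputes add, sext and getelementptr for every access.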
Final vectorized MachineLoop :
BB#261: derived from LLVM BB %vector.body3549
Live Ins: %R6D %R9D %R0H %R0L %R1L %R3L %R4L %R5L %R7L %R8L %R10L %R11L %R12L %R13L
Predecessors according to CFG: BB#260 BB#261
%R2D<def> = LGFR %R1L
%R2D<def> = SLLG %R2D<kill>, %noreg, 3
%R2D<def> = LG %R6D, 0, %R2D<kill>; mem:LD8[%767](tbaa=!20)(alias.scope=!97)
%R14L<def> = AHIK %R1L, -2, %CC<imp-def,dead>
%R14D<def> = LGFR %R14L<kill>
%R14D<def> = SLLG %R14D<kill>, %noreg, 3
%R14D<def> = LG %R6D, 0, %R14D<kill>; mem:LD8[%768](tbaa=!20)(alias.scope=!97)
%V0<def> = VLVGP %R2D<kill>, %R14D<kill>
%R2L<def> = AHIK %R1L, -3, %CC<imp-def,dead>
%R2D<def> = LGFR %R2L<kill>
%R2D<def> = SLLG %R2D<kill>, %noreg, 3
%R2D<def> = LG %R6D, 0, %R2D<kill>; mem:LD8[%757](tbaa=!20)(alias.scope=!94)
%R14L<def> = AHIK %R1L, -1, %CC<imp-def,dead>
%R14D<def> = LGFR %R14L<kill>
%R14D<def> = SLLG %R14D<kill>, %noreg, 3
%R14D<def> = LG %R6D, 0, %R14D<kill>; mem:LD8[%756](tbaa=!20)(alias.scope=!94)
%V1<def> = VLVGP %R14D<kill>, %R2D<kill>
%V2<def> = VMRLG %V1, %V0
%R2D<def> = LGFR %R11L
%R2D<def> = SLLG %R2D<kill>, %noreg, 3
VST %V2<kill>, %R9D, 8, %R2D; mem:ST16[%776+16](align=8)(tbaa=!20)(alias.scope=!99)(noalias=!97,!94)
%V0<def> = VMRHG %V1<kill>, %V0<kill>
%R2D<def> = LAY %R9D, -8, %R2D<kill>
VST %V0<kill>, %R2D<kill>, 0, %noreg; mem:ST16[%776](align=8)(tbaa=!20)(alias.scope=!99)(noalias=!97,!94)
%R1L<def,tied1> = AHI %R1L<kill,tied0>, -4, %CC<imp-def,dead>
%R11L<def,tied1> = AHI %R11L<kill,tied0>, 4, %CC<imp-def,dead>
%R5L<def,tied1> = AHI %R5L<kill,tied0>, -2, %CC<imp-def>
BRC 15, 7, <BB#261>, %CC<imp-use,kill>
Successors according to CFG: BB#262(0x04000000 / 0x80000000 = 3.12%) BB#261(0x7c000000 / 0x80000000 = 96.88%)
Final scalar MachineLoop :
BB#265: derived from LLVM BB %for.body1153
Live Ins: %R1D %R2D %R5D %R6D %R0H %R3L %R4L %R7L %R8L %R9L %R10L %R11L %R12L %R13L
Predecessors according to CFG: BB#264 BB#265
%R14D<def> = LG %R5D, 0, %noreg; mem:LD8[%lsr.iv474476](tbaa=!20)
STG %R14D<kill>, %R1D, -8, %noreg; mem:ST8[%scevgep482](tbaa=!20)
%R14D<def> = LG %R2D, 0, %noreg; mem:LD8[%lsr.iv470472](tbaa=!20)
STG %R14D<kill>, %R1D, 0, %noreg; mem:ST8[%lsr.iv479481](tbaa=!20)
%R1D<def> = LA %R1D<kill>, 16, %noreg
%R5D<def> = LAY %R5D<kill>, -16, %noreg
%R2D<def> = LAY %R2D<kill>, -16, %noreg
%R4L<def,tied1> = BRCT %R4L<kill,tied0>, <BB#265>, %CC<imp-def,dead>
Successors according to CFG: BB#266(0x04000000 / 0x80000000 = 3.12%) BB#265(0x7c000000 / 0x80000000 = 96.88%)