[llvm-dev] [RFC] New pass: LoopExitValues
Steve King via llvm-dev
llvm-dev at lists.llvm.org
Tue Sep 15 12:27:44 PDT 2015
On Mon, Sep 14, 2015 at 2:10 PM, Wei Mi <wmi at google.com> wrote:
>
> I tried the patch on the matrix_mul testcase and saw where the
> redundency was removed. My patch in http://reviews.llvm.org/D12090
> cannot handle the case. I feel like it is a different problem.
>
> void matrix_mul(unsigned int Size, unsigned int *Dst, unsigned int
> *Src, unsigned int Val) {
> for (int Outer = 0; Outer < Size; ++Outer)
> for (int Inner = 0; Inner < Size; ++Inner)
> Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val;
> }
>
> "Outer * Size + Inner" is an induction variable in inner loop. Its
> SCEV is {{0,+,%Size}<outerloop>,+,1}<innerloop>, so the initial value
> of the induction variable for inner loop is {0, +, %Size}<outerloop>,
> .i.e, an "lsr.iv.next = lsr.iv + %Size" will be generated at the end
> of every iteration of outerloop to prepare the initial value of the
> induction variable in the next outerloop iteration.
>
> However, the iteration number of inner loop happens to be "Size" too,
> so the exit value of "Outer * Size + Inner" in inner loop happens to
> be the same with the initial value of "Outer * Size + Inner" in the
> next iteration of outerloop. So the exit value of the induction
> variable can be reused and "lsr.iv.next = lsr.iv + %Size" which is
> used to compute the initial value of "Outer * Size + Inner" can be
> omitted.
>
Thanks for experimenting with the patch! If I understand correctly,
you show that the redundancy is inherent in the loop and not a
byproduct of other passes. Following this insight, changing just one
line of the IR produces assembly *identical* to the effect of the new
LEV pass. Specifically, in the outer loop, the multiply can be
replaced with a PHI node that recycles the %add value coming from the
inner loop.
Should the front end be responsible for this sort of optimization?
Original:
%mul = mul i32 %Outer.026, %Size
Replacing with this line has same effect as LEV pass:
%mul = phi i32 [ %add, %for.cond.cleanup.3 ], [ 0, %entry ]
The original IR for the matrix_mul.c test case:
define void @matrix_mul(i32 %Size, i32* nocapture %Dst, i32* nocapture
readonly %Src, i32 %Val) #0 {
entry:
%cmp.25 = icmp eq i32 %Size, 0
br i1 %cmp.25, label %for.cond.cleanup, label %for.body.4.lr.ph
for.body.4.lr.ph: ; preds = %entry,
%for.cond.cleanup.3
%Outer.026 = phi i32 [ %inc10, %for.cond.cleanup.3 ], [ 0, %entry ]
;****************
%mul = mul i32 %Outer.026, %Size
;****************
br label %for.body.4
for.cond.cleanup: ; preds =
%for.cond.cleanup.3, %entry
ret void
for.cond.cleanup.3: ; preds = %for.body.4
%inc10 = add nuw nsw i32 %Outer.026, 1
%exitcond28 = icmp eq i32 %inc10, %Size
br i1 %exitcond28, label %for.cond.cleanup, label %for.body.4.lr.ph
for.body.4: ; preds =
%for.body.4, %for.body.4.lr.ph
%Inner.024 = phi i32 [ 0, %for.body.4.lr.ph ], [ %inc, %for.body.4 ]
%add = add i32 %Inner.024, %mul
%arrayidx = getelementptr inbounds i32, i32* %Src, i32 %add
%0 = load i32, i32* %arrayidx, align 4, !tbaa !1
%mul5 = mul i32 %0, %Val
%arrayidx8 = getelementptr inbounds i32, i32* %Dst, i32 %add
store i32 %mul5, i32* %arrayidx8, align 4, !tbaa !1
%inc = add nuw nsw i32 %Inner.024, 1
%exitcond = icmp eq i32 %inc, %Size
br i1 %exitcond, label %for.cond.cleanup.3, label %for.body.4
}
More information about the llvm-dev
mailing list