[llvm-dev] LoopStrengthReduction generates false code

Eli Friedman via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 9 12:56:49 PDT 2020


Hmm.  Then I'm not sure; at first glance, the debug output looks fine either way.  Could you show the IR after LSR, and explain why it's wrong?
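
For reference, here is a minimal Python sketch of how I read the chosen solution in the debug output for your arch. It models a SCEV add recurrence {start,+,step} as start + n*step and checks two things over the first ten iterations: that @buffer + {0,+,8} yields the same addresses as {@buffer,+,8}, and that testing {0,+,8} against zero agrees with testing the original {0,+,1} against zero. The AddRec class, the BUFFER address, and the check function are made up for illustration; this is not LLVM code.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # model i32 wraparound


@dataclass(frozen=True)
class AddRec:
    """Affine add recurrence {start,+,step}: start + n*step at iteration n."""
    start: int
    step: int

    def at(self, n: int) -> int:
        return (self.start + n * self.step) & MASK32


BUFFER = 0x1000  # hypothetical address of @buffer, for illustration only

index = AddRec(0, 1)          # {0,+,1}: the original %i.010
scaled = AddRec(0, 8)         # {0,+,8}: the register in the chosen solution
address = AddRec(BUFFER, 8)   # {@buffer,+,8}: SCEV of %arrayidx


def check(trip_count: int = 10) -> bool:
    """True if the scaled register reproduces both uses on every iteration."""
    for n in range(trip_count):
        same_address = ((BUFFER + scaled.at(n)) & MASK32) == address.at(n)
        same_zero_test = (scaled.at(n) == 0) == (index.at(n) == 0)
        if not (same_address and same_zero_test):
            return False
    return True
```

If that reading is right, stepping the register by 8 is not by itself an error, as long as the address computation does not scale it by 8 a second time; the IR after LSR should show which of the two is happening.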

-Eli

> -----Original Message-----
> From: Boris Boesler <baembel at gmx.de>
> Sent: Tuesday, June 9, 2020 11:59 AM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: [EXT] Re: [llvm-dev] LoopStrengthReduction generates false code
>
> Hm, no. I expect byte addresses everywhere. The compiler should not know
> that the arch needs word addresses. During lowering, LOAD and STORE get
> explicit conversion operations for the memory address. Even if my arch were
> byte addressed, the code would be incorrect/illegal.
>
> Boris
>
> > Am 09.06.2020 um 19:36 schrieb Eli Friedman <efriedma at quicinc.com>:
> >
> > Blindly guessing here, "memory is not byte addressed", but you never fixed
> > ScalarEvolution to handle that, so it's modeling the GEP in a way you're not
> > expecting.
> >
> > -Eli
> >
> >> -----Original Message-----
> >> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Boris Boesler via llvm-dev
> >> Sent: Tuesday, June 9, 2020 1:17 AM
> >> To: llvm-dev at lists.llvm.org
> >> Subject: [EXT] [llvm-dev] LoopStrengthReduction generates false code
> >>
> >> Hi.
> >>
> >> In my backend I get incorrect code after running LoopStrengthReduction. In the
> >> generated code the loop index variable is multiplied by 8 (correct, everything
> >> is 64 bit aligned) to get an address offset, and the index variable is
> >> incremented by 1*8, which is not correct. It should be incremented by 1
> >> only. In other words, the factor 8 is applied twice.
> >>
> >> I compared the debug output (-debug-only=scalar-evolution,loop-reduce) for
> >> my backend and the ARM backend, but I simply can't read/understand it.
> >> They differ in the factor, 4 vs 8 (ok), but there are more differences, probably
> >> caused by the implementation of TargetTransformInfo for ARM, which I
> >> haven't implemented for my arch yet.
> >>
> >> How can I debug this further? In my arch everything is 64 bit aligned (factor 8
> >> in many passes) and the memory is not byte addressed.
> >>
> >> Thanks,
> >> Boris
> >>
> >> ----8<----
> >>
> >> LLVM assembly:
> >>
> >> @buffer = common dso_local global [10 x i32] zeroinitializer, align 4
> >>
> >> ; Function Attrs: nounwind
> >> define dso_local void @some_main(i32* %result) local_unnamed_addr #0 {
> >> entry:
> >>  tail call void @fill_array(i32* getelementptr inbounds ([10 x i32], [10 x i32]* @buffer, i32 0, i32 0)) #2
> >>  br label %while.body
> >>
> >> while.body:                                       ; preds = %entry, %while.body
> >>  %i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]
> >>  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* @buffer, i32 0, i32 %i.010
> >>  %0 = load i32, i32* %arrayidx, align 4, !tbaa !2
> >>  %cmp1 = icmp ne i32 %0, -559038737
> >>  %inc = add nuw nsw i32 %i.010, 1
> >>  %cmp11 = icmp eq i32 %i.010, 0
> >>  %cmp = or i1 %cmp11, %cmp1
> >>  br i1 %cmp, label %while.body, label %while.end
> >>
> >> while.end:                                        ; preds = %while.body
> >>  %arrayidx2 = getelementptr inbounds [10 x i32], [10 x i32]* @buffer, i32 0, i32 %i.010
> >>  %1 = load i32, i32* %arrayidx2, align 4, !tbaa !2
> >>  store volatile i32 %1, i32* %result, align 4, !tbaa !2
> >>  ret void
> >> }
> >>
> >> declare dso_local void @fill_array(i32*) local_unnamed_addr #1
> >>
> >> attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
> >> attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
> >> attributes #2 = { nounwind }
> >>
> >> !llvm.module.flags = !{!0}
> >> !llvm.ident = !{!1}
> >>
> >> !0 = !{i32 1, !"wchar_size", i32 4}
> >> !1 = !{!"clang version 7.0.1 (tags/RELEASE_701/final)"}
> >> !2 = !{!3, !3, i64 0}
> >> !3 = !{!"int", !4, i64 0}
> >> !4 = !{!"omnipotent char", !5, i64 0}
> >> !5 = !{!"Simple C/C++ TBAA"}
> >>
> >>
> >> (-debug-only=scalar-evolution,loop-reduce) for my arch:
> >>
> >> LSR on loop %while.body:
> >> Collecting IV Chains.
> >> IV Chain#0 Head: (  %0 = load i32, i32* %arrayidx, align 4, !tbaa !2) IV={@buffer,+,8}<nsw><%while.body>
> >> IV Chain#1 Head: (  %cmp11 = icmp eq i32 %i.010, 0) IV={0,+,1}<nuw><nsw><%while.body>
> >> IV Chain#1  Inc: (  %i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]) IV+1
> >> Chain:   %cmp11 = icmp eq i32 %i.010, 0 Cost: 0
> >> LSR has identified the following interesting factors and types: *8
> >> LSR is examining the following fixup sites:
> >>  UserInst=%cmp11, OperandValToReplace=%i.010
> >>  UserInst=%0, OperandValToReplace=%arrayidx
> >> LSR found 2 uses:
> >> LSR is examining the following uses:
> >>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >>    reg({0,+,-1}<nw><%while.body>)
> >>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>    reg({@buffer,+,8}<nsw><%while.body>)
> >>
> >> After generating reuse formulae:
> >> LSR is examining the following uses:
> >>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >>    reg({0,+,-1}<nw><%while.body>)
> >>    reg({0,+,8}<nuw><nsw><%while.body>)
> >>    reg({0,+,1}<nuw><nsw><%while.body>)
> >>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>    reg({@buffer,+,8}<nsw><%while.body>)
> >>    reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>)
> >> Filtering for use LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >>  Filtering out formula reg({0,+,1}<nuw><nsw><%while.body>)
> >>    in favor of formula reg({0,+,-1}<nw><%while.body>)
> >> Filtering for use LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>
> >> After filtering out undesirable candidates:
> >> LSR is examining the following uses:
> >>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >>    reg({0,+,-1}<nw><%while.body>)
> >>    reg({0,+,8}<nuw><nsw><%while.body>)
> >>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>    reg({@buffer,+,8}<nsw><%while.body>)
> >>    reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>)
> >> New best at 2 instructions 2 regs, with addrec cost 2.
> >> Regs: {0,+,-1}<nw><%while.body> {@buffer,+,8}<nsw><%while.body>
> >> New best at 2 instructions 2 regs, with addrec cost 1, plus 1 base add.
> >> Regs: {0,+,8}<nuw><nsw><%while.body> @buffer
> >>
> >> The chosen solution requires 2 instructions 2 regs, with addrec cost 1, plus 1 base add:
> >>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >>    reg({0,+,8}<nuw><nsw><%while.body>)
> >>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>    reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>)
> >>
> >>
> >> (-debug-only=scalar-evolution,loop-reduce) for ARM:
> >>
> >> LSR on loop %while.body:
> >> Collecting IV Chains.
> >> IV Chain#0 Head: (  %0 = load i32, i32* %arrayidx, align 4, !tbaa !2) IV={@buffer,+,4}<nsw><%while.body>
> >> IV Chain#1 Head: (  %cmp11 = icmp eq i32 %i.010, 0) IV={0,+,1}<nuw><nsw><%while.body>
> >> IV Chain#1  Inc: (  %i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]) IV+1
> >> Chain:   %cmp11 = icmp eq i32 %i.010, 0 Cost: 0
> >> LSR has identified the following interesting factors and types: *4
> >> LSR is examining the following fixup sites:
> >>  UserInst=%cmp11, OperandValToReplace=%i.010
> >>  UserInst=%0, OperandValToReplace=%arrayidx
> >> LSR found 2 uses:
> >> LSR is examining the following uses:
> >>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >>    reg({0,+,-1}<nw><%while.body>)
> >>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>    reg({@buffer,+,4}<nsw><%while.body>)
> >>
> >> After generating reuse formulae:
> >> LSR is examining the following uses:
> >>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >>    reg({0,+,-1}<nw><%while.body>)
> >>    reg({0,+,4}<nuw><nsw><%while.body>)
> >>    reg({0,+,1}<nuw><nsw><%while.body>)
> >>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>    reg({@buffer,+,4}<nsw><%while.body>)
> >>    reg(@buffer) + 1*reg({0,+,4}<nuw><nsw><%while.body>)
> >>    -1*reg({(-1 * @buffer),+,-4}<nw><%while.body>)
> >>    reg(@buffer) + 4*reg({0,+,1}<nuw><nsw><%while.body>)
> >>    reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>)
> >>    reg(@buffer) + -1*reg({0,+,-4}<nw><%while.body>)
> >> Filtering for use LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >> Filtering for use LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>  Filtering out formula -1*reg({(-1 * @buffer),+,-4}<nw><%while.body>)
> >>    in favor of formula reg({@buffer,+,4}<nsw><%while.body>)
> >>  Filtering out formula reg(@buffer) + -1*reg({0,+,-4}<nw><%while.body>)
> >>    in favor of formula reg({@buffer,+,4}<nsw><%while.body>)
> >>
> >> After filtering out undesirable candidates:
> >> LSR is examining the following uses:
> >>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >>    reg({0,+,-1}<nw><%while.body>)
> >>    reg({0,+,4}<nuw><nsw><%while.body>)
> >>    reg({0,+,1}<nuw><nsw><%while.body>)
> >>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>    reg({@buffer,+,4}<nsw><%while.body>)
> >>    reg(@buffer) + 1*reg({0,+,4}<nuw><nsw><%while.body>)
> >>    reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>)
> >>    reg(@buffer) + 4*reg({0,+,1}<nuw><nsw><%while.body>)
> >> New best at 1 instruction 2 regs, with addrec cost 1.
> >> Regs: {0,+,-1}<nw><%while.body> @buffer
> >>
> >> The chosen solution requires 1 instruction 2 regs, with addrec cost 1:
> >>  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
> >>    reg({0,+,-1}<nw><%while.body>)
> >>  LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
> >>    reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>)
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
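
A note on the *8 versus *4 factors in the two logs above: as far as I know, SCEV scales a GEP index by the alloc size of the element type, which is the store size rounded up to the ABI alignment taken from the target's data layout. The sketch below assumes that i32 has a 64 bit ABI alignment on the custom arch (as "everything is 64 bit aligned" suggests) and a 32 bit one on ARM. The function name is hypothetical, and the code only mirrors the rounding; it does not call LLVM.

```python
def alloc_size_bytes(bit_width: int, abi_align_bits: int) -> int:
    """Store size in bytes, rounded up to a multiple of the ABI alignment."""
    store_size = (bit_width + 7) // 8          # bits to whole bytes
    align = abi_align_bits // 8                # alignment in bytes
    return (store_size + align - 1) // align * align


# Assumed data layout of the custom arch: i32 aligned to 64 bits.
custom_stride = alloc_size_bytes(32, 64)

# ARM: i32 aligned to 32 bits.
arm_stride = alloc_size_bytes(32, 32)
```

Under these assumptions the strides come out as 8 and 4 bytes, which matches the {@buffer,+,8} and {@buffer,+,4} recurrences in the two logs.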


