[llvm-dev] LoopStrengthReduction generates false code
Boris Boesler via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 9 01:17:20 PDT 2020
Hi.
In my backend I get false code after using StrengthLoopReduction. In the generated code the loop index variable is multiplied by 8 (correct, everything is 64 bit aligned) to get an address offset, and the index variable is incremented by 1*8, which is not correct. It should be incremented by 1 only. The factor 8 appears again.
I compared the debug output (-debug-only=scalar-evolution,loop-reduce) for my backend and the ARM backend, but simply can't read/understand it. They differ in factors 4 vs 8 (ok), but there are more differences, probably caused by the implementation of TargetTransformInfo for ARM, while I haven't implemented it for my arch, yet.
How can I debug this further? In my arch everything is 64 bit aligned (factor 8 in many passes) and the memory is not byte addressed.
Thanks,
Boris
----8<----
LLVM assembly:
@buffer = common dso_local global [10 x i32] zeroinitializer, align 4
; Function Attrs: nounwind
define dso_local void @some_main(i32* %result) local_unnamed_addr #0 {
entry:
tail call void @fill_array(i32* getelementptr inbounds ([10 x i32], [10 x i32]* @buffer, i32 0, i32 0)) #2
br label %while.body
while.body: ; preds = %entry, %while.body
%i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]
%arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* @buffer, i32 0, i32 %i.010
%0 = load i32, i32* %arrayidx, align 4, !tbaa !2
%cmp1 = icmp ne i32 %0, -559038737
%inc = add nuw nsw i32 %i.010, 1
%cmp11 = icmp eq i32 %i.010, 0
%cmp = or i1 %cmp11, %cmp1
br i1 %cmp, label %while.body, label %while.end
while.end: ; preds = %while.body
%arrayidx2 = getelementptr inbounds [10 x i32], [10 x i32]* @buffer, i32 0, i32 %i.010
%1 = load i32, i32* %arrayidx2, align 4, !tbaa !2
store volatile i32 %1, i32* %result, align 4, !tbaa !2
ret void
}
declare dso_local void @fill_array(i32*) local_unnamed_addr #1
attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 7.0.1 (tags/RELEASE_701/final)"}
!2 = !{!3, !3, i64 0}
!3 = !{!"int", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C/C++ TBAA"}
(-debug-only=scalar-evolution,loop-reduce) for my arch:
LSR on loop %while.body:
Collecting IV Chains.
IV Chain#0 Head: ( %0 = load i32, i32* %arrayidx, align 4, !tbaa !2) IV={@buffer,+,8}<nsw><%while.body>
IV Chain#1 Head: ( %cmp11 = icmp eq i32 %i.010, 0) IV={0,+,1}<nuw><nsw><%while.body>
IV Chain#1 Inc: ( %i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]) IV+1
Chain: %cmp11 = icmp eq i32 %i.010, 0 Cost: 0
LSR has identified the following interesting factors and types: *8
LSR is examining the following fixup sites:
UserInst=%cmp11, OperandValToReplace=%i.010
UserInst=%0, OperandValToReplace=%arrayidx
LSR found 2 uses:
LSR is examining the following uses:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({0,+,-1}<nw><%while.body>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
reg({@buffer,+,8}<nsw><%while.body>)
After generating reuse formulae:
LSR is examining the following uses:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({0,+,-1}<nw><%while.body>)
reg({0,+,8}<nuw><nsw><%while.body>)
reg({0,+,1}<nuw><nsw><%while.body>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
reg({@buffer,+,8}<nsw><%while.body>)
reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>)
Filtering for use LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
Filtering out formula reg({0,+,1}<nuw><nsw><%while.body>)
in favor of formula reg({0,+,-1}<nw><%while.body>)
Filtering for use LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
After filtering out undesirable candidates:
LSR is examining the following uses:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({0,+,-1}<nw><%while.body>)
reg({0,+,8}<nuw><nsw><%while.body>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
reg({@buffer,+,8}<nsw><%while.body>)
reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>)
New best at 2 instructions 2 regs, with addrec cost 2.
Regs: {0,+,-1}<nw><%while.body> {@buffer,+,8}<nsw><%while.body>
New best at 2 instructions 2 regs, with addrec cost 1, plus 1 base add.
Regs: {0,+,8}<nuw><nsw><%while.body> @buffer
The chosen solution requires 2 instructions 2 regs, with addrec cost 1, plus 1 base add:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({0,+,8}<nuw><nsw><%while.body>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
reg(@buffer) + 1*reg({0,+,8}<nuw><nsw><%while.body>)
(-debug-only=scalar-evolution,loop-reduce) for ARM:
LSR on loop %while.body:
Collecting IV Chains.
IV Chain#0 Head: ( %0 = load i32, i32* %arrayidx, align 4, !tbaa !2) IV={@buffer,+,4}<nsw><%while.body>
IV Chain#1 Head: ( %cmp11 = icmp eq i32 %i.010, 0) IV={0,+,1}<nuw><nsw><%while.body>
IV Chain#1 Inc: ( %i.010 = phi i32 [ 0, %entry ], [ %inc, %while.body ]) IV+1
Chain: %cmp11 = icmp eq i32 %i.010, 0 Cost: 0
LSR has identified the following interesting factors and types: *4
LSR is examining the following fixup sites:
UserInst=%cmp11, OperandValToReplace=%i.010
UserInst=%0, OperandValToReplace=%arrayidx
LSR found 2 uses:
LSR is examining the following uses:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({0,+,-1}<nw><%while.body>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
reg({@buffer,+,4}<nsw><%while.body>)
After generating reuse formulae:
LSR is examining the following uses:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({0,+,-1}<nw><%while.body>)
reg({0,+,4}<nuw><nsw><%while.body>)
reg({0,+,1}<nuw><nsw><%while.body>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
reg({@buffer,+,4}<nsw><%while.body>)
reg(@buffer) + 1*reg({0,+,4}<nuw><nsw><%while.body>)
-1*reg({(-1 * @buffer),+,-4}<nw><%while.body>)
reg(@buffer) + 4*reg({0,+,1}<nuw><nsw><%while.body>)
reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>)
reg(@buffer) + -1*reg({0,+,-4}<nw><%while.body>)
Filtering for use LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
Filtering for use LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
Filtering out formula -1*reg({(-1 * @buffer),+,-4}<nw><%while.body>)
in favor of formula reg({@buffer,+,4}<nsw><%while.body>)
Filtering out formula reg(@buffer) + -1*reg({0,+,-4}<nw><%while.body>)
in favor of formula reg({@buffer,+,4}<nsw><%while.body>)
After filtering out undesirable candidates:
LSR is examining the following uses:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({0,+,-1}<nw><%while.body>)
reg({0,+,4}<nuw><nsw><%while.body>)
reg({0,+,1}<nuw><nsw><%while.body>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
reg({@buffer,+,4}<nsw><%while.body>)
reg(@buffer) + 1*reg({0,+,4}<nuw><nsw><%while.body>)
reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>)
reg(@buffer) + 4*reg({0,+,1}<nuw><nsw><%while.body>)
New best at 1 instruction 2 regs, with addrec cost 1.
Regs: {0,+,-1}<nw><%while.body> @buffer
The chosen solution requires 1 instruction 2 regs, with addrec cost 1:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({0,+,-1}<nw><%while.body>)
LSR Use: Kind=Address of i32 in addrspace(0), Offsets={0}, widest fixup type: i32*
reg(@buffer) + -4*reg({0,+,-1}<nw><%while.body>)
More information about the llvm-dev
mailing list