[LLVMbugs] [Bug 2056] New: Significant missed block merging opportunity
bugzilla-daemon at cs.uiuc.edu
Sun Feb 17 22:26:16 PST 2008
http://llvm.org/bugs/show_bug.cgi?id=2056
Summary: Significant missed block merging opportunity
Product: libraries
Version: 2.2
Platform: PC
OS/Version: All
Status: NEW
Keywords: code-quality
Severity: normal
Priority: P2
Component: Scalar Optimizations
AssignedTo: unassignedbugs at nondot.org
ReportedBy: sabre at nondot.org
CC: llvmbugs at cs.uiuc.edu
Applying this patch to simplifycfg disables cloning of return instructions into
predecessor blocks. This should be a simple code size win, and it makes
simplifycfg better at turning simple CFGs into select instructions:
--- SimplifyCFG.cpp (revision 47234)
+++ SimplifyCFG.cpp (working copy)
@@ -1245,7 +1245,7 @@
}
// If we found some, do the transformation!
- if (!UncondBranchPreds.empty()) {
+ if (0 && !UncondBranchPreds.empty()) {
while (!UncondBranchPreds.empty()) {
BasicBlock *Pred = UncondBranchPreds.back();
DOUT << "FOLDING: " << *BB
However, the patch causes significant code size pessimizations. In the case
where the return block is cloned, the .llvm.bc file ends up with lots of blocks
like this (examples from 'agrep' in multisource/benchmarks):
bb393.i.i: ; preds = %bb388.i.i
%tmp394.i.i = load i32* @num_of_matched, align 4 ; <i32> [#uses=1]
%tmp395.i.i = add i32 %tmp394.i.i, 1 ; <i32> [#uses=1]
store i32 %tmp395.i.i, i32* @num_of_matched, align 4
%tmp396.i.i = call i32 @puts( i8* getelementptr ([256 x i8]*
@CurrentFileName, i32 0, i32 0) ) nounwind ; <i32> [#uses=0]
ret void
When the return is not cloned, we end up with lots of code like this:
bb393.i.i: ; preds = %bb388.i.i
%tmp394.i.i = load i32* @num_of_matched, align 4 ; <i32> [#uses=1]
%tmp395.i.i = add i32 %tmp394.i.i, 1 ; <i32> [#uses=1]
store i32 %tmp395.i.i, i32* @num_of_matched, align 4
%tmp396.i.i = call i32 @puts( i8* getelementptr ([256 x i8]*
@CurrentFileName, i32 0, i32 0) ) nounwind ; <i32> [#uses=0]
br label %UnifiedReturnBlock
where UnifiedReturnBlock does a simple return.
The major difference between these is that in the former case, the code
generator's block merging pass is able to merge away all of these common
blocks, but when the return is shared, this optimization doesn't happen. Since
this sort of duplication does happen in practice because of inlining (e.g.,
where F calls G multiple times), we should be able to eliminate this redundancy
at the LLVM level, for example in simplifycfg.
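
To make the requested transformation concrete, here is a rough sketch of
merging identical blocks that branch to the same successor. It is a toy model
only: ToyBlock and mergeIdenticalBlocks are hypothetical names, not part of
LLVM, and instructions are compared as plain text. A real implementation in
simplifycfg would have to compare instructions structurally (SSA value names
differ between copies) and update PHI nodes in the successor, neither of which
is modeled here.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a basic block. This is NOT llvm::BasicBlock; it is a
// hypothetical type used only to illustrate the idea.
struct ToyBlock {
    std::string name;
    std::vector<std::string> body;  // non-terminator instructions, as text
    std::string successor;          // branch target; empty means "ret void"
};

// Merge blocks whose bodies and successors are identical, keeping the first
// occurrence of each. Branches in the surviving blocks that targeted a removed
// block are redirected to its representative. Returns a map from each removed
// block's name to the name of the block that replaced it.
//
// This is a single pass: redirecting branches can expose new duplicates, so a
// real pass would iterate until nothing changes.
std::map<std::string, std::string>
mergeIdenticalBlocks(std::vector<ToyBlock> &blocks) {
    using Key = std::pair<std::vector<std::string>, std::string>;
    std::map<Key, std::string> seen;              // (body, successor) -> representative
    std::map<std::string, std::string> replaced;  // removed block -> representative
    std::vector<ToyBlock> kept;

    for (const ToyBlock &b : blocks) {
        Key key{b.body, b.successor};
        auto it = seen.find(key);
        if (it == seen.end()) {
            seen.emplace(key, b.name);
            kept.push_back(b);
        } else {
            replaced[b.name] = it->second;
        }
    }

    // Redirect branches that pointed at a removed block.
    for (ToyBlock &b : kept) {
        auto it = replaced.find(b.successor);
        if (it != replaced.end())
            b.successor = it->second;
    }

    blocks = std::move(kept);
    return replaced;
}
```

With two copies of a block like bb393.i.i above, both ending in
"br label %UnifiedReturnBlock", the second copy is dropped and its
predecessors are pointed at the first, which is the effect the code
generator's block merging achieves today only when the return is cloned.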