[LLVMbugs] [Bug 2056] New: Significant missed block merging opportunity

Sun Feb 17 22:26:16 PST 2008

http://llvm.org/bugs/show_bug.cgi?id=2056

           Summary: Significant missed block merging opportunity
           Product: libraries
           Version: 2.2
          Platform: PC
        OS/Version: All
            Status: NEW
          Keywords: code-quality
          Severity: normal
          Priority: P2
         Component: Scalar Optimizations
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: sabre at nondot.org
                CC: llvmbugs at cs.uiuc.edu


Applying this patch to simplifycfg disables cloning of return instructions into
predecessor blocks.  This should be a simple code size win, and it lets
simplifycfg do a better job of turning simple CFGs into select instructions:

--- SimplifyCFG.cpp     (revision 47234)
+++ SimplifyCFG.cpp     (working copy)
@@ -1245,7 +1245,7 @@
       }

       // If we found some, do the transformation!
-      if (!UncondBranchPreds.empty()) {
+      if (0 && !UncondBranchPreds.empty()) {
         while (!UncondBranchPreds.empty()) {
           BasicBlock *Pred = UncondBranchPreds.back();
           DOUT << "FOLDING: " << *BB

However, the patch causes a significant code size pessimization.  When the
return block is cloned (i.e. without the patch), the .llvm.bc file ends up with
lots of blocks like this (examples from 'agrep' in multisource/benchmarks):

bb393.i.i:              ; preds = %bb388.i.i
        %tmp394.i.i = load i32* @num_of_matched, align 4                ; <i32> [#uses=1]
        %tmp395.i.i = add i32 %tmp394.i.i, 1            ; <i32> [#uses=1]
        store i32 %tmp395.i.i, i32* @num_of_matched, align 4
        %tmp396.i.i = call i32 @puts( i8* getelementptr ([256 x i8]* @CurrentFileName, i32 0, i32 0) ) nounwind                 ; <i32> [#uses=0]
        ret void

When the return is not cloned, we end up with lots of code like this:

bb393.i.i:              ; preds = %bb388.i.i
        %tmp394.i.i = load i32* @num_of_matched, align 4                ; <i32> [#uses=1]
        %tmp395.i.i = add i32 %tmp394.i.i, 1            ; <i32> [#uses=1]
        store i32 %tmp395.i.i, i32* @num_of_matched, align 4
        %tmp396.i.i = call i32 @puts( i8* getelementptr ([256 x i8]* @CurrentFileName, i32 0, i32 0) ) nounwind                 ; <i32> [#uses=0]
        br label %UnifiedReturnBlock

where UnifiedReturnBlock does a simple return.

The major difference between these is that in the former case, the code
generator's block merging pass is able to merge away all of these common
blocks, but when the return is shared, this optimization doesn't happen.  Since
this sort of duplication arises in practice from inlining (e.g. where F calls G
multiple times), we should be able to eliminate the redundancy at the LLVM
level, for example in simplifycfg.
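
As a rough illustration of the kind of merging being asked for, here is a
minimal sketch on a toy model.  It is not LLVM code and does not use the LLVM
API: ToyBlock and mergeIdenticalBlocks are hypothetical names, instructions are
plain strings assumed to be already canonicalized (real IR would need value
renaming, since names like %tmp394.i.i differ per inlined copy), and each block
is assumed to end in a single unconditional branch.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a basic block.  This is NOT llvm::BasicBlock; it is a
// hypothetical type used only for illustration.
struct ToyBlock {
    std::string name;                // block label
    std::vector<std::string> body;   // non-terminator instructions, as text
    std::string successor;           // target of the unconditional branch
};

// Merge blocks whose body and successor are identical.  The first block of
// each identical group survives; later ones are dropped.  Returns a map from
// each dropped block's name to the name of the block that replaces it.
// This is a single pass: it does not iterate to a fixed point.
std::map<std::string, std::string>
mergeIdenticalBlocks(std::vector<ToyBlock>& blocks) {
    typedef std::pair<std::vector<std::string>, std::string> Key;
    std::map<Key, std::string> seen;             // contents -> surviving name
    std::map<std::string, std::string> replaced; // dropped name -> survivor
    std::vector<ToyBlock> kept;

    for (std::size_t i = 0; i < blocks.size(); ++i) {
        const ToyBlock& b = blocks[i];
        Key key = std::make_pair(b.body, b.successor);
        std::map<Key, std::string>::iterator it = seen.find(key);
        if (it == seen.end()) {
            seen.insert(std::make_pair(key, b.name));
            kept.push_back(b);
        } else {
            replaced[b.name] = it->second;
        }
    }

    // Redirect branches that pointed at a dropped block to its survivor.
    for (std::size_t i = 0; i < kept.size(); ++i) {
        std::map<std::string, std::string>::iterator it =
            replaced.find(kept[i].successor);
        if (it != replaced.end())
            kept[i].successor = it->second;
    }

    blocks.swap(kept);
    return replaced;
}
```

In this toy model, two blocks that both contain the load/add/store/puts
sequence and both branch to UnifiedReturnBlock would collapse into one, which
is the effect the code generator's block merging pass achieves today only when
the return has been cloned into each block.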

