[LLVMbugs] [Bug 10872] New: [loop-idiom] GVN fails to remove loads after loop-idiom recognition

Tue Sep 6 10:30:51 PDT 2011

http://llvm.org/bugs/show_bug.cgi?id=10872

           Summary: [loop-idiom] GVN fails to remove loads after
                    loop-idiom recognition
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: All
            Status: ASSIGNED
          Severity: normal
          Priority: P
         Component: Scalar Optimizations
        AssignedTo: resistor at mac.com
        ReportedBy: atrick at apple.com
                CC: llvmbugs at cs.uiuc.edu

Test case: SingleSource/Benchmarks/Stanford test.simple.Puzzle on A9 is 14%
slower with -unroll-scev.

Plain -O3 produces O3.ll (0.54s)

-unroll-scev produces O3-unroll-scev.ll (0.64s)

These runs were using r138990. -unroll-scev will soon be default, but
-disable-unroll-scev will be available.

llc -mcpu=cortex-a9 -relocation-model=pic -disable-fp-elim -disable-non-leaf-fp-elim O3.ll

-unroll-scev exposes more opportunities for memset_pattern, resulting in:

  call void @memset_pattern16(i8* bitcast (i32* getelementptr inbounds ([13 x [512 x i32]]* @p, i32 0, i32 6, i32 0) to i8*), i8* bitcast ([4 x i32]* @.memset_pattern3 to i8*), i32 12) nounwind

This is fine, but then GVN fails to remove the subsequent loads:

  %tmp2.i = load i32* getelementptr inbounds ([13 x i32]* @piecemax, i32 0, i32 0), align 4, !tbaa !3

for.end.i:                                        ; preds = %for.inc.i8, %if.then
  %tmp14.i = load i32* getelementptr inbounds ([13 x i32]* @class, i32 0, i32 0), align 4, !tbaa !3

Removing %tmp2.i exposes lots of constant folding, but it is the removal of
%tmp14.i that speeds up the benchmark.
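
For illustration only, a rough C sketch of the kind of source shape that could
lead to this IR. This is not the actual Puzzle source: the function name
setup_and_read, the name piece_class (standing in for @class), and the stored
values are all made up. It assumes the loads are redundant because earlier
stores to the same globals are still valid after the loop.

```c
/* Hypothetical reduction: array shapes mirror the globals in the IR above,
   everything else (names, values) is invented for illustration. */
int p[13][512];
int piecemax[13];
int piece_class[13];   /* stands in for @class */

int setup_and_read(void) {
    int i;

    /* Stores whose values the later loads could reuse. */
    piecemax[0] = 11;
    piece_class[0] = 0;

    /* Three i32 stores (12 bytes) into p[6]: the kind of loop that
       loop-idiom recognition can turn into a memset_pattern16 call. */
    for (i = 0; i < 3; i++)
        p[6][i] = 1;

    /* Loads after the loop. The loop writes only p, so these values
       are still the ones stored above and the loads are redundant. */
    return piecemax[0] + piece_class[0];
}
```

With the loop left as plain stores, the final loads can be forwarded from the
earlier stores. Once the loop becomes an opaque call, that forwarding depends
on the optimizer knowing the call writes only through its first argument.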

Also rdar://10065079
