[LLVMbugs] [Bug 13105] New: suboptimal loop block placement in TSCP

bugzilla-daemon at llvm.org
Wed Jun 13 09:24:44 PDT 2012


http://llvm.org/bugs/show_bug.cgi?id=13105

             Bug #: 13105
           Summary: suboptimal loop block placement in TSCP
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Common Code Generator Code
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: benny.kra at gmail.com
                CC: chandlerc at gmail.com, llvmbugs at cs.uiuc.edu
    Classification: Unclassified


TSCP's eval.c has the following loop:

---8<---
    for (i = 0; i < 64; ++i) {
        if (color[i] == EMPTY)
            continue;
        if (piece[i] == PAWN) {
            pawn_mat[color[i]] += piece_value[PAWN];
            ...
        }
        else
            piece_mat[color[i]] += piece_value[piece[i]];
    }
--->8---

The "continue" path forms a tight loop with the loop exit block, because at
least half of a chess board's squares are always empty. The two blocks should
therefore be emitted next to each other, but we currently emit:

---8<---
    .align    4, 0x90
LBB0_1:                                 ## %for.body5
                                        ## =>This Inner Loop Header: Depth=1
    movslq    (%r10,%rsi,4), %rbx
    cmpq    $6, %rbx
    je    LBB0_9
## BB#2:                                ## %if.end
                                        ##   in Loop: Header=BB0_1 Depth=1
    movslq    (%r11,%rsi,4), %rcx
    testq    %rcx, %rcx
    je    LBB0_3
## BB#8:                                ## %if.else48
                                        ##   in Loop: Header=BB0_1 Depth=1
    movl    (%rdx,%rcx,4), %ecx
    addl    %ecx, (%r9,%rbx,4)
LBB0_9:                                 ## %for.inc59
                                        ##   in Loop: Header=BB0_1 Depth=1
    incq    %rsi
    cmpl    $64, %esi
    jne    LBB0_1
--->8---

Here LBB0_1 is the "if (color[i] == EMPTY)" check and LBB0_9 contains the loop
backedge. If LBB0_9 is rotated before LBB0_1, TSCP's benchmark mode speeds up
by ~5% on my Westmere MacBook Pro.

$ cd tscp181
$ clang -O3 *.c -o tscp
$ echo bench | ./tscp | grep Best
Best time: 773 ms
$ echo bench | ./tscp-blocks-reordered | grep Best
Best time: 722 ms
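
As a rough illustration of the rotation described above, the following C
sketch spells out both block orders with explicit gotos. The constants are
inferred from the assembly (EMPTY == 6, PAWN == 0), the piece_value table is
made up for the example, and all function names are hypothetical. A compiler
is free to lay these blocks out differently, so this only shows the intended
control flow; it is not a way to force the placement.

```c
#include <assert.h>

/* Assumed values, inferred from "cmpq $6" and "testq" in the assembly. */
enum { PAWN = 0, EMPTY = 6, SQUARES = 64 };

/* Hypothetical piece values, for illustration only. */
static const int piece_value[6] = { 100, 300, 300, 500, 900, 0 };

/* Block order as currently emitted: header first, latch last.
   An empty square takes two jumps: header -> latch, then latch -> header. */
static void material_current(const int *color, const int *piece,
                             int pawn_mat[2], int piece_mat[2])
{
    int i = 0;
header:                                   /* corresponds to LBB0_1 */
    if (color[i] == EMPTY)
        goto latch;
    if (piece[i] == PAWN)
        pawn_mat[color[i]] += piece_value[PAWN];  /* rest elided in the report */
    else
        piece_mat[color[i]] += piece_value[piece[i]];
latch:                                    /* corresponds to LBB0_9 */
    ++i;
    if (i != SQUARES)
        goto header;
}

/* Rotated order: latch placed directly before the header.
   An empty square now takes one jump: header -> latch, which then
   falls through into the header again. */
static void material_rotated(const int *color, const int *piece,
                             int pawn_mat[2], int piece_mat[2])
{
    int i = 0;
    goto header;                          /* one-time jump into the loop */
latch:
    ++i;
    if (i == SQUARES)
        return;
header:
    if (color[i] == EMPTY)
        goto latch;                       /* hot path: short backward jump */
    if (piece[i] == PAWN)
        pawn_mat[color[i]] += piece_value[PAWN];
    else
        piece_mat[color[i]] += piece_value[piece[i]];
    goto latch;
}

/* Checks that both layouts compute the same sums on a mostly empty board. */
static void check_layouts_agree(void)
{
    int color[SQUARES], piece[SQUARES];
    int pa[2] = { 0, 0 }, ma[2] = { 0, 0 };
    int pb[2] = { 0, 0 }, mb[2] = { 0, 0 };

    for (int i = 0; i < SQUARES; ++i) {
        color[i] = EMPTY;
        piece[i] = EMPTY;
    }
    color[0] = 0;  piece[0] = PAWN;       /* side 0 pawn */
    color[1] = 0;  piece[1] = 3;          /* side 0 piece worth 500 */
    color[63] = 1; piece[63] = 4;         /* side 1 piece worth 900 */

    material_current(color, piece, pa, ma);
    material_rotated(color, piece, pb, mb);

    assert(pa[0] == pb[0] && pa[1] == pb[1]);
    assert(ma[0] == mb[0] && ma[1] == mb[1]);
}
```

Calling check_layouts_agree() from a small main() confirms that the two
orderings are equivalent; only the number of taken jumps on the empty-square
path differs.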
