[llvm-bugs] [Bug 24850] New: LLVM built 445.gobmk is 17% slower than gcc on power8

Wed Sep 16 14:42:03 PDT 2015

https://llvm.org/bugs/show_bug.cgi?id=24850

            Bug ID: 24850
           Summary: LLVM built 445.gobmk is 17% slower than gcc on power8
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: PowerPC
          Assignee: unassignedbugs at nondot.org
          Reporter: carrot at google.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

The LLVM-built 445.gobmk binary is 17% slower than the gcc-built binary on power8.

gcc   438s
llvm  512s

For input data trevord.tst, llvm is 18% slower.

The problem is in function popgo. It consumes 4.11% of the run time in the
gcc-built binary, but 13.98% in the llvm-built binary.

The related code snippet is in engine/board.c:

struct change_stack_entry {
  int *address;
  int value;
};
static struct change_stack_entry *change_stack_pointer;

#define POP_MOVE()\
  while ((--change_stack_pointer)->address)\
  *(change_stack_pointer->address) =\
  change_stack_pointer->value
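
To make the loop shape concrete, here is a minimal sketch of how POP_MOVE()
walks the change stack. The array change_stack, its size STACK_SIZE, and the
helpers begin_move() and push_value() are hypothetical names introduced only
for this illustration; they are not taken from engine/board.c. The sketch
assumes that an entry with a null address marks the start of a move, which is
what the loop condition in the macro implies.

```c
#include <assert.h>
#include <stddef.h>

struct change_stack_entry {
  int *address;
  int value;
};

#define STACK_SIZE 16 /* hypothetical capacity, for this sketch only */
static struct change_stack_entry change_stack[STACK_SIZE];
static struct change_stack_entry *change_stack_pointer = change_stack;

/* Hypothetical helper: push a null-address entry that marks a move start. */
static void begin_move(void) {
  change_stack_pointer->address = NULL;
  change_stack_pointer++;
}

/* Hypothetical helper: record the current value of *p before it changes. */
static void push_value(int *p) {
  change_stack_pointer->address = p;
  change_stack_pointer->value = *p;
  change_stack_pointer++;
}

/* Same macro as in the report: restore values until the marker is hit. */
#define POP_MOVE()\
  while ((--change_stack_pointer)->address)\
  *(change_stack_pointer->address) =\
  change_stack_pointer->value

/* Modifies two cells, undoes the changes, and checks the result. */
int undo_demo(void) {
  int a = 1, b = 2;

  begin_move();
  push_value(&a);
  a = 10;
  push_value(&b);
  b = 20;

  POP_MOVE();

  assert(a == 1 && b == 2);                     /* old values restored */
  assert(change_stack_pointer == change_stack); /* marker entry popped */
  return 0;
}
```

Note that every iteration decrements change_stack_pointer, so a compiler that
keeps the variable in memory must load and store it once per iteration.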

The LLVM-generated code sequence is:

   68.05 :        1000a9f0:   ld      r3,-22832(r29)           // A
    0.66 :        1000a9f4:   addi    r4,r3,-16
    0.17 :        1000a9f8:   std     r4,-22832(r29)            // B
    0.02 :        1000a9fc:   ori     r2,r2,0
   14.30 :        1000aa00:   ld      r4,-16(r3)
    0.00 :        1000aa04:   cmpldi  r4,0
    0.00 :        1000aa08:   beq     1000aa18 <popgo+0xa8>
    0.53 :        1000aa0c:   lwz     r3,-8(r3)
    0.11 :        1000aa10:   stw     r3,0(r4)
    0.00 :        1000aa14:   b       1000a9f0 <popgo+0x80>

Instruction A reads variable change_stack_pointer, instruction B writes
change_stack_pointer.

The GCC-generated code sequence is:

   48.30 :        10010280:   lwz     r8,24(r9)
    0.00 :        10010284:   mr      r7,r9
    0.00 :        10010288:   addi    r9,r9,-16
    0.63 :        1001028c:   stw     r8,0(r10)
    0.00 :        10010290:   ld      r10,16(r9)
    0.00 :        10010294:   cmpdi   cr7,r10,0
    0.00 :        10010298:   bne     cr7,10010280 <popgo+0x90>
   15.54 :        1001029c:   nop

Note that gcc keeps variable change_stack_pointer in register r9: it reads the
variable at the start of the function and writes it back after the loop. This
optimization is safe because change_stack_pointer is a static variable whose
address is never assigned to another variable, so it cannot be aliased by any
other pointer.
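
The transformation can be sketched at the source level as follows. This is
only an illustration of the two loop shapes, not output from either compiler:
pop_move_reload(), pop_move_hoisted(), and hoisting_demo() are hypothetical
names, and whether a given compiler turns the first form into the second
depends on its alias analysis.

```c
#include <assert.h>
#include <stddef.h>

struct change_stack_entry {
  int *address;
  int value;
};

static struct change_stack_entry *change_stack_pointer;

/* Shape of the llvm code: the static pointer is read from memory and
 * written back to memory on every iteration. */
static void pop_move_reload(void) {
  while ((--change_stack_pointer)->address)
    *(change_stack_pointer->address) = change_stack_pointer->value;
}

/* Shape of the gcc code: one read before the loop, a local copy (a
 * register) inside the loop, and one write after the loop. */
static void pop_move_hoisted(void) {
  struct change_stack_entry *sp = change_stack_pointer; /* single read */
  while ((--sp)->address)
    *(sp->address) = sp->value;
  change_stack_pointer = sp;                            /* single write */
}

/* Runs both forms on the same data and checks that they agree. */
int hoisting_demo(void) {
  int a = 10, b = 20;
  /* Entry 0 is the null-address marker; entries 1 and 2 hold old values. */
  struct change_stack_entry stack[3] = { {NULL, 0}, {&a, 1}, {&b, 2} };

  change_stack_pointer = stack + 3;
  pop_move_reload();
  assert(a == 1 && b == 2 && change_stack_pointer == stack);

  a = 10;
  b = 20;
  change_stack_pointer = stack + 3;
  pop_move_hoisted();
  assert(a == 1 && b == 2 && change_stack_pointer == stack);
  return 0;
}
```

Both forms leave memory in the same state as long as no store through
address can modify change_stack_pointer itself, which is the aliasing
argument made above.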

Even when I explicitly add -fstrict-aliasing to the llvm command line, llvm
moves only the read of change_stack_pointer out of the loop; the write to
change_stack_pointer remains inside the loop.

Command line options are:

-DSPEC_CPU -DNDEBUG -DHAVE_CONFIG_H -I. -I.. -I../include -I./include
-fno-strict-aliasing -O2 -m64 -mvsx -mcpu=power8    -DSPEC_CPU_LP64
