[LLVMbugs] [Bug 5615] New: poor x86-64 performance depending on code alignment

Wed Nov 25 13:47:09 PST 2009

http://llvm.org/bugs/show_bug.cgi?id=5615

           Summary: poor x86-64 performance depending on code alignment
           Product: libraries
           Version: trunk
          Platform: Macintosh
        OS/Version: MacOS X
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Backend: X86
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: bob.wilson at apple.com
                CC: llvmbugs at cs.uiuc.edu


See pr3120 for background and testcase.  The "switched interpreter" runtime
degraded from 240 to 584 when I changed llvm to tail duplicate indirect
branches.  The change affected only code that does not run for the "switched
interpreter", and the modified code is located after the switched interpreter
code.
The only effect of the change (aside from the "threaded interpreter" code) was
that the linker adjusted the starting offsets of various functions.  The
"interpret_switch" function is aligned to a 16-byte boundary but that is
apparently not good enough to get consistently good performance.

When the start of that function was at 0x100001830, the performance was very
good (240).  At 0x100001810, it was bad (584).  At 0x100001820, it was even
worse (617).  That last case occurred when I manually deleted some .align
directives from the threaded interpreter assembly.  When I edited the assembly
to increase the alignment of interpret_switch to 32 bytes, that function was
placed at 0x100001800 and performance improved (291).
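
For anyone who would rather not hand-edit the assembly, a similar experiment
can probably be done from C.  The sketch below is not from the pr3120
testcase: interpret_switch_stub and stub_offset are hypothetical names, and it
assumes a GCC-compatible compiler (such as llvm-gcc or clang) that accepts the
"aligned" function attribute.  It requests 32-byte alignment for a function
and reports where the function start lands within an aligned block.

```c
#include <assert.h>
#include <stdint.h>

/* Requested alignment in bytes; 32 matches the manual edit described above. */
#define STUB_ALIGN 32

/* Hypothetical stand-in for interpret_switch.  The attribute asks the
 * compiler to place the function start on a STUB_ALIGN-byte boundary. */
__attribute__((aligned(STUB_ALIGN)))
int interpret_switch_stub(int x)
{
    return x + 1;
}

/* Return the offset of the stub's start address within an aligned block of
 * block_size bytes.  block_size must be a nonzero power of two.  Converting
 * a function pointer to uintptr_t is implementation-defined, but it is
 * supported by GCC-compatible compilers on x86-64. */
unsigned stub_offset(uintptr_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return (unsigned)((uintptr_t)&interpret_switch_stub & (block_size - 1));
}
```

Calling stub_offset(32) should return 0 when the attribute is honored.
Changing STUB_ALIGN to 16 or 64 and re-timing would be one way to vary the
placement without touching the generated assembly.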

I'm not sure what is causing these performance variations.  The branch
predictor in some x86 processors fetches aligned 32-byte blocks, so it may
be related to that.  If it were that simple, though, I don't know how to
explain the huge differences in performance when the function was at 1800 vs.
1820 or at 1810 vs. 1830, since the two addresses in each pair have the same
offset within a 32-byte block.
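
The arithmetic behind that observation can be checked with a short C sketch.
The addresses and runtimes are the ones reported above (the units of the
runtimes are not stated); the names offset_in_block, samples, and
print_offsets are made up for illustration, and the 64-byte column is only an
extra data point, not a claim about the cause.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Offset of addr within an aligned block of block_size bytes.
 * block_size must be a nonzero power of two. */
unsigned offset_in_block(uint64_t addr, uint64_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return (unsigned)(addr & (block_size - 1));
}

/* Start address of interpret_switch and the runtime reported for it. */
struct sample {
    uint64_t addr;
    int runtime;
};

static const struct sample samples[] = {
    { 0x100001830ULL, 240 },
    { 0x100001810ULL, 584 },
    { 0x100001820ULL, 617 },
    { 0x100001800ULL, 291 },
};

/* Print each start address with its offset modulo 32 and modulo 64. */
void print_offsets(void)
{
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        printf("0x%llx  runtime %3d  mod 32 = %2u  mod 64 = %2u\n",
               (unsigned long long)samples[i].addr,
               samples[i].runtime,
               offset_in_block(samples[i].addr, 32),
               offset_in_block(samples[i].addr, 64));
    }
}
```

Calling print_offsets() from main shows that 1830 (good) and 1810 (bad) both
sit 16 bytes into a 32-byte block, while 1820 (worst) and 1800 (fairly good)
are both 32-byte aligned, so the offset within a 32-byte block does not
separate the good cases from the bad ones.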

To reproduce, run "make CC=llvm-gcc-4.2" with the testcase from pr3120 and then
run the test with "intrp data/*".

I used the following patch to make the test run only the "switched"
interpreter:

--- interpret.c.orig    2009-11-25 13:41:47.000000000 -0800
+++ interpret.c 2009-11-25 13:41:20.000000000 -0800
@@ -145,6 +145,7 @@
 Interpreter interpreters[] =
   {
     { &interpret_switch, "Switched interpreter" },
+    { NULL, NULL },
     { &interpret_threaded, "Threaded interpreter" },
     { &interpret_recursive, "Recursive interpreter" },
 #ifdef __BLOCKS__

The current llvm trunk has the "bad" behavior.  If you manually edit the
assembly for interpret.c and remove the .align directives from inside the
interpret_threaded function, the offset of interpret_switch should change.

