[llvm-bugs] [Bug 24570] New: Minor alignment change may cause big performance variation in MultiSource/Benchmarks/Ptrdist/ks

via llvm-bugs llvm-bugs at lists.llvm.org
Tue Aug 25 09:22:51 PDT 2015


https://llvm.org/bugs/show_bug.cgi?id=24570

            Bug ID: 24570
           Summary: Minor alignment change may cause big performance
                    variation in MultiSource/Benchmarks/Ptrdist/ks
           Product: Test Suite
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Programs Tests
          Assignee: unassignedbugs at nondot.org
          Reporter: wmi at google.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

When I applied the change in http://reviews.llvm.org/D12107 on top of r243652,
I saw a 15% perf regression in MultiSource/Benchmarks/Ptrdist/ks.

Analysis using perf showed that the regression was caused by a minor alignment
change. The alignment change prevented a compare-and-jump pair inside the
kernel loop of the test from being macro-fused, which significantly increased
the number of uops-retired events.

With the patch
  4012b0:       48 39 41 08             cmp    %rax,0x8(%rcx)
  4012b4:       75 04                   jne    4012ba <FindMaxGpAndSwap+0xda>
  4012b6:       f3 0f 58 e5             addss  %xmm5,%xmm4
  4012ba:       48 8b 09                mov    (%rcx),%rcx
  4012bd:       48 85 c9                test   %rcx,%rcx
  4012c0:       75 ee                   jne    4012b0 <FindMaxGpAndSwap+0xd0>

Without the patch
  4012a0:       48 39 41 08             cmp    %rax,0x8(%rcx)
  4012a4:       75 04                   jne    4012aa <FindMaxGpAndSwap+0xda>
  4012a6:       f3 0f 58 e5             addss  %xmm5,%xmm4
  4012aa:       48 8b 09                mov    (%rcx),%rcx
  4012ad:       48 85 c9                test   %rcx,%rcx
  4012b0:       75 ee                   jne    4012a0 <FindMaxGpAndSwap+0xd0> 

With the patch, the test and jne pair at the end of the kernel loop straddles a
32-byte boundary (the test ends at 4012bf and the jne starts at 4012c0), so the
two instructions cannot be macro-fused.
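
The boundary check can be reproduced from the addresses and encoding lengths in
the two listings. The following is only a minimal sketch: the helper name
pair_crosses_boundary is hypothetical, and the 32-byte boundary is taken from
this report rather than from any vendor documentation.

```python
# Hypothetical helper, not part of LLVM or perf. The 32-byte boundary size
# is the one named in this report and is an assumption here.
BOUNDARY = 32


def pair_crosses_boundary(first_addr, first_len, branch_len, boundary=BOUNDARY):
    """Return True if a compare/test + branch pair spans a boundary.

    The first instruction starts at first_addr and is first_len bytes long;
    the branch follows it immediately and is branch_len bytes long.
    """
    first_byte = first_addr
    last_byte = first_addr + first_len + branch_len - 1
    # The pair crosses a boundary when its first and last bytes fall into
    # different boundary-sized blocks.
    return first_byte // boundary != last_byte // boundary


# test/jne pair with the patch: test at 4012bd (3 bytes), jne at 4012c0 (2 bytes)
print(pair_crosses_boundary(0x4012BD, 3, 2))
# test/jne pair without the patch: test at 4012ad (3 bytes), jne at 4012b0 (2 bytes)
print(pair_crosses_boundary(0x4012AD, 3, 2))
```

For the listings above, the first call reports a crossing and the second does
not. The cmp/jne pair at the top of the loop stays within one 32-byte block in
both builds.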

This bug is filed both to track a perf-flaky test and to record a
microarchitecture-dependent perf tuning opportunity.
