<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Performance degradation of memcmp(size=24) after commit 308322"
href="https://bugs.llvm.org/show_bug.cgi?id=33914">33914</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Performance degradation of memcmp(size=24) after commit 308322
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>zvi.rackover@intel.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>We observed a 13% degradation in an internal benchmark after commit 308322.
The minimal reproducer:
define i32 @foo(i8* %A, i8* %B) {
  %res = call i32 @memcmp(i8* %A, i8* %B, i64 24)
  ret i32 %res
}
declare i32 @memcmp(i8* nocapture, i8* nocapture, i64) local_unnamed_addr
Before the commit, the call to memcmp was lowered to a call to glibc's memcmp,
which was dispatched to __memcmp_sse4_1. The hot path in __memcmp_sse4_1 was
one 16-byte (XMM) load pair + pxor + ptest + jcc, followed by one 8-byte load
pair + cmp + jcc:
...
       │      movdqu -0x18(%rdi),%xmm2
       │      movdqu -0x18(%rsi),%xmm1
 11.11 │      pxor   %xmm1,%xmm2
  7.41 │      ptest  %xmm2,%xmm0
 14.81 │    ↓ jae    15e8
 14.81 │      mov    -0x8(%rsi),%rcx
       │      mov    -0x8(%rdi),%rax
       │      cmp    %rax,%rcx
       │    ↓ jne    1603
       │      xor    %eax,%eax
 11.11 │    ← retq
...
After the commit, the memcmp is expanded inline into three sequences of
8-byte load pair + cmp + jcc:
...
# BB#0:                                 # %loadbb
        movbeq  (%rdi), %rcx
        movbeq  (%rsi), %rdx
        cmpq    %rdx, %rcx
        jne     .LBB0_1
# BB#2:                                 # %loadbb1
        movbeq  8(%rdi), %rcx
        movbeq  8(%rsi), %rdx
        cmpq    %rdx, %rcx
        jne     .LBB0_1
# BB#3:                                 # %loadbb2
        movbeq  16(%rdi), %rcx
        movbeq  16(%rsi), %rdx
        xorl    %eax, %eax
        cmpq    %rdx, %rcx
        jne     .LBB0_1
# BB#4:                                 # %endblock
        retq
.LBB0_1:                                # %res_block
        cmpq    %rdx, %rcx
        movl    $-1, %ecx
        movl    $1, %eax
        cmovbl  %ecx, %eax
        retq
...
Options for fixing:
1. Improve the inline expansion to generate a sequence similar to glibc's:
one 16-byte load pair + pxor + ptest + jcc, then one 8-byte load pair + cmp + jcc.
2. Call libc's memcmp, as was done before the commit.
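
To make option 1 concrete, here is a rough C model of the compare structure
it describes: one 16-byte equality test followed by one 8-byte compare. This
is only a sketch, not glibc's or LLVM's actual code. The names u64, load_be64,
cmp_u64 and memcmp24_sketch are hypothetical, and the two 8-byte halves
combined with xor/or merely stand in for a single XMM pxor + ptest.

```c
/* Hypothetical sketch: models the shape of the proposed expansion only. */
typedef unsigned long long u64;

/* Read 8 bytes as a big-endian integer (the effect of movbeq), so that an
   unsigned integer compare orders the same way memcmp does. */
static u64 load_be64(const unsigned char *p) {
    u64 v = 0;
    for (int i = 0; i != 8; ++i)
        v = v * 256 + p[i];
    return v;
}

/* Three-way compare of two unsigned values: returns -1, 0 or 1. */
static int cmp_u64(u64 a, u64 b) {
    return (a > b) - (b > a);
}

/* Compare exactly 24 bytes: one 16-byte equality test, then one 8-byte
   compare. The sign of the result matches memcmp; the magnitude is 1. */
int memcmp24_sketch(const void *pa, const void *pb) {
    const unsigned char *a = pa, *b = pb;
    u64 a0 = load_be64(a), a1 = load_be64(a + 8);
    u64 b0 = load_be64(b), b1 = load_be64(b + 8);

    /* Stand-in for pxor + ptest + jcc: is any bit different in bytes 0..15? */
    if (((a0 ^ b0) | (a1 ^ b1)) != 0)
        return a0 != b0 ? cmp_u64(a0, b0) : cmp_u64(a1, b1);

    /* Bytes 0..15 are equal, so the last 8 bytes decide the result. */
    return cmp_u64(load_be64(a + 16), load_be64(b + 16));
}
```

For equal buffers this takes two branches instead of the three in the current
inline expansion; the ordering of the mismatching half is resolved off the
hot path.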
I would like to request this commit be reverted until we get this issue fixed.
Thanks.</pre>
</div>
</p>
</body>
</html>