[llvm-commits] [llvm] r169791 - in /llvm/trunk: include/llvm/Target/ lib/CodeGen/SelectionDAG/ lib/Target/ARM/ lib/Target/Mips/ lib/Target/X86/ test/CodeGen/ARM/ test/CodeGen/X86/

Eli Friedman eli.friedman at gmail.com
Tue Dec 11 13:48:04 PST 2012


On Mon, Dec 10, 2012 at 5:35 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>
> On Dec 10, 2012, at 3:35 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>
>> On Mon, Dec 10, 2012 at 3:21 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>>> Author: evancheng
>>> Date: Mon Dec 10 17:21:26 2012
>>> New Revision: 169791
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=169791&view=rev
>>> Log:
>>> Some enhancements for memcpy / memset inline expansion.
>>> 1. Teach it to use overlapping unaligned load / store to copy / set the trailing
>>>   bytes. e.g. On x86, use two pairs of movups / movaps for 17-31 byte copies.
>>> 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
>>>   x86 and ARM.
>>
>> This won't work correctly on x86 if we don't have SSE2.  (Loading an
>> f64 into an x87 register is a lossy operation.)
>
> That should not happen with this patch.

No?

$ clang -S -o - -x c -m32 -msse -mno-sse2 - -O2 -march=corei7-avx
#include <stdlib.h>
#include <string.h>
void f(void* a, void* b) {
  memcpy(a,b,24);
}
^D
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_f
	.align	4, 0x90
_f:                                     ## @f
## BB#0:                                ## %cond.end
	pushl	%ebp
	movl	%esp, %ebp
	movl	12(%ebp), %eax
	fldl	16(%eax)
	movl	8(%ebp), %ecx
	fstpl	16(%ecx)
	movups	(%eax), %xmm0
	movups	%xmm0, (%ecx)
	popl	%ebp
	ret


.subsections_via_symbols
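
The fldl/fstpl pair above is the f64 path from point 2 of the commit log,
and it is exactly where the lossiness bites: an x87 FLD of a
double-precision operand converts it to extended precision, and if the
source is a signaling NaN the (masked) invalid exception quiets it, so the
bytes fstpl stores can differ from the bytes fldl read.  Here is a
hypothetical standalone check -- not from the original report, the helper
names are mine -- assuming the 16-byte pieces still come out as byte-exact
movups:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Point 1 of the commit log restated in portable C: a 17-31 byte copy
 * done as two unaligned 16-byte chunks that may overlap in the middle.
 * This expansion is byte-exact. */
static void copy17to31(void *d, const void *s, size_t n) {
  memcpy(d, s, 16);
  memcpy((unsigned char *)d + n - 16,
         (const unsigned char *)s + n - 16, 16);
}

/* Reassemble the trailing 8 bytes (little-endian) one byte at a time,
 * so the readback itself cannot be lowered to an f64 load/store. */
static uint64_t tail_qword(const unsigned char *p) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | p[16 + i];
  return v;
}

int main(void) {
  uint64_t snan = UINT64_C(0x7FF0000000000001); /* SNaN bit pattern */
  unsigned char src[24] = {0}, a[24] = {0}, b[24] = {0};

  /* Plant the SNaN in the trailing 8 bytes, again byte by byte. */
  for (int i = 0; i < 8; ++i)
    src[16 + i] = (unsigned char)(snan >> (8 * i));

  memcpy(a, src, 24);      /* the inline expansion under test */
  copy17to31(b, src, 24);  /* the overlapping movups-style expansion */

  printf("memcpy:     %016llx\n", (unsigned long long)tail_qword(a));
  printf("overlapped: %016llx\n", (unsigned long long)tail_qword(b));
  return 0;
}

With a byte-exact expansion both lines should print 7ff0000000000001; if
the trailing eight bytes round-trip through fldl/fstpl as in the assembly
above, the first line would instead come out as the quieted
7ff8000000000001.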


-Eli


