[llvm-commits] [llvm] r169791 - in /llvm/trunk: include/llvm/Target/ lib/CodeGen/SelectionDAG/ lib/Target/ARM/ lib/Target/Mips/ lib/Target/X86/ test/CodeGen/ARM/ test/CodeGen/X86/
Evan Cheng
evan.cheng at apple.com
Tue Dec 11 16:43:16 PST 2012
r169944
Evan
On Dec 11, 2012, at 1:48 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Mon, Dec 10, 2012 at 5:35 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>>
>> On Dec 10, 2012, at 3:35 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>
>>> On Mon, Dec 10, 2012 at 3:21 PM, Evan Cheng <evan.cheng at apple.com> wrote:
>>>> Author: evancheng
>>>> Date: Mon Dec 10 17:21:26 2012
>>>> New Revision: 169791
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=169791&view=rev
>>>> Log:
>>>> Some enhancements for memcpy / memset inline expansion.
>>>> 1. Teach it to use overlapping unaligned load / store to copy / set the trailing
>>>> bytes. e.g. On x86, use two pairs of movups / movaps for 17 - 31 byte copies.
>>>> 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
>>>> x86 and ARM.
>>>
>>> This won't work correctly on x86 if we don't have SSE2. (Loading an
>>> f64 into an x87 register is a lossy operation.)
>>
>> That should not happen with this patch.
>
> No?
>
> $ clang -S -o - -x c -m32 -msse -mno-sse2 - -O2 -march=corei7-avx
> #include <stdlib.h>
> #include <string.h>
> void f(void* a, void* b) {
>   memcpy(a,b,24);
> }
> ^D
> 	.section	__TEXT,__text,regular,pure_instructions
> 	.globl	_f
> 	.align	4, 0x90
> _f:                                    ## @f
> ## BB#0:                               ## %cond.end
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	movl	12(%ebp), %eax
> 	fldl	16(%eax)
> 	movl	8(%ebp), %ecx
> 	fstpl	16(%ecx)
> 	movups	(%eax), %xmm0
> 	movups	%xmm0, (%ecx)
> 	popl	%ebp
> 	ret
>
>
> 	.subsections_via_symbols
>
>
> -Eli