[llvm-commits] [llvm] r51060 - /llvm/trunk/lib/Target/X86/README-SSE.txt

Evan Cheng evan.cheng at apple.com
Tue May 13 12:05:54 PDT 2008


On May 13, 2008, at 11:48 AM, Chris Lattner wrote:

> Author: lattner
> Date: Tue May 13 13:48:54 2008
> New Revision: 51060
>
> URL: http://llvm.org/viewvc/llvm-project?rev=51060&view=rev
> Log:
> add a note
>
> Modified:
>    llvm/trunk/lib/Target/X86/README-SSE.txt
>
> Modified: llvm/trunk/lib/Target/X86/README-SSE.txt
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/README-SSE.txt?rev=51060&r1=51059&r2=51060&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/README-SSE.txt (original)
> +++ llvm/trunk/lib/Target/X86/README-SSE.txt Tue May 13 13:48:54 2008
> @@ -764,4 +764,28 @@
>
> //===---------------------------------------------------------------------===//
>
> +Consider:
> +#include <emmintrin.h>
> +__m128 foo2 (float x) {
> + return _mm_set_ps (0, 0, x, 0);
> +}
> +
> +In x86-32 mode, we generate this spiffy code:
> +
> +_foo2:
> +	movss	4(%esp), %xmm0
> +	pshufd	$81, %xmm0, %xmm0
> +	ret
> +
> +In x86-64 mode, we generate this code, which could be better:
> +
> +_foo2:
> +	xorps	%xmm1, %xmm1
> +	movss	%xmm0, %xmm1
> +	pshufd	$81, %xmm1, %xmm0
> +	ret

True.
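
For anyone decoding the $81 immediate above, here is a rough scalar model in C of what the x86-32 sequence computes. It is only a sketch: pshufd_model and foo2_model are hypothetical helper names made up for this illustration, not anything in the backend, and the model assumes the movss load form zeroes lanes 1 through 3.

```c
#include <stdint.h>

/* Scalar model of PSHUFD: result lane i takes the source lane named by
   the 2-bit field imm[2*i+1 : 2*i]. Hypothetical helper, for illustration. */
static void pshufd_model(const float src[4], uint8_t imm, float dst[4]) {
    for (int i = 0; i < 4; ++i)
        dst[i] = src[(imm >> (2 * i)) & 3];
}

/* Model of the x86-32 code: the movss load puts x in lane 0 and zeroes
   lanes 1..3, then pshufd $81 moves x into lane 1. */
static void foo2_model(float x, float out[4]) {
    float loaded[4] = { x, 0.0f, 0.0f, 0.0f };
    pshufd_model(loaded, 81, out);   /* 81 == 0b01010001 */
}
```

With 81 == 0b01010001 the lane selectors are 1, 0, 1, 1 (lane 0 first), so only lane 1 receives x and every other lane copies a zero. That is the layout _mm_set_ps (0, 0, x, 0) asks for. In the x86-64 case x arrives in %xmm0 with no guarantee about the upper lanes, which is why the extra xorps/movss pair shows up there.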

>
> +
> +In sse4 mode, we could use insertps to make both better.

This is not clear. Is insertps reasonably fast? (I suppose the load-folding
variant is better than two instructions.) Right now the x86 backend is
very reluctant to use any extraction or insertion instructions.
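
To make the insertps idea concrete, here is a rough scalar model of the instruction's immediate encoding in C. This is a sketch under assumptions: insertps_model is a hypothetical name, and the immediate 0x1D is my own derivation from the encoding (source lane 0, destination lane 1, zero lanes 0, 2 and 3), not something taken from the note. It says nothing about how fast the instruction is.

```c
#include <stdint.h>

/* Scalar model of INSERTPS with a register source (hypothetical helper):
   imm[7:6] picks the source lane, imm[5:4] picks the destination lane,
   imm[3:0] is a mask of result lanes that are forced to zero. */
static void insertps_model(const float dst_in[4], const float src[4],
                           uint8_t imm, float out[4]) {
    float picked = src[(imm >> 6) & 3];   /* read first, so out may alias src */
    for (int i = 0; i < 4; ++i)
        out[i] = dst_in[i];
    out[(imm >> 4) & 3] = picked;
    for (int i = 0; i < 4; ++i)
        if (imm & (1u << i))
            out[i] = 0.0f;                /* the zero mask is applied last */
}
```

Under this model, applying 0x1D with the same register as source and destination turns { x, ?, ?, ? } into { 0, x, 0, 0 } in one step, whatever the upper lanes held. That would cover the x86-64 case, and the memory-source form would cover the x86-32 load.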

Evan

>
> +
> +// 
> = 
> = 
> = 
> --------------------------------------------------------------------- 
> ===//
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits



