[llvm-commits] [llvm] r51060 - /llvm/trunk/lib/Target/X86/README-SSE.txt
Evan Cheng
evan.cheng at apple.com
Tue May 13 12:05:54 PDT 2008
On May 13, 2008, at 11:48 AM, Chris Lattner wrote:
> Author: lattner
> Date: Tue May 13 13:48:54 2008
> New Revision: 51060
>
> URL: http://llvm.org/viewvc/llvm-project?rev=51060&view=rev
> Log:
> add a note
>
> Modified:
> llvm/trunk/lib/Target/X86/README-SSE.txt
>
> Modified: llvm/trunk/lib/Target/X86/README-SSE.txt
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/README-SSE.txt?rev=51060&r1=51059&r2=51060&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/README-SSE.txt (original)
> +++ llvm/trunk/lib/Target/X86/README-SSE.txt Tue May 13 13:48:54 2008
> @@ -764,4 +764,28 @@
>
> //===---------------------------------------------------------------------===//
>
> +Consider:
> +#include <emmintrin.h>
> +__m128 foo2 (float x) {
> + return _mm_set_ps (0, 0, x, 0);
> +}
> +
> +In x86-32 mode, we generate this spiffy code:
> +
> +_foo2:
> + movss 4(%esp), %xmm0
> + pshufd $81, %xmm0, %xmm0
> + ret
> +
> +In x86-64 mode, we generate this code, which could be better:
> +
> +_foo2:
> + xorps %xmm1, %xmm1
> + movss %xmm0, %xmm1
> + pshufd $81, %xmm1, %xmm0
> + ret
True.
>
> +
> +In sse4 mode, we could use insertps to make both better.
This is not clear. Is insertps fastish? (I suppose the load-folding
variant is better than 2 instructions.) Right now the x86 backend is
very reluctant to use any extraction or insertion instructions.
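
For reference, here is a minimal scalar sketch in C of what the two
immediates do to the four lanes. It says nothing about speed; it only
checks that a single insertps could produce the same result as the
movss + pshufd $81 sequence. The type vec4, the model_* helpers and the
insertps immediate 0x1D are assumptions made for illustration; none of
them come from the README note or from the backend.

```c
#include <stdint.h>

/* Four 32-bit lanes; lane 0 is the lowest element of an XMM register. */
typedef struct { float lane[4]; } vec4;

/* Scalar model of "pshufd $imm, src, dst":
   result lane i is copied from source lane (imm >> 2*i) & 3. */
static vec4 model_pshufd(vec4 src, uint8_t imm) {
    vec4 dst;
    for (int i = 0; i < 4; ++i)
        dst.lane[i] = src.lane[(imm >> (2 * i)) & 3];
    return dst;
}

/* Scalar model of the register form of "insertps $imm, src, dst":
   bits 7:6 select the source lane, bits 5:4 select the destination lane,
   and each set bit in 3:0 zeroes the corresponding result lane. */
static vec4 model_insertps(vec4 dst, vec4 src, uint8_t imm) {
    int count_s = (imm >> 6) & 3;
    int count_d = (imm >> 4) & 3;
    dst.lane[count_d] = src.lane[count_s];
    for (int i = 0; i < 4; ++i)
        if (imm & (1u << i))
            dst.lane[i] = 0.0f;
    return dst;
}

/* The x86-32 sequence from the note: the movss load puts x in lane 0 and
   zeroes lanes 1-3, then pshufd $81 (0x51) moves x into lane 1. */
static vec4 foo2_pshufd(float x) {
    vec4 v = {{x, 0.0f, 0.0f, 0.0f}};
    return model_pshufd(v, 81);
}

/* Hypothetical single-instruction alternative: insertps $0x1D, xmm0, xmm0.
   Source lane 0 goes to destination lane 1; the zero mask 0xD clears lanes
   0, 2 and 3, so the incoming upper lanes (arbitrary here) do not matter. */
static vec4 foo2_insertps(float x) {
    vec4 v = {{x, 7.0f, 8.0f, 9.0f}};
    return model_insertps(v, v, 0x1D);
}
```

Both helpers return the lanes {0, x, 0, 0}, which is what
_mm_set_ps (0, 0, x, 0) asks for. In the memory form of insertps the
source-lane bits are ignored, so the same zero mask would apply to a
load-folded variant as well.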
Evan
>
> +
> +//===---------------------------------------------------------------------===//
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits