[llvm] r219033 - [x86] Add a really preposterous number of patterns for matching all of
Adam Nemet
anemet at apple.com
Fri Oct 3 16:23:45 PDT 2014
On Oct 3, 2014, at 3:43 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
> Author: chandlerc
> Date: Fri Oct 3 17:43:17 2014
> New Revision: 219033
>
> URL: http://llvm.org/viewvc/llvm-project?rev=219033&view=rev
> Log:
> [x86] Add a really preposterous number of patterns for matching all of
> the various ways in which blends can be used to do vector element
> insertion for lowering with the scalar math instruction forms that
> effectively re-blend with the high elements after performing the
> operation.
>
> This then allows me to bail on the element insertion lowering path when
> we have SSE4.1 and are going to be doing a normal blend, which in turn
> restores the last of the blends lost from the new vector shuffle
> lowering when I got it to prioritize insertion in other cases (for
> example when we don't *have* a blend instruction).
>
> Without the patterns, using blends here would have regressed
> sse-scalar-fp-arith.ll *completely* with the new vector shuffle
> lowering. For completeness, I've added RUN-lines with the new lowering
> here. This is somewhat superfluous as I'm about to flip the default, but
> hey, it shows that this actually significantly changed behavior.
>
> The patterns I've added are just ridiculously repetitive. Suggestions on
> making them better very much welcome. In particular, handling the
> commuted form of the v2f64 patterns is somewhat obnoxious.
>
> Modified:
> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> llvm/trunk/lib/Target/X86/X86InstrSSE.td
> llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll
> llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v2.ll
> llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v4.ll
> llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=219033&r1=219032&r2=219033&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Oct 3 17:43:17 2014
> @@ -7830,6 +7830,11 @@ static SDValue lowerVectorShuffleAsEleme
> V1Mask[V2Index] = -1;
> if (!isNoopShuffleMask(V1Mask))
> return SDValue();
> + // This is essentially a special case blend operation, but if we have
> + // general purpose blend operations, they are always faster. Bail and let
> + // the rest of the lowering handle these as blends.
> + if (Subtarget->hasSSE41())
> + return SDValue();
>
> // Otherwise, use MOVSD or MOVSS.
> assert((EltVT == MVT::f32 || EltVT == MVT::f64) &&
>
> Modified: llvm/trunk/lib/Target/X86/X86InstrSSE.td
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrSSE.td?rev=219033&r1=219032&r2=219033&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86InstrSSE.td (original)
> +++ llvm/trunk/lib/Target/X86/X86InstrSSE.td Fri Oct 3 17:43:17 2014
> @@ -3125,7 +3125,6 @@ let Predicates = [UseSSE1] in {
>
> let Predicates = [UseSSE2] in {
> // SSE2 patterns to select scalar double-precision fp arithmetic instructions
> -
> def : Pat<(v2f64 (X86Movsd (v2f64 VR128:$dst), (v2f64 (scalar_to_vector (fadd
> (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> FR64:$src))))),
> @@ -3145,10 +3144,10 @@ let Predicates = [UseSSE2] in {
> }
>
> let Predicates = [UseSSE41] in {
> - // If the subtarget has SSE4.1 but not AVX, the vector insert
> - // instruction is lowered into a X86insertps rather than a X86Movss.
> - // When selecting SSE scalar single-precision fp arithmetic instructions,
> - // make sure that we correctly match the X86insertps.
> + // If the subtarget has SSE4.1 but not AVX, the vector insert instruction is
> + // lowered into a X86insertps or a X86Blendi rather than a X86Movss. When
> + // selecting SSE scalar single-precision fp arithmetic instructions, make
> + // sure that we correctly match them.
>
> def : Pat<(v4f32 (X86insertps (v4f32 VR128:$dst), (v4f32 (scalar_to_vector
> (fadd (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> @@ -3166,6 +3165,57 @@ let Predicates = [UseSSE41] in {
> (fdiv (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> FR32:$src))), (iPTR 0))),
> (DIVSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> +
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector (fadd
> + (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> + FR32:$src))), (i8 1))),
> + (ADDSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector (fsub
> + (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> + FR32:$src))), (i8 1))),
> + (SUBSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector (fmul
> + (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> + FR32:$src))), (i8 1))),
> + (MULSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector (fdiv
> + (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> + FR32:$src))), (i8 1))),
> + (DIVSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> +
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector (fadd
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (i8 1))),
> + (ADDSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector (fsub
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (i8 1))),
> + (SUBSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector (fmul
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (i8 1))),
> + (MULSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector (fdiv
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (i8 1))),
> + (DIVSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
It’s hard to see what actually differs across these patterns. Is it essentially:
  for vt in [v4f32, v2f64, other vector VTs]:
    for op in [fadd, fsub, fmul, other ops]:
      def : Pat<...>
?
Can’t we use a multiclass parameterized on the VT, with the ops as the individual defs inside the multiclass? !foreach may be another option.
Adam
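
For illustration, the multiclass idea could look roughly like the sketch below. It is not what r219033 committed and has not been built against that revision. The multiclass name `scalar_math_blend_patterns` and its parameters `Op` and `OpcPrefix` are hypothetical. It parameterizes on the operation rather than the VT, since the instruction name is the part that changes with the op. The fragment assumes it lives in X86InstrSSE.td, where `X86Blendi`, `VR128`, `FR32`, `FR64` and the `*rr_Int` instructions from the quoted diff are already defined.

```tablegen
// Hypothetical multiclass: Op is the scalar fp node (fadd, fsub, ...),
// OpcPrefix is the instruction-name stem ("ADD", "VADD", ...).
multiclass scalar_math_blend_patterns<SDNode Op, string OpcPrefix> {
  // v4f32: blend immediate 1 takes element 0 from the scalar result and
  // the remaining elements from $dst.
  def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector
              (Op (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
                  FR32:$src))), (i8 1))),
            (!cast<Instruction>(OpcPrefix#"SSrr_Int") v4f32:$dst,
              (COPY_TO_REGCLASS FR32:$src, VR128))>;

  // v2f64, same operand order as above.
  def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector
              (Op (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
                  FR64:$src))), (i8 1))),
            (!cast<Instruction>(OpcPrefix#"SDrr_Int") v2f64:$dst,
              (COPY_TO_REGCLASS FR64:$src, VR128))>;

  // v2f64, commuted operands: immediate 2 takes element 1 from $dst.
  def : Pat<(v2f64 (X86Blendi (v2f64 (scalar_to_vector
              (Op (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
                  FR64:$src))), (v2f64 VR128:$dst), (i8 2))),
            (!cast<Instruction>(OpcPrefix#"SDrr_Int") v2f64:$dst,
              (COPY_TO_REGCLASS FR64:$src, VR128))>;
}

let Predicates = [UseSSE41] in {
  defm : scalar_math_blend_patterns<fadd, "ADD">;
  defm : scalar_math_blend_patterns<fsub, "SUB">;
  defm : scalar_math_blend_patterns<fmul, "MUL">;
  defm : scalar_math_blend_patterns<fdiv, "DIV">;
}

let Predicates = [HasAVX] in {
  defm : scalar_math_blend_patterns<fadd, "VADD">;
  defm : scalar_math_blend_patterns<fsub, "VSUB">;
  defm : scalar_math_blend_patterns<fmul, "VMUL">;
  defm : scalar_math_blend_patterns<fdiv, "VDIV">;
}
```

Written this way, the three pattern shapes (v4f32, v2f64, commuted v2f64) each appear once, and the only things that vary per instantiation are the node and the opcode stem. The packed-operand patterns later in the diff could presumably be folded into a second multiclass of the same shape.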
> +
> + def : Pat<(v2f64 (X86Blendi (v2f64 (scalar_to_vector (fadd
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (v2f64 VR128:$dst), (i8 2))),
> + (ADDSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 (scalar_to_vector (fsub
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (v2f64 VR128:$dst), (i8 2))),
> + (SUBSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 (scalar_to_vector (fmul
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (v2f64 VR128:$dst), (i8 2))),
> + (MULSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 (scalar_to_vector (fdiv
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (v2f64 VR128:$dst), (i8 2))),
> + (DIVSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> }
>
> let Predicates = [HasAVX] in {
> @@ -3204,6 +3254,57 @@ let Predicates = [HasAVX] in {
> (fdiv (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> FR32:$src))), (iPTR 0))),
> (VDIVSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> +
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector (fadd
> + (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> + FR32:$src))), (i8 1))),
> + (VADDSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector (fsub
> + (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> + FR32:$src))), (i8 1))),
> + (VSUBSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector (fmul
> + (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> + FR32:$src))), (i8 1))),
> + (VMULSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst), (v4f32 (scalar_to_vector (fdiv
> + (f32 (vector_extract (v4f32 VR128:$dst), (iPTR 0))),
> + FR32:$src))), (i8 1))),
> + (VDIVSSrr_Int v4f32:$dst, (COPY_TO_REGCLASS FR32:$src, VR128))>;
> +
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector (fadd
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (i8 1))),
> + (VADDSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector (fsub
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (i8 1))),
> + (VSUBSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector (fmul
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (i8 1))),
> + (VMULSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst), (v2f64 (scalar_to_vector (fdiv
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (i8 1))),
> + (VDIVSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> +
> + def : Pat<(v2f64 (X86Blendi (v2f64 (scalar_to_vector (fadd
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (v2f64 VR128:$dst), (i8 2))),
> + (VADDSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 (scalar_to_vector (fsub
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (v2f64 VR128:$dst), (i8 2))),
> + (VSUBSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 (scalar_to_vector (fmul
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (v2f64 VR128:$dst), (i8 2))),
> + (VMULSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 (scalar_to_vector (fdiv
> + (f64 (vector_extract (v2f64 VR128:$dst), (iPTR 0))),
> + FR64:$src))), (v2f64 VR128:$dst), (i8 2))),
> + (VDIVSDrr_Int v2f64:$dst, (COPY_TO_REGCLASS FR64:$src, VR128))>;
> }
>
> // Patterns used to select SSE scalar fp arithmetic instructions from
> @@ -3258,6 +3359,49 @@ let Predicates = [UseSSE2] in {
> (DIVSDrr_Int v2f64:$dst, v2f64:$src)>;
> }
>
> +let Predicates = [UseSSE41] in {
> + // With SSE4.1 we may see these operations using X86Blendi rather than
> + // X86Movs{s,d}.
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
> + (fadd (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
> + (ADDSSrr_Int v4f32:$dst, v4f32:$src)>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
> + (fsub (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
> + (SUBSSrr_Int v4f32:$dst, v4f32:$src)>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
> + (fmul (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
> + (MULSSrr_Int v4f32:$dst, v4f32:$src)>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
> + (fdiv (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
> + (DIVSSrr_Int v4f32:$dst, v4f32:$src)>;
> +
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
> + (fadd (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
> + (ADDSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
> + (fsub (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
> + (SUBSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
> + (fmul (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
> + (MULSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
> + (fdiv (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
> + (DIVSDrr_Int v2f64:$dst, v2f64:$src)>;
> +
> + def : Pat<(v2f64 (X86Blendi (fadd (v2f64 VR128:$dst), (v2f64 VR128:$src)),
> + (v2f64 VR128:$dst), (i8 2))),
> + (ADDSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (fsub (v2f64 VR128:$dst), (v2f64 VR128:$src)),
> + (v2f64 VR128:$dst), (i8 2))),
> + (SUBSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (fmul (v2f64 VR128:$dst), (v2f64 VR128:$src)),
> + (v2f64 VR128:$dst), (i8 2))),
> + (MULSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (fdiv (v2f64 VR128:$dst), (v2f64 VR128:$src)),
> + (v2f64 VR128:$dst), (i8 2))),
> + (DIVSDrr_Int v2f64:$dst, v2f64:$src)>;
> +}
> +
> let Predicates = [HasAVX] in {
> // The following patterns select AVX Scalar single/double precision fp
> // arithmetic instructions from a packed single precision fp instruction
> @@ -3287,6 +3431,46 @@ let Predicates = [HasAVX] in {
> def : Pat<(v2f64 (X86Movsd (v2f64 VR128:$dst),
> (fdiv (v2f64 VR128:$dst), (v2f64 VR128:$src)))),
> (VDIVSDrr_Int v2f64:$dst, v2f64:$src)>;
> +
> + // Also handle X86Blendi-based patterns.
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
> + (fadd (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
> + (VADDSSrr_Int v4f32:$dst, v4f32:$src)>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
> + (fsub (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
> + (VSUBSSrr_Int v4f32:$dst, v4f32:$src)>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
> + (fmul (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
> + (VMULSSrr_Int v4f32:$dst, v4f32:$src)>;
> + def : Pat<(v4f32 (X86Blendi (v4f32 VR128:$dst),
> + (fdiv (v4f32 VR128:$dst), (v4f32 VR128:$src)), (i8 1))),
> + (VDIVSSrr_Int v4f32:$dst, v4f32:$src)>;
> +
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
> + (fadd (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
> + (VADDSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
> + (fsub (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
> + (VSUBSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
> + (fmul (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
> + (VMULSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (v2f64 VR128:$dst),
> + (fdiv (v2f64 VR128:$dst), (v2f64 VR128:$src)), (i8 1))),
> + (VDIVSDrr_Int v2f64:$dst, v2f64:$src)>;
> +
> + def : Pat<(v2f64 (X86Blendi (fadd (v2f64 VR128:$dst), (v2f64 VR128:$src)),
> + (v2f64 VR128:$dst), (i8 2))),
> + (VADDSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (fsub (v2f64 VR128:$dst), (v2f64 VR128:$src)),
> + (v2f64 VR128:$dst), (i8 2))),
> + (VSUBSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (fmul (v2f64 VR128:$dst), (v2f64 VR128:$src)),
> + (v2f64 VR128:$dst), (i8 2))),
> + (VMULSDrr_Int v2f64:$dst, v2f64:$src)>;
> + def : Pat<(v2f64 (X86Blendi (fdiv (v2f64 VR128:$dst), (v2f64 VR128:$src)),
> + (v2f64 VR128:$dst), (i8 2))),
> + (VDIVSDrr_Int v2f64:$dst, v2f64:$src)>;
> }
>
> /// Unop Arithmetic
>
> Modified: llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll?rev=219033&r1=219032&r2=219033&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/sse-scalar-fp-arith.ll Fri Oct 3 17:43:17 2014
> @@ -1,6 +1,9 @@
> ; RUN: llc -mcpu=x86-64 -mattr=+sse2 < %s | FileCheck --check-prefix=SSE --check-prefix=SSE2 %s
> +; RUN: llc -mcpu=x86-64 -mattr=+sse2 < %s -x86-experimental-vector-shuffle-lowering | FileCheck --check-prefix=SSE --check-prefix=SSE2 %s
> ; RUN: llc -mcpu=x86-64 -mattr=+sse4.1 < %s | FileCheck --check-prefix=SSE --check-prefix=SSE41 %s
> +; RUN: llc -mcpu=x86-64 -mattr=+sse4.1 < %s -x86-experimental-vector-shuffle-lowering | FileCheck --check-prefix=SSE --check-prefix=SSE41 %s
> ; RUN: llc -mcpu=x86-64 -mattr=+avx < %s | FileCheck --check-prefix=AVX %s
> +; RUN: llc -mcpu=x86-64 -mattr=+avx < %s -x86-experimental-vector-shuffle-lowering | FileCheck --check-prefix=AVX %s
>
> target triple = "x86_64-unknown-unknown"
>
>
> Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v2.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v2.ll?rev=219033&r1=219032&r2=219033&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v2.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v2.ll Fri Oct 3 17:43:17 2014
> @@ -211,28 +211,61 @@ define <2 x double> @shuffle_v2f64_33(<2
> ret <2 x double> %shuffle
> }
> define <2 x double> @shuffle_v2f64_03(<2 x double> %a, <2 x double> %b) {
> -; SSE-LABEL: shuffle_v2f64_03:
> -; SSE: # BB#0:
> -; SSE-NEXT: movsd %xmm0, %xmm1
> -; SSE-NEXT: movaps %xmm1, %xmm0
> -; SSE-NEXT: retq
> +; SSE2-LABEL: shuffle_v2f64_03:
> +; SSE2: # BB#0:
> +; SSE2-NEXT: movsd %xmm0, %xmm1
> +; SSE2-NEXT: movaps %xmm1, %xmm0
> +; SSE2-NEXT: retq
> +;
> +; SSE3-LABEL: shuffle_v2f64_03:
> +; SSE3: # BB#0:
> +; SSE3-NEXT: movsd %xmm0, %xmm1
> +; SSE3-NEXT: movaps %xmm1, %xmm0
> +; SSE3-NEXT: retq
> +;
> +; SSSE3-LABEL: shuffle_v2f64_03:
> +; SSSE3: # BB#0:
> +; SSSE3-NEXT: movsd %xmm0, %xmm1
> +; SSSE3-NEXT: movaps %xmm1, %xmm0
> +; SSSE3-NEXT: retq
> +;
> +; SSE41-LABEL: shuffle_v2f64_03:
> +; SSE41: # BB#0:
> +; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
> +; SSE41-NEXT: retq
> ;
> ; AVX-LABEL: shuffle_v2f64_03:
> ; AVX: # BB#0:
> -; AVX-NEXT: vmovsd %xmm0, %xmm1, %xmm0
> +; AVX-NEXT: vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
> ; AVX-NEXT: retq
> %shuffle = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 0, i32 3>
> ret <2 x double> %shuffle
> }
> define <2 x double> @shuffle_v2f64_21(<2 x double> %a, <2 x double> %b) {
> -; SSE-LABEL: shuffle_v2f64_21:
> -; SSE: # BB#0:
> -; SSE-NEXT: movsd %xmm1, %xmm0
> -; SSE-NEXT: retq
> +; SSE2-LABEL: shuffle_v2f64_21:
> +; SSE2: # BB#0:
> +; SSE2-NEXT: movsd %xmm1, %xmm0
> +; SSE2-NEXT: retq
> +;
> +; SSE3-LABEL: shuffle_v2f64_21:
> +; SSE3: # BB#0:
> +; SSE3-NEXT: movsd %xmm1, %xmm0
> +; SSE3-NEXT: retq
> +;
> +; SSSE3-LABEL: shuffle_v2f64_21:
> +; SSSE3: # BB#0:
> +; SSSE3-NEXT: movsd %xmm1, %xmm0
> +; SSSE3-NEXT: retq
> +;
> +; SSE41-LABEL: shuffle_v2f64_21:
> +; SSE41: # BB#0:
> +; SSE41-NEXT: blendpd {{.*#+}} xmm1 = xmm1[0],xmm0[1]
> +; SSE41-NEXT: movapd %xmm1, %xmm0
> +; SSE41-NEXT: retq
> ;
> ; AVX-LABEL: shuffle_v2f64_21:
> ; AVX: # BB#0:
> -; AVX-NEXT: vmovsd %xmm1, %xmm0, %xmm0
> +; AVX-NEXT: vblendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
> ; AVX-NEXT: retq
> %shuffle = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 2, i32 1>
> ret <2 x double> %shuffle
> @@ -753,16 +786,35 @@ define <2 x double> @shuffle_v2f64_z0(<2
> }
>
> define <2 x double> @shuffle_v2f64_z1(<2 x double> %a) {
> -; SSE-LABEL: shuffle_v2f64_z1:
> -; SSE: # BB#0:
> -; SSE-NEXT: xorps %xmm1, %xmm1
> -; SSE-NEXT: movsd %xmm1, %xmm0
> -; SSE-NEXT: retq
> +; SSE2-LABEL: shuffle_v2f64_z1:
> +; SSE2: # BB#0:
> +; SSE2-NEXT: xorps %xmm1, %xmm1
> +; SSE2-NEXT: movsd %xmm1, %xmm0
> +; SSE2-NEXT: retq
> +;
> +; SSE3-LABEL: shuffle_v2f64_z1:
> +; SSE3: # BB#0:
> +; SSE3-NEXT: xorps %xmm1, %xmm1
> +; SSE3-NEXT: movsd %xmm1, %xmm0
> +; SSE3-NEXT: retq
> +;
> +; SSSE3-LABEL: shuffle_v2f64_z1:
> +; SSSE3: # BB#0:
> +; SSSE3-NEXT: xorps %xmm1, %xmm1
> +; SSSE3-NEXT: movsd %xmm1, %xmm0
> +; SSSE3-NEXT: retq
> +;
> +; SSE41-LABEL: shuffle_v2f64_z1:
> +; SSE41: # BB#0:
> +; SSE41-NEXT: xorpd %xmm1, %xmm1
> +; SSE41-NEXT: blendpd {{.*#+}} xmm1 = xmm1[0],xmm0[1]
> +; SSE41-NEXT: movapd %xmm1, %xmm0
> +; SSE41-NEXT: retq
> ;
> ; AVX-LABEL: shuffle_v2f64_z1:
> ; AVX: # BB#0:
> -; AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1
> -; AVX-NEXT: vmovsd %xmm1, %xmm0, %xmm0
> +; AVX-NEXT: vxorpd %xmm1, %xmm1, %xmm1
> +; AVX-NEXT: vblendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
> ; AVX-NEXT: retq
> %shuffle = shufflevector <2 x double> %a, <2 x double> zeroinitializer, <2 x i32> <i32 2, i32 1>
> ret <2 x double> %shuffle
>
> Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v4.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v4.ll?rev=219033&r1=219032&r2=219033&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v4.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v4.ll Fri Oct 3 17:43:17 2014
> @@ -55,7 +55,7 @@ define <4 x double> @shuffle_v4f64_0300(
> ; AVX1: # BB#0:
> ; AVX1-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm0[2,3,0,1]
> ; AVX1-NEXT: vpermilpd {{.*#+}} ymm1 = ymm1[0,1,2,2]
> -; AVX1-NEXT: vmovsd %xmm0, %xmm1, %xmm0
> +; AVX1-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
> ; AVX1-NEXT: retq
> ;
> ; AVX2-LABEL: shuffle_v4f64_0300:
> @@ -382,7 +382,7 @@ define <4 x i64> @shuffle_v4i64_0300(<4
> ; AVX1: # BB#0:
> ; AVX1-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm0[2,3,0,1]
> ; AVX1-NEXT: vpermilpd {{.*#+}} ymm1 = ymm1[0,1,2,2]
> -; AVX1-NEXT: vmovsd %xmm0, %xmm1, %xmm0
> +; AVX1-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
> ; AVX1-NEXT: retq
> ;
> ; AVX2-LABEL: shuffle_v4i64_0300:
> @@ -518,7 +518,7 @@ define <4 x i64> @shuffle_v4i64_4012(<4
> ; AVX1-NEXT: vshufpd {{.*#+}} xmm2 = xmm0[1],xmm2[0]
> ; AVX1-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0,0]
> ; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
> -; AVX1-NEXT: vmovsd %xmm1, %xmm0, %xmm0
> +; AVX1-NEXT: vblendpd {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3]
> ; AVX1-NEXT: retq
> ;
> ; AVX2-LABEL: shuffle_v4i64_4012:
> @@ -654,7 +654,7 @@ define <4 x i64> @stress_test1(<4 x i64>
> ; AVX1-NEXT: vpermilpd {{.*#+}} xmm2 = xmm1[1,0]
> ; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
> ; AVX1-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,2]
> -; AVX1-NEXT: vmovsd %xmm0, %xmm1, %xmm0
> +; AVX1-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
> ; AVX1-NEXT: retq
> ;
> ; AVX2-LABEL: stress_test1:
>
> Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll?rev=219033&r1=219032&r2=219033&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll Fri Oct 3 17:43:17 2014
> @@ -91,7 +91,7 @@ define <8 x double> @shuffle_v8f64_70000
> ; ALL-NEXT: vextractf64x4 $1, %zmm0, %ymm1
> ; ALL-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[3,1,2,3]
> ; ALL-NEXT: vbroadcastsd %xmm0, %ymm0
> -; ALL-NEXT: vmovsd %xmm1, %xmm0, %xmm1
> +; ALL-NEXT: vblendpd {{.*#+}} ymm1 = ymm1[0],ymm0[1,2,3]
> ; ALL-NEXT: vinsertf64x4 $1, %ymm0, %zmm1, %zmm0
> ; ALL-NEXT: retq
> %shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
> @@ -275,12 +275,12 @@ define <8 x double> @shuffle_v8f64_08192
> define <8 x double> @shuffle_v8f64_08991abb(<8 x double> %a, <8 x double> %b) {
> ; ALL-LABEL: shuffle_v8f64_08991abb:
> ; ALL: # BB#0:
> -; ALL-NEXT: vpermpd {{.*#+}} ymm2 = ymm1[0,0,1,1]
> -; ALL-NEXT: vmovsd %xmm0, %xmm2, %xmm2
> -; ALL-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,2]
> -; ALL-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,2,3,3]
> -; ALL-NEXT: vmovsd %xmm0, %xmm1, %xmm0
> -; ALL-NEXT: vinsertf64x4 $1, %ymm0, %zmm2, %zmm0
> +; ALL-NEXT: vpermilpd {{.*#+}} ymm2 = ymm0[1,0,2,2]
> +; ALL-NEXT: vpermpd {{.*#+}} ymm3 = ymm1[0,2,3,3]
> +; ALL-NEXT: vblendpd {{.*#+}} ymm2 = ymm2[0],ymm3[1,2,3]
> +; ALL-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[0,0,1,1]
> +; ALL-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
> +; ALL-NEXT: vinsertf64x4 $1, %ymm2, %zmm0, %zmm0
> ; ALL-NEXT: retq
> %shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 0, i32 8, i32 9, i32 9, i32 1, i32 10, i32 11, i32 11>
> ret <8 x double> %shuffle
> @@ -303,11 +303,11 @@ define <8 x double> @shuffle_v8f64_091b2
> define <8 x double> @shuffle_v8f64_09ab1def(<8 x double> %a, <8 x double> %b) {
> ; ALL-LABEL: shuffle_v8f64_09ab1def:
> ; ALL: # BB#0:
> -; ALL-NEXT: vmovsd %xmm0, %xmm1, %xmm2
> -; ALL-NEXT: vextractf64x4 $1, %zmm1, %ymm1
> -; ALL-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,2]
> -; ALL-NEXT: vmovsd %xmm0, %xmm1, %xmm0
> -; ALL-NEXT: vinsertf64x4 $1, %ymm0, %zmm2, %zmm0
> +; ALL-NEXT: vextractf64x4 $1, %zmm1, %ymm2
> +; ALL-NEXT: vpermilpd {{.*#+}} ymm3 = ymm0[1,0,2,2]
> +; ALL-NEXT: vblendpd {{.*#+}} ymm2 = ymm3[0],ymm2[1,2,3]
> +; ALL-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
> +; ALL-NEXT: vinsertf64x4 $1, %ymm2, %zmm0, %zmm0
> ; ALL-NEXT: retq
> %shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 0, i32 9, i32 10, i32 11, i32 1, i32 13, i32 14, i32 15>
> ret <8 x double> %shuffle
> @@ -721,7 +721,7 @@ define <8 x double> @shuffle_v8f64_f5112
> ; ALL-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm2[1],ymm0[2,3]
> ; ALL-NEXT: vextractf64x4 $1, %zmm1, %ymm1
> ; ALL-NEXT: vpermpd {{.*#+}} ymm1 = ymm1[3,1,2,3]
> -; ALL-NEXT: vmovsd %xmm1, %xmm0, %xmm0
> +; ALL-NEXT: vblendpd {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3]
> ; ALL-NEXT: vinsertf64x4 $1, %ymm3, %zmm0, %zmm0
> ; ALL-NEXT: retq
> %shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 15, i32 5, i32 1, i32 1, i32 2, i32 3, i32 5, i32 10>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits