[llvm] 1fed131 - [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE
Eric Christopher via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 6 16:58:50 PDT 2020
Hi Nemanja!
Running into a compiler crash with this building skia (https://skia.org/)
for power after this patch. I'll see what I can do to get a testcase (if it
doesn't reproduce for you), but would you mind terribly reverting in the
meantime?
Thanks!
-eric
On Thu, Jun 18, 2020 at 7:55 PM Nemanja Ivanovic via llvm-commits <
llvm-commits at lists.llvm.org> wrote:
>
> Author: Nemanja Ivanovic
> Date: 2020-06-18T21:54:22-05:00
> New Revision: 1fed131660b2c5d3ea7007e273a7a5da80699445
>
> URL:
> https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445
> DIFF:
> https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445.diff
>
> LOG: [PowerPC] Canonicalize shuffles to match more single-instruction
> masks on LE
>
> We currently miss a number of opportunities to emit single-instruction
> VMRG[LH][BHW] instructions for shuffles on little endian subtargets.
> Although
> this in itself is not a huge performance opportunity since loading the
> permute
> vector for a VPERM can always be pulled out of loops, producing such merge
> instructions is useful to downstream optimizations.
> Since VPERM is essentially opaque to all subsequent optimizations, we want
> to
> avoid it as much as possible. Other permute instructions have semantics
> that can
> be reasoned about much more easily in later optimizations.
>
> This patch does the following:
> - Canonicalize shuffles so that the first element comes from the first
> vector
> (since that's what most of the mask matching functions want)
> - Switch the elements that come from splat vectors so that they match the
> corresponding elements from the other vector (to allow for merges)
> - Adds debugging messages for when a shuffle is matched to a VPERM so that
> anyone interested in improving this further can get the info for their
> code
>
> Differential revision: https://reviews.llvm.org/D77448
>
> Added:
>
>
> Modified:
> llvm/lib/Target/PowerPC/PPCISelLowering.cpp
> llvm/lib/Target/PowerPC/PPCISelLowering.h
> llvm/lib/Target/PowerPC/PPCInstrVSX.td
> llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
> llvm/test/CodeGen/PowerPC/build-vector-tests.ll
> llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
> llvm/test/CodeGen/PowerPC/fp-strict-round.ll
> llvm/test/CodeGen/PowerPC/load-and-splat.ll
> llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
> llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
> llvm/test/CodeGen/PowerPC/pr25080.ll
> llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
> llvm/test/CodeGen/PowerPC/pr38087.ll
> llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
> llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
> llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
> llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
> llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
> llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
> llvm/test/CodeGen/PowerPC/swaps-le-5.ll
> llvm/test/CodeGen/PowerPC/swaps-le-6.ll
> llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
> llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
> llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
> llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
> llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
> llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
> llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
> llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
> llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
> llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
> llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
> llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
> llvm/test/CodeGen/PowerPC/vsx.ll
> llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>
> Removed:
>
>
>
>
> ################################################################################
> diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
> b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
> index d7698a5ec962..28bd80610c84 100644
> --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
> +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
> @@ -125,6 +125,7 @@ cl::desc("use absolute jump tables on ppc"),
> cl::Hidden);
>
> STATISTIC(NumTailCalls, "Number of tail calls");
> STATISTIC(NumSiblingCalls, "Number of sibling calls");
> +STATISTIC(ShufflesHandledWithVPERM, "Number of shuffles lowered to a
> VPERM");
>
> static bool isNByteElemShuffleMask(ShuffleVectorSDNode *, unsigned, int);
>
> @@ -1505,6 +1506,8 @@ const char
> *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
> case PPCISD::MTVSRZ: return "PPCISD::MTVSRZ";
> case PPCISD::SINT_VEC_TO_FP: return "PPCISD::SINT_VEC_TO_FP";
> case PPCISD::UINT_VEC_TO_FP: return "PPCISD::UINT_VEC_TO_FP";
> + case PPCISD::SCALAR_TO_VECTOR_PERMUTED:
> + return "PPCISD::SCALAR_TO_VECTOR_PERMUTED";
> case PPCISD::ANDI_rec_1_EQ_BIT:
> return "PPCISD::ANDI_rec_1_EQ_BIT";
> case PPCISD::ANDI_rec_1_GT_BIT:
> @@ -2716,7 +2719,8 @@ static bool usePartialVectorLoads(SDNode *N, const
> PPCSubtarget& ST) {
> for (SDNode::use_iterator UI = LD->use_begin(), UE = LD->use_end();
> UI != UE; ++UI)
> if (UI.getUse().get().getResNo() == 0 &&
> - UI->getOpcode() != ISD::SCALAR_TO_VECTOR)
> + UI->getOpcode() != ISD::SCALAR_TO_VECTOR &&
> + UI->getOpcode() != PPCISD::SCALAR_TO_VECTOR_PERMUTED)
> return false;
>
> return true;
> @@ -9041,7 +9045,8 @@ static const SDValue *getNormalLoadInput(const
> SDValue &Op) {
> const SDValue *InputLoad = &Op;
> if (InputLoad->getOpcode() == ISD::BITCAST)
> InputLoad = &InputLoad->getOperand(0);
> - if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR)
> + if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR ||
> + InputLoad->getOpcode() == PPCISD::SCALAR_TO_VECTOR_PERMUTED)
> InputLoad = &InputLoad->getOperand(0);
> if (InputLoad->getOpcode() != ISD::LOAD)
> return nullptr;
> @@ -9690,6 +9695,15 @@ SDValue
> PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
> SDValue V1 = Op.getOperand(0);
> SDValue V2 = Op.getOperand(1);
> ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);
> +
> + // Any nodes that were combined in the target-independent combiner prior
> + // to vector legalization will not be sent to the target combine. Try to
> + // combine it here.
> + if (SDValue NewShuffle = combineVectorShuffle(SVOp, DAG)) {
> + DAG.ReplaceAllUsesOfValueWith(Op, NewShuffle);
> + Op = NewShuffle;
> + SVOp = cast<ShuffleVectorSDNode>(Op);
> + }
> EVT VT = Op.getValueType();
> bool isLittleEndian = Subtarget.isLittleEndian();
>
> @@ -9715,6 +9729,11 @@ SDValue
> PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
> Offset = isLittleEndian ? (3 - SplatIdx) * 4 : SplatIdx * 4;
> else
> Offset = isLittleEndian ? (1 - SplatIdx) * 8 : SplatIdx * 8;
> +
> + // If we are loading a partial vector, it does not make sense to
> adjust
> + // the base pointer. This happens with (splat (s_to_v_permuted
> (ld))).
> + if (LD->getMemoryVT().getSizeInBits() == (IsFourByte ? 32 : 64))
> + Offset = 0;
> SDValue BasePtr = LD->getBasePtr();
> if (Offset != 0)
> BasePtr = DAG.getNode(ISD::ADD, dl,
> getPointerTy(DAG.getDataLayout()),
> @@ -9988,7 +10007,13 @@ SDValue
> PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
> MVT::i32));
> }
>
> + ShufflesHandledWithVPERM++;
> SDValue VPermMask = DAG.getBuildVector(MVT::v16i8, dl, ResultMask);
> + LLVM_DEBUG(dbgs() << "Emitting a VPERM for the following shuffle:\n");
> + LLVM_DEBUG(SVOp->dump());
> + LLVM_DEBUG(dbgs() << "With the following permute control vector:\n");
> + LLVM_DEBUG(VPermMask.dump());
> +
> if (isLittleEndian)
> return DAG.getNode(PPCISD::VPERM, dl, V1.getValueType(),
> V2, V1, VPermMask);
> @@ -14114,6 +14139,199 @@ SDValue
> PPCTargetLowering::combineStoreFPToInt(SDNode *N,
> return Val;
> }
>
> +static bool isAlternatingShuffMask(const ArrayRef<int> &Mask, int
> NumElts) {
> + // Check that the source of the element keeps flipping
> + // (i.e. Mask[i] < NumElts -> Mask[i+i] >= NumElts).
> + bool PrevElemFromFirstVec = Mask[0] < NumElts;
> + for (int i = 1, e = Mask.size(); i < e; i++) {
> + if (PrevElemFromFirstVec && Mask[i] < NumElts)
> + return false;
> + if (!PrevElemFromFirstVec && Mask[i] >= NumElts)
> + return false;
> + PrevElemFromFirstVec = !PrevElemFromFirstVec;
> + }
> + return true;
> +}
> +
> +static bool isSplatBV(SDValue Op) {
> + if (Op.getOpcode() != ISD::BUILD_VECTOR)
> + return false;
> + SDValue FirstOp;
> +
> + // Find first non-undef input.
> + for (int i = 0, e = Op.getNumOperands(); i < e; i++) {
> + FirstOp = Op.getOperand(i);
> + if (!FirstOp.isUndef())
> + break;
> + }
> +
> + // All inputs are undef or the same as the first non-undef input.
> + for (int i = 1, e = Op.getNumOperands(); i < e; i++)
> + if (Op.getOperand(i) != FirstOp && !Op.getOperand(i).isUndef())
> + return false;
> + return true;
> +}
> +
> +static SDValue isScalarToVec(SDValue Op) {
> + if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR)
> + return Op;
> + if (Op.getOpcode() != ISD::BITCAST)
> + return SDValue();
> + Op = Op.getOperand(0);
> + if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR)
> + return Op;
> + return SDValue();
> +}
> +
> +static void fixupShuffleMaskForPermutedSToV(SmallVectorImpl<int> &ShuffV,
> + int LHSMaxIdx, int RHSMinIdx,
> + int RHSMaxIdx, int HalfVec) {
> + for (int i = 0, e = ShuffV.size(); i < e; i++) {
> + int Idx = ShuffV[i];
> + if ((Idx >= 0 && Idx < LHSMaxIdx) || (Idx >= RHSMinIdx && Idx <
> RHSMaxIdx))
> + ShuffV[i] += HalfVec;
> + }
> + return;
> +}
> +
> +// Replace a SCALAR_TO_VECTOR with a SCALAR_TO_VECTOR_PERMUTED except if
> +// the original is:
> +// (<n x Ty> (scalar_to_vector (Ty (extract_elt <n x Ty> %a, C))))
> +// In such a case, just change the shuffle mask to extract the element
> +// from the permuted index.
> +static SDValue getSToVPermuted(SDValue OrigSToV, SelectionDAG &DAG) {
> + SDLoc dl(OrigSToV);
> + EVT VT = OrigSToV.getValueType();
> + assert(OrigSToV.getOpcode() == ISD::SCALAR_TO_VECTOR &&
> + "Expecting a SCALAR_TO_VECTOR here");
> + SDValue Input = OrigSToV.getOperand(0);
> +
> + if (Input.getOpcode() == ISD::EXTRACT_VECTOR_ELT) {
> + ConstantSDNode *Idx = dyn_cast<ConstantSDNode>(Input.getOperand(1));
> + SDValue OrigVector = Input.getOperand(0);
> +
> + // Can't handle non-const element indices or
> diff erent vector types
> + // for the input to the extract and the output of the
> scalar_to_vector.
> + if (Idx && VT == OrigVector.getValueType()) {
> + SmallVector<int, 16> NewMask(VT.getVectorNumElements(), -1);
> + NewMask[VT.getVectorNumElements() / 2] = Idx->getZExtValue();
> + return DAG.getVectorShuffle(VT, dl, OrigVector, OrigVector,
> NewMask);
> + }
> + }
> + return DAG.getNode(PPCISD::SCALAR_TO_VECTOR_PERMUTED, dl, VT,
> + OrigSToV.getOperand(0));
> +}
> +
> +// On little endian subtargets, combine shuffles such as:
> +// vector_shuffle<16,1,17,3,18,5,19,7,20,9,21,11,22,13,23,15>, <zero>, %b
> +// into:
> +// vector_shuffle<16,0,17,1,18,2,19,3,20,4,21,5,22,6,23,7>, <zero>, %b
> +// because the latter can be matched to a single instruction merge.
> +// Furthermore, SCALAR_TO_VECTOR on little endian always involves a
> permute
> +// to put the value into element zero. Adjust the shuffle mask so that the
> +// vector can remain in permuted form (to prevent a swap prior to a
> shuffle).
> +SDValue PPCTargetLowering::combineVectorShuffle(ShuffleVectorSDNode *SVN,
> + SelectionDAG &DAG) const {
> + SDValue LHS = SVN->getOperand(0);
> + SDValue RHS = SVN->getOperand(1);
> + auto Mask = SVN->getMask();
> + int NumElts = LHS.getValueType().getVectorNumElements();
> + SDValue Res(SVN, 0);
> + SDLoc dl(SVN);
> +
> + // None of these combines are useful on big endian systems since the ISA
> + // already has a big endian bias.
> + if (!Subtarget.isLittleEndian())
> + return Res;
> +
> + // If this is not a shuffle of a shuffle and the first element comes
> from
> + // the second vector, canonicalize to the commuted form. This will make
> it
> + // more likely to match one of the single instruction patterns.
> + if (Mask[0] >= NumElts && LHS.getOpcode() != ISD::VECTOR_SHUFFLE &&
> + RHS.getOpcode() != ISD::VECTOR_SHUFFLE) {
> + std::swap(LHS, RHS);
> + Res = DAG.getCommutedVectorShuffle(*SVN);
> + Mask = cast<ShuffleVectorSDNode>(Res)->getMask();
> + }
> +
> + // Adjust the shuffle mask if either input vector comes from a
> + // SCALAR_TO_VECTOR and keep the respective input vector in permuted
> + // form (to prevent the need for a swap).
> + SmallVector<int, 16> ShuffV(Mask.begin(), Mask.end());
> + SDValue SToVLHS = isScalarToVec(LHS);
> + SDValue SToVRHS = isScalarToVec(RHS);
> + if (SToVLHS || SToVRHS) {
> + int NumEltsIn = SToVLHS ?
> SToVLHS.getValueType().getVectorNumElements()
> + :
> SToVRHS.getValueType().getVectorNumElements();
> + int NumEltsOut = ShuffV.size();
> +
> + // Initially assume that neither input is permuted. These will be
> adjusted
> + // accordingly if either input is.
> + int LHSMaxIdx = -1;
> + int RHSMinIdx = -1;
> + int RHSMaxIdx = -1;
> + int HalfVec = LHS.getValueType().getVectorNumElements() / 2;
> +
> + // Get the permuted scalar to vector nodes for the source(s) that
> come from
> + // ISD::SCALAR_TO_VECTOR.
> + if (SToVLHS) {
> + // Set up the values for the shuffle vector fixup.
> + LHSMaxIdx = NumEltsOut / NumEltsIn;
> + SToVLHS = getSToVPermuted(SToVLHS, DAG);
> + if (SToVLHS.getValueType() != LHS.getValueType())
> + SToVLHS = DAG.getBitcast(LHS.getValueType(), SToVLHS);
> + LHS = SToVLHS;
> + }
> + if (SToVRHS) {
> + RHSMinIdx = NumEltsOut;
> + RHSMaxIdx = NumEltsOut / NumEltsIn + RHSMinIdx;
> + SToVRHS = getSToVPermuted(SToVRHS, DAG);
> + if (SToVRHS.getValueType() != RHS.getValueType())
> + SToVRHS = DAG.getBitcast(RHS.getValueType(), SToVRHS);
> + RHS = SToVRHS;
> + }
> +
> + // Fix up the shuffle mask to reflect where the desired element
> actually is.
> + // The minimum and maximum indices that correspond to element zero
> for both
> + // the LHS and RHS are computed and will control which shuffle mask
> entries
> + // are to be changed. For example, if the RHS is permuted, any
> shuffle mask
> + // entries in the range [RHSMinIdx,RHSMaxIdx) will be incremented by
> + // HalfVec to refer to the corresponding element in the permuted
> vector.
> + fixupShuffleMaskForPermutedSToV(ShuffV, LHSMaxIdx, RHSMinIdx,
> RHSMaxIdx,
> + HalfVec);
> + Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS,
> ShuffV);
> +
> + // We may have simplified away the shuffle. We won't be able to do
> anything
> + // further with it here.
> + if (!isa<ShuffleVectorSDNode>(Res))
> + return Res;
> + Mask = cast<ShuffleVectorSDNode>(Res)->getMask();
> + }
> +
> + // The common case after we commuted the shuffle is that the RHS is a
> splat
> + // and we have elements coming in from the splat at indices that are not
> + // conducive to using a merge.
> + // Example:
> + // vector_shuffle<0,17,1,19,2,21,3,23,4,25,5,27,6,29,7,31> t1, <zero>
> + if (!isSplatBV(RHS))
> + return Res;
> +
> + // We are looking for a mask such that all even elements are from
> + // one vector and all odd elements from the other.
> + if (!isAlternatingShuffMask(Mask, NumElts))
> + return Res;
> +
> + // Adjust the mask so we are pulling in the same index from the splat
> + // as the index from the interesting vector in consecutive elements.
> + // Example:
> + // vector_shuffle<0,16,1,17,2,18,3,19,4,20,5,21,6,22,7,23> t1, <zero>
> + for (int i = 1, e = Mask.size(); i < e; i += 2)
> + ShuffV[i] = (ShuffV[i - 1] + NumElts);
> +
> + Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, ShuffV);
> + return Res;
> +}
> +
> SDValue PPCTargetLowering::combineVReverseMemOP(ShuffleVectorSDNode *SVN,
> LSBaseSDNode *LSBase,
> DAGCombinerInfo &DCI)
> const {
> @@ -14223,7 +14441,7 @@ SDValue
> PPCTargetLowering::PerformDAGCombine(SDNode *N,
> LSBaseSDNode* LSBase = cast<LSBaseSDNode>(N->getOperand(0));
> return combineVReverseMemOP(cast<ShuffleVectorSDNode>(N), LSBase,
> DCI);
> }
> - break;
> + return combineVectorShuffle(cast<ShuffleVectorSDNode>(N), DCI.DAG);
> case ISD::STORE: {
>
> EVT Op1VT = N->getOperand(1).getValueType();
>
> diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h
> b/llvm/lib/Target/PowerPC/PPCISelLowering.h
> index 77252e919553..9f7c6ab53a17 100644
> --- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
> +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
> @@ -221,6 +221,14 @@ namespace llvm {
> /// As with SINT_VEC_TO_FP, used for converting illegal types.
> UINT_VEC_TO_FP,
>
> + /// PowerPC instructions that have SCALAR_TO_VECTOR semantics tend to
> + /// place the value into the least significant element of the most
> + /// significant doubleword in the vector. This is not element zero for
> + /// anything smaller than a doubleword on either endianness. This
> node has
> + /// the same semantics as SCALAR_TO_VECTOR except that the value
> remains in
> + /// the aforementioned location in the vector register.
> + SCALAR_TO_VECTOR_PERMUTED,
> +
> // FIXME: Remove these once the ANDI glue bug is fixed:
> /// i1 = ANDI_rec_1_[EQ|GT]_BIT(i32 or i64 x) - Represents the result
> of the
> /// eq or gt bit of CR0 after executing andi. x, 1. This is used to
> @@ -1215,6 +1223,8 @@ namespace llvm {
> SDValue combineSetCC(SDNode *N, DAGCombinerInfo &DCI) const;
> SDValue combineABS(SDNode *N, DAGCombinerInfo &DCI) const;
> SDValue combineVSelect(SDNode *N, DAGCombinerInfo &DCI) const;
> + SDValue combineVectorShuffle(ShuffleVectorSDNode *SVN,
> + SelectionDAG &DAG) const;
> SDValue combineVReverseMemOP(ShuffleVectorSDNode *SVN, LSBaseSDNode
> *LSBase,
> DAGCombinerInfo &DCI) const;
>
>
> diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
> b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
> index e7ec1808ec3b..c43b2716cb37 100644
> --- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
> +++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
> @@ -138,6 +138,8 @@ def PPCldvsxlh : SDNode<"PPCISD::LD_VSX_LH",
> SDT_PPCldvsxlh,
> [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
> def PPCldsplat : SDNode<"PPCISD::LD_SPLAT", SDT_PPCldsplat,
> [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
> +def PPCSToV : SDNode<"PPCISD::SCALAR_TO_VECTOR_PERMUTED",
> + SDTypeProfile<1, 1, []>, []>;
>
> //-------------------------- Predicate definitions
> ---------------------------//
> def HasVSX : Predicate<"PPCSubTarget->hasVSX()">;
> @@ -288,6 +290,11 @@ class X_XS6_RA5_RB5<bits<6> opcode, bits<10> xo,
> string opc,
> } // Predicates = HasP9Vector
> } // AddedComplexity = 400, hasSideEffects = 0
>
> +multiclass ScalToVecWPermute<ValueType Ty, dag In, dag NonPermOut, dag
> PermOut> {
> + def : Pat<(Ty (scalar_to_vector In)), (Ty NonPermOut)>;
> + def : Pat<(Ty (PPCSToV In)), (Ty PermOut)>;
> +}
> +
> //-------------------------- Instruction definitions
> -------------------------//
> // VSX instructions require the VSX feature, they are to be selected over
> // equivalent Altivec patterns (as they address a larger register set) and
> @@ -2710,12 +2717,14 @@ def : Pat<(v2i64 (build_vector DblToLong.A,
> DblToLong.A)),
> def : Pat<(v2i64 (build_vector DblToULong.A, DblToULong.A)),
> (v2i64 (XXPERMDI (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC),
> (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC), 0))>;
> -def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
> - (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC),
> 1))>;
> -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
> - (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC),
> 1))>;
> +defm : ScalToVecWPermute<
> + v4i32, FltToIntLoad.A,
> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC),
> 1),
> + (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC)>;
> +defm : ScalToVecWPermute<
> + v4i32, FltToUIntLoad.A,
> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC),
> 1),
> + (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC)>;
> def : Pat<(v4f32 (build_vector f32:$A, f32:$A, f32:$A, f32:$A)),
> (v4f32 (XXSPLTW (v4f32 (XSCVDPSPN $A)), 0))>;
> def : Pat<(v2f64 (PPCldsplat xoaddr:$A)),
> @@ -2730,10 +2739,12 @@ def : Pat<(v2i64 (build_vector FltToLong.A,
> FltToLong.A)),
> def : Pat<(v2i64 (build_vector FltToULong.A, FltToULong.A)),
> (v2i64 (XXPERMDIs
> (COPY_TO_REGCLASS (XSCVDPUXDSs $A), VSFRC), 0))>;
> -def : Pat<(v2i64 (scalar_to_vector DblToLongLoad.A)),
> - (v2i64 (XVCVDPSXDS (LXVDSX xoaddr:$A)))>;
> -def : Pat<(v2i64 (scalar_to_vector DblToULongLoad.A)),
> - (v2i64 (XVCVDPUXDS (LXVDSX xoaddr:$A)))>;
> +defm : ScalToVecWPermute<
> + v2i64, DblToLongLoad.A,
> + (XVCVDPSXDS (LXVDSX xoaddr:$A)), (XVCVDPSXDS (LXVDSX xoaddr:$A))>;
> +defm : ScalToVecWPermute<
> + v2i64, DblToULongLoad.A,
> + (XVCVDPUXDS (LXVDSX xoaddr:$A)), (XVCVDPUXDS (LXVDSX xoaddr:$A))>;
> } // HasVSX
>
> // Any big endian VSX subtarget.
> @@ -2831,9 +2842,10 @@ def : Pat<WToDPExtractConv.BV13U,
>
> // Any little endian VSX subtarget.
> let Predicates = [HasVSX, IsLittleEndian] in {
> -def : Pat<(v2f64 (scalar_to_vector f64:$A)),
> - (v2f64 (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64),
> - (SUBREG_TO_REG (i64 1), $A, sub_64), 0))>;
> +defm : ScalToVecWPermute<v2f64, (f64 f64:$A),
> + (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64),
> + (SUBREG_TO_REG (i64 1), $A, sub_64),
> 0),
> + (SUBREG_TO_REG (i64 1), $A, sub_64)>;
>
> def : Pat<(f64 (extractelt v2f64:$S, 0)),
> (f64 (EXTRACT_SUBREG (XXPERMDI $S, $S, 2), sub_64))>;
> @@ -2943,18 +2955,24 @@ def : Pat<(PPCstore_scal_int_from_vsr
> (STXSDX (XSCVDPUXDS f64:$src), xoaddr:$dst)>;
>
> // Load-and-splat with fp-to-int conversion (using X-Form VSX/FP loads).
> -def : Pat<(v4i32 (scalar_to_vector DblToIntLoad.A)),
> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
> - (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC),
> 1))>;
> -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoad.A)),
> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
> - (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC),
> 1))>;
> -def : Pat<(v2i64 (scalar_to_vector FltToLongLoad.A)),
> - (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
> - (XFLOADf32 xoaddr:$A), VSFRC)),
> 0))>;
> -def : Pat<(v2i64 (scalar_to_vector FltToULongLoad.A)),
> - (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
> - (XFLOADf32 xoaddr:$A), VSFRC)),
> 0))>;
> +defm : ScalToVecWPermute<
> + v4i32, DblToIntLoad.A,
> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC),
> 1),
> + (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC)>;
> +defm : ScalToVecWPermute<
> + v4i32, DblToUIntLoad.A,
> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC),
> 1),
> + (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC)>;
> +defm : ScalToVecWPermute<
> + v2i64, FltToLongLoad.A,
> + (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A),
> VSFRC)), 0),
> + (SUBREG_TO_REG (i64 1), (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32
> xoaddr:$A),
> + VSFRC)), sub_64)>;
> +defm : ScalToVecWPermute<
> + v2i64, FltToULongLoad.A,
> + (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A),
> VSFRC)), 0),
> + (SUBREG_TO_REG (i64 1), (XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32
> xoaddr:$A),
> + VSFRC)), sub_64)>;
> } // HasVSX, NoP9Vector
>
> // Any VSX subtarget that only has loads and stores that load in big
> endian
> @@ -3156,8 +3174,12 @@ def : Pat<DWToSPExtractConv.El1US1,
> (f64 (COPY_TO_REGCLASS $S1, VSRC)), VSFRC)))>;
>
> // v4f32 scalar <-> vector conversions (LE)
> -def : Pat<(v4f32 (scalar_to_vector f32:$A)),
> - (v4f32 (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1))>;
> + // The permuted version is no better than the version that puts the
> value
> + // into the right element because XSCVDPSPN is
> diff erent from all the other
> + // instructions used for PPCSToV.
> + defm : ScalToVecWPermute<v4f32, (f32 f32:$A),
> + (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1),
> + (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 3)>;
> def : Pat<(f32 (vector_extract v4f32:$S, 0)),
> (f32 (XSCVSPDPN (XXSLDWI $S, $S, 3)))>;
> def : Pat<(f32 (vector_extract v4f32:$S, 1)),
> @@ -3189,18 +3211,25 @@ def : Pat<(f64 (PPCfcfid (f64 (PPCmtvsra (i32
> (extractelt v4i32:$A, 3)))))),
> // LIWAX - This instruction is used for sign extending i32 -> i64.
> // LIWZX - This instruction will be emitted for i32, f32, and when
> // zero-extending i32 to i64 (zext i32 -> i64).
> -def : Pat<(v2i64 (scalar_to_vector (i64 (sextloadi32 xoaddr:$src)))),
> - (v2i64 (XXPERMDIs
> - (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2))>;
> -def : Pat<(v2i64 (scalar_to_vector (i64 (zextloadi32 xoaddr:$src)))),
> - (v2i64 (XXPERMDIs
> - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>;
> -def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))),
> - (v4i32 (XXPERMDIs
> - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>;
> -def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))),
> - (v4f32 (XXPERMDIs
> - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>;
> +defm : ScalToVecWPermute<
> + v2i64, (i64 (sextloadi32 xoaddr:$src)),
> + (XXPERMDIs (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2),
> + (SUBREG_TO_REG (i64 1), (LIWAX xoaddr:$src), sub_64)>;
> +
> +defm : ScalToVecWPermute<
> + v2i64, (i64 (zextloadi32 xoaddr:$src)),
> + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2),
> + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>;
> +
> +defm : ScalToVecWPermute<
> + v4i32, (i32 (load xoaddr:$src)),
> + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2),
> + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>;
> +
> +defm : ScalToVecWPermute<
> + v4f32, (f32 (load xoaddr:$src)),
> + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2),
> + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>;
>
> def : Pat<DWToSPExtractConv.BVU,
> (v4f32 (VPKUDUM (XXSLDWI (XVCVUXDSP $S2), (XVCVUXDSP $S2), 3),
> @@ -3336,14 +3365,17 @@ def : Pat<(i64 (vector_extract v2i64:$S,
> i64:$Idx)),
> // Little endian VSX subtarget with direct moves.
> let Predicates = [HasVSX, HasDirectMove, IsLittleEndian] in {
> // v16i8 scalar <-> vector conversions (LE)
> - def : Pat<(v16i8 (scalar_to_vector i32:$A)),
> - (v16i8 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>;
> - def : Pat<(v8i16 (scalar_to_vector i32:$A)),
> - (v8i16 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>;
> - def : Pat<(v4i32 (scalar_to_vector i32:$A)),
> - (v4i32 MovesToVSR.LE_WORD_0)>;
> - def : Pat<(v2i64 (scalar_to_vector i64:$A)),
> - (v2i64 MovesToVSR.LE_DWORD_0)>;
> + defm : ScalToVecWPermute<v16i8, (i32 i32:$A),
> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC),
> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1, VSRC)>;
> + defm : ScalToVecWPermute<v8i16, (i32 i32:$A),
> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC),
> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1, VSRC)>;
> + defm : ScalToVecWPermute<v4i32, (i32 i32:$A), MovesToVSR.LE_WORD_0,
> + (SUBREG_TO_REG (i64 1), (MTVSRWZ $A), sub_64)>;
> + defm : ScalToVecWPermute<v2i64, (i64 i64:$A), MovesToVSR.LE_DWORD_0,
> + MovesToVSR.LE_DWORD_1>;
> +
> // v2i64 scalar <-> vector conversions (LE)
> def : Pat<(i64 (vector_extract v2i64:$S, 0)),
> (i64 VectorExtractions.LE_DWORD_0)>;
> @@ -3641,30 +3673,41 @@ def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS,
> xoaddr:$dst),
> (STXVX $rS, xoaddr:$dst)>;
>
> // Build vectors from i8 loads
> -def : Pat<(v16i8 (scalar_to_vector ScalarLoads.Li8)),
> - (v16i8 (VSPLTBs 7, (LXSIBZX xoaddr:$src)))>;
> -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.ZELi8)),
> - (v8i16 (VSPLTHs 3, (LXSIBZX xoaddr:$src)))>;
> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi8)),
> - (v4i32 (XXSPLTWs (LXSIBZX xoaddr:$src), 1))>;
> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi8i64)),
> - (v2i64 (XXPERMDIs (LXSIBZX xoaddr:$src), 0))>;
> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi8)),
> - (v4i32 (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1))>;
> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi8i64)),
> - (v2i64 (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0))>;
> +defm : ScalToVecWPermute<v16i8, ScalarLoads.Li8,
> + (VSPLTBs 7, (LXSIBZX xoaddr:$src)),
> + (VSPLTBs 7, (LXSIBZX xoaddr:$src))>;
> +defm : ScalToVecWPermute<v8i16, ScalarLoads.ZELi8,
> + (VSPLTHs 3, (LXSIBZX xoaddr:$src)),
> + (VSPLTHs 3, (LXSIBZX xoaddr:$src))>;
> +defm : ScalToVecWPermute<v4i32, ScalarLoads.ZELi8,
> + (XXSPLTWs (LXSIBZX xoaddr:$src), 1),
> + (XXSPLTWs (LXSIBZX xoaddr:$src), 1)>;
> +defm : ScalToVecWPermute<v2i64, ScalarLoads.ZELi8i64,
> + (XXPERMDIs (LXSIBZX xoaddr:$src), 0),
> + (XXPERMDIs (LXSIBZX xoaddr:$src), 0)>;
> +defm : ScalToVecWPermute<v4i32, ScalarLoads.SELi8,
> + (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1),
> + (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1)>;
> +defm : ScalToVecWPermute<v2i64, ScalarLoads.SELi8i64,
> + (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0),
> + (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)),
> 0)>;
>
> // Build vectors from i16 loads
> -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.Li16)),
> - (v8i16 (VSPLTHs 3, (LXSIHZX xoaddr:$src)))>;
> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi16)),
> - (v4i32 (XXSPLTWs (LXSIHZX xoaddr:$src), 1))>;
> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
> - (v2i64 (XXPERMDIs (LXSIHZX xoaddr:$src), 0))>;
> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi16)),
> - (v4i32 (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1))>;
> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi16i64)),
> - (v2i64 (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0))>;
> +defm : ScalToVecWPermute<v8i16, ScalarLoads.Li16,
> + (VSPLTHs 3, (LXSIHZX xoaddr:$src)),
> + (VSPLTHs 3, (LXSIHZX xoaddr:$src))>;
> +defm : ScalToVecWPermute<v4i32, ScalarLoads.ZELi16,
> + (XXSPLTWs (LXSIHZX xoaddr:$src), 1),
> + (XXSPLTWs (LXSIHZX xoaddr:$src), 1)>;
> +defm : ScalToVecWPermute<v2i64, ScalarLoads.ZELi16i64,
> + (XXPERMDIs (LXSIHZX xoaddr:$src), 0),
> + (XXPERMDIs (LXSIHZX xoaddr:$src), 0)>;
> +defm : ScalToVecWPermute<v4i32, ScalarLoads.SELi16,
> + (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1),
> + (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1)>;
> +defm : ScalToVecWPermute<v2i64, ScalarLoads.SELi16i64,
> + (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0),
> + (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)),
> 0)>;
>
> // Load/convert and convert/store patterns for f16.
> def : Pat<(f64 (extloadf16 xoaddr:$src)),
> @@ -3806,8 +3849,7 @@ def : Pat<(f32 (PPCxsminc f32:$XA, f32:$XB)),
> VSSRC))>;
>
> // Endianness-neutral patterns for const splats with ISA 3.0 instructions.
> -def : Pat<(v4i32 (scalar_to_vector i32:$A)),
> - (v4i32 (MTVSRWS $A))>;
> +defm : ScalToVecWPermute<v4i32, (i32 i32:$A), (MTVSRWS $A), (MTVSRWS $A)>;
> def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),
> (v4i32 (MTVSRWS $A))>;
> def : Pat<(v16i8 (build_vector immNonAllOneAnyExt8:$A,
> immNonAllOneAnyExt8:$A,
> @@ -3819,24 +3861,32 @@ def : Pat<(v16i8 (build_vector
> immNonAllOneAnyExt8:$A, immNonAllOneAnyExt8:$A,
> immNonAllOneAnyExt8:$A,
> immNonAllOneAnyExt8:$A,
> immNonAllOneAnyExt8:$A,
> immNonAllOneAnyExt8:$A)),
> (v16i8 (COPY_TO_REGCLASS (XXSPLTIB imm:$A), VSRC))>;
> -def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
> - (v4i32 (XVCVSPSXWS (LXVWSX xoaddr:$A)))>;
> -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
> - (v4i32 (XVCVSPUXWS (LXVWSX xoaddr:$A)))>;
> -def : Pat<(v4i32 (scalar_to_vector DblToIntLoadP9.A)),
> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
> - (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC),
> 1))>;
> -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoadP9.A)),
> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
> - (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC),
> 1))>;
> -def : Pat<(v2i64 (scalar_to_vector FltToLongLoadP9.A)),
> - (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
> - (DFLOADf32 iaddrX4:$A),
> - VSFRC)), 0))>;
> -def : Pat<(v2i64 (scalar_to_vector FltToULongLoadP9.A)),
> - (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
> - (DFLOADf32 iaddrX4:$A),
> - VSFRC)), 0))>;
> +defm : ScalToVecWPermute<v4i32, FltToIntLoad.A,
> + (XVCVSPSXWS (LXVWSX xoaddr:$A)),
> + (XVCVSPSXWS (LXVWSX xoaddr:$A))>;
> +defm : ScalToVecWPermute<v4i32, FltToUIntLoad.A,
> + (XVCVSPUXWS (LXVWSX xoaddr:$A)),
> + (XVCVSPUXWS (LXVWSX xoaddr:$A))>;
> +defm : ScalToVecWPermute<
> + v4i32, DblToIntLoadP9.A,
> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC),
> 1),
> + (SUBREG_TO_REG (i64 1), (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), sub_64)>;
> +defm : ScalToVecWPermute<
> + v4i32, DblToUIntLoadP9.A,
> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC),
> 1),
> + (SUBREG_TO_REG (i64 1), (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), sub_64)>;
> +defm : ScalToVecWPermute<
> + v2i64, FltToLongLoadP9.A,
> + (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A),
> VSFRC)), 0),
> + (SUBREG_TO_REG
> + (i64 1),
> + (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)),
> sub_64)>;
> +defm : ScalToVecWPermute<
> + v2i64, FltToULongLoadP9.A,
> + (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A),
> VSFRC)), 0),
> + (SUBREG_TO_REG
> + (i64 1),
> + (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)),
> sub_64)>;
> def : Pat<(v4f32 (PPCldsplat xoaddr:$A)),
> (v4f32 (LXVWSX xoaddr:$A))>;
> def : Pat<(v4i32 (PPCldsplat xoaddr:$A)),
> @@ -4116,19 +4166,23 @@ def : Pat<(truncstorei16 (i32 (vector_extract
> v8i16:$S, 6)), xoaddr:$dst),
> def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 7)), xoaddr:$dst),
> (STXSIHXv (COPY_TO_REGCLASS (v16i8 (VSLDOI $S, $S, 10)), VSRC),
> xoaddr:$dst)>;
>
> -def : Pat<(v2i64 (scalar_to_vector (i64 (load iaddrX4:$src)))),
> - (v2i64 (XXPERMDIs
> - (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>;
> -def : Pat<(v2i64 (scalar_to_vector (i64 (load xaddrX4:$src)))),
> - (v2i64 (XXPERMDIs
> - (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>;
> +defm : ScalToVecWPermute<
> + v2i64, (i64 (load iaddrX4:$src)),
> + (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2),
> + (SUBREG_TO_REG (i64 1), (DFLOADf64 iaddrX4:$src), sub_64)>;
> +defm : ScalToVecWPermute<
> + v2i64, (i64 (load xaddrX4:$src)),
> + (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2),
> + (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>;
> +defm : ScalToVecWPermute<
> + v2f64, (f64 (load iaddrX4:$src)),
> + (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2),
> + (SUBREG_TO_REG (i64 1), (DFLOADf64 iaddrX4:$src), sub_64)>;
> +defm : ScalToVecWPermute<
> + v2f64, (f64 (load xaddrX4:$src)),
> + (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2),
> + (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>;
>
> -def : Pat<(v2f64 (scalar_to_vector (f64 (load iaddrX4:$src)))),
> - (v2f64 (XXPERMDIs
> - (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>;
> -def : Pat<(v2f64 (scalar_to_vector (f64 (load xaddrX4:$src)))),
> - (v2f64 (XXPERMDIs
> - (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>;
> def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddrX4:$src),
> (XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
> sub_64), xaddrX4:$src)>;
>
> diff --git a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
> b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
> index 8c9ffa815467..4d06571d0ec7 100644
> --- a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
> +++ b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
> @@ -13,8 +13,7 @@ define void @testExpandPostRAPseudo(i32* nocapture
> readonly %ptr) {
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8: lfiwzx f0, 0, r3
> ; CHECK-P8: ld r4, .LC0 at toc@l(r4)
> -; CHECK-P8: xxswapd vs0, f0
> -; CHECK-P8: xxspltw v2, vs0, 3
> +; CHECK-P8: xxspltw v2, vs0, 1
> ; CHECK-P8: stvx v2, 0, r4
> ; CHECK-P8: lis r4, 1024
> ; CHECK-P8: lfiwax f0, 0, r3
>
> diff --git a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
> b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
> index ee0cc41ea6bd..1cb7d7b62055 100644
> --- a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
> +++ b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
> @@ -1282,8 +1282,7 @@ define <4 x i32> @spltMemVali(i32* nocapture
> readonly %ptr) {
> ; P8LE-LABEL: spltMemVali:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfiwzx f0, 0, r3
> -; P8LE-NEXT: xxswapd vs0, f0
> -; P8LE-NEXT: xxspltw v2, vs0, 3
> +; P8LE-NEXT: xxspltw v2, vs0, 1
> ; P8LE-NEXT: blr
> entry:
> %0 = load i32, i32* %ptr, align 4
> @@ -2801,8 +2800,7 @@ define <4 x i32> @spltMemValui(i32* nocapture
> readonly %ptr) {
> ; P8LE-LABEL: spltMemValui:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfiwzx f0, 0, r3
> -; P8LE-NEXT: xxswapd vs0, f0
> -; P8LE-NEXT: xxspltw v2, vs0, 3
> +; P8LE-NEXT: xxspltw v2, vs0, 1
> ; P8LE-NEXT: blr
> entry:
> %0 = load i32, i32* %ptr, align 4
> @@ -4573,7 +4571,7 @@ define <2 x i64> @spltMemValConvftoll(float*
> nocapture readonly %ptr) {
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfs f0, 0(r3)
> ; P9LE-NEXT: xscvdpsxds f0, f0
> -; P9LE-NEXT: xxspltd v2, f0, 0
> +; P9LE-NEXT: xxspltd v2, vs0, 0
> ; P9LE-NEXT: blr
> ;
> ; P8BE-LABEL: spltMemValConvftoll:
> @@ -4587,7 +4585,7 @@ define <2 x i64> @spltMemValConvftoll(float*
> nocapture readonly %ptr) {
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfsx f0, 0, r3
> ; P8LE-NEXT: xscvdpsxds f0, f0
> -; P8LE-NEXT: xxspltd v2, f0, 0
> +; P8LE-NEXT: xxspltd v2, vs0, 0
> ; P8LE-NEXT: blr
> entry:
> %0 = load float, float* %ptr, align 4
> @@ -5761,7 +5759,7 @@ define <2 x i64> @spltMemValConvftoull(float*
> nocapture readonly %ptr) {
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfs f0, 0(r3)
> ; P9LE-NEXT: xscvdpuxds f0, f0
> -; P9LE-NEXT: xxspltd v2, f0, 0
> +; P9LE-NEXT: xxspltd v2, vs0, 0
> ; P9LE-NEXT: blr
> ;
> ; P8BE-LABEL: spltMemValConvftoull:
> @@ -5775,7 +5773,7 @@ define <2 x i64> @spltMemValConvftoull(float*
> nocapture readonly %ptr) {
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfsx f0, 0, r3
> ; P8LE-NEXT: xscvdpuxds f0, f0
> -; P8LE-NEXT: xxspltd v2, f0, 0
> +; P8LE-NEXT: xxspltd v2, vs0, 0
> ; P8LE-NEXT: blr
> entry:
> %0 = load float, float* %ptr, align 4
>
> diff --git a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
> b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
> index 2ffe98e1f694..7fac0511e3c5 100644
> --- a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
> +++ b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
> @@ -23,18 +23,12 @@ entry:
> define dso_local <16 x i8> @testmrghb2(<16 x i8> %a, <16 x i8> %b)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: testmrghb2:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: addis r3, r2, .LCPI1_0 at toc@ha
> -; CHECK-P8-NEXT: addi r3, r3, .LCPI1_0 at toc@l
> -; CHECK-P8-NEXT: lvx v4, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: testmrghb2:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: addis r3, r2, .LCPI1_0 at toc@ha
> -; CHECK-P9-NEXT: addi r3, r3, .LCPI1_0 at toc@l
> -; CHECK-P9-NEXT: lxvx v4, 0, r3
> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: blr
> entry:
> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
> 24, i32 8, i32 25, i32 9, i32 26, i32 10, i32 27, i32 11, i32 28, i32 12,
> i32 29, i32 13, i32 30, i32 14, i32 31, i32 15>
> @@ -57,18 +51,12 @@ entry:
> define dso_local <16 x i8> @testmrghh2(<16 x i8> %a, <16 x i8> %b)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: testmrghh2:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: addis r3, r2, .LCPI3_0 at toc@ha
> -; CHECK-P8-NEXT: addi r3, r3, .LCPI3_0 at toc@l
> -; CHECK-P8-NEXT: lvx v4, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P8-NEXT: vmrghh v2, v2, v3
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: testmrghh2:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: addis r3, r2, .LCPI3_0 at toc@ha
> -; CHECK-P9-NEXT: addi r3, r3, .LCPI3_0 at toc@l
> -; CHECK-P9-NEXT: lxvx v4, 0, r3
> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
> ; CHECK-P9-NEXT: blr
> entry:
> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
> 24, i32 25, i32 8, i32 9, i32 26, i32 27, i32 10, i32 11, i32 28, i32 29,
> i32 12, i32 13, i32 30, i32 31, i32 14, i32 15>
> @@ -91,18 +79,12 @@ entry:
> define dso_local <16 x i8> @testmrglb2(<16 x i8> %a, <16 x i8> %b)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: testmrglb2:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: addis r3, r2, .LCPI5_0 at toc@ha
> -; CHECK-P8-NEXT: addi r3, r3, .LCPI5_0 at toc@l
> -; CHECK-P8-NEXT: lvx v4, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P8-NEXT: vmrglb v2, v2, v3
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: testmrglb2:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: addis r3, r2, .LCPI5_0 at toc@ha
> -; CHECK-P9-NEXT: addi r3, r3, .LCPI5_0 at toc@l
> -; CHECK-P9-NEXT: lxvx v4, 0, r3
> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P9-NEXT: vmrglb v2, v2, v3
> ; CHECK-P9-NEXT: blr
> entry:
> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
> 16, i32 0, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3, i32 20, i32 4, i32
> 21, i32 5, i32 22, i32 6, i32 23, i32 7>
> @@ -125,18 +107,12 @@ entry:
> define dso_local <16 x i8> @testmrglh2(<16 x i8> %a, <16 x i8> %b)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: testmrglh2:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: addis r3, r2, .LCPI7_0 at toc@ha
> -; CHECK-P8-NEXT: addi r3, r3, .LCPI7_0 at toc@l
> -; CHECK-P8-NEXT: lvx v4, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P8-NEXT: vmrglh v2, v2, v3
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: testmrglh2:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: addis r3, r2, .LCPI7_0 at toc@ha
> -; CHECK-P9-NEXT: addi r3, r3, .LCPI7_0 at toc@l
> -; CHECK-P9-NEXT: lxvx v4, 0, r3
> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P9-NEXT: vmrglh v2, v2, v3
> ; CHECK-P9-NEXT: blr
> entry:
> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
> 16, i32 17, i32 0, i32 1, i32 18, i32 19, i32 2, i32 3, i32 20, i32 21, i32
> 4, i32 5, i32 22, i32 23, i32 6, i32 7>
> @@ -159,18 +135,12 @@ entry:
> define dso_local <16 x i8> @testmrghw2(<16 x i8> %a, <16 x i8> %b)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: testmrghw2:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: addis r3, r2, .LCPI9_0 at toc@ha
> -; CHECK-P8-NEXT: addi r3, r3, .LCPI9_0 at toc@l
> -; CHECK-P8-NEXT: lvx v4, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P8-NEXT: vmrghw v2, v2, v3
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: testmrghw2:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: addis r3, r2, .LCPI9_0 at toc@ha
> -; CHECK-P9-NEXT: addi r3, r3, .LCPI9_0 at toc@l
> -; CHECK-P9-NEXT: lxvx v4, 0, r3
> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
> ; CHECK-P9-NEXT: blr
> entry:
> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
> 24, i32 25, i32 26, i32 27, i32 8, i32 9, i32 10, i32 11, i32 28, i32 29,
> i32 30, i32 31, i32 12, i32 13, i32 14, i32 15>
> @@ -193,18 +163,12 @@ entry:
> define dso_local <16 x i8> @testmrglw2(<16 x i8> %a, <16 x i8> %b)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: testmrglw2:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: addis r3, r2, .LCPI11_0 at toc@ha
> -; CHECK-P8-NEXT: addi r3, r3, .LCPI11_0 at toc@l
> -; CHECK-P8-NEXT: lvx v4, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P8-NEXT: vmrglw v2, v2, v3
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: testmrglw2:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: addis r3, r2, .LCPI11_0 at toc@ha
> -; CHECK-P9-NEXT: addi r3, r3, .LCPI11_0 at toc@l
> -; CHECK-P9-NEXT: lxvx v4, 0, r3
> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
> +; CHECK-P9-NEXT: vmrglw v2, v2, v3
> ; CHECK-P9-NEXT: blr
> entry:
> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
> 16, i32 17, i32 18, i32 19, i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32
> 22, i32 23, i32 4, i32 5, i32 6, i32 7>
> @@ -215,24 +179,16 @@ define dso_local <8 x i16> @testmrglb3(<8 x i8>*
> nocapture readonly %a) local_un
> ; CHECK-P8-LABEL: testmrglb3:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: ld r3, 0(r3)
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI12_0 at toc@ha
> -; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: addi r3, r4, .LCPI12_0 at toc@l
> -; CHECK-P8-NEXT: lvx v3, 0, r3
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: vperm v2, v2, v4, v3
> +; CHECK-P8-NEXT: xxlxor v2, v2, v2
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: testmrglb3:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: lfd f0, 0(r3)
> -; CHECK-P9-NEXT: addis r3, r2, .LCPI12_0 at toc@ha
> -; CHECK-P9-NEXT: addi r3, r3, .LCPI12_0 at toc@l
> -; CHECK-P9-NEXT: lxvx v3, 0, r3
> -; CHECK-P9-NEXT: xxswapd v2, f0
> -; CHECK-P9-NEXT: xxlxor v4, v4, v4
> -; CHECK-P9-NEXT: vperm v2, v2, v4, v3
> +; CHECK-P9-NEXT: lxsd v2, 0(r3)
> +; CHECK-P9-NEXT: xxlxor v3, v3, v3
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> ; CHECK-P9-NEXT: blr
> entry:
> %0 = load <8 x i8>, <8 x i8>* %a, align 8
>
> diff --git a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
> b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
> index a23db59635a4..3a43b3584caf 100644
> --- a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
> +++ b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
> @@ -331,12 +331,12 @@ define <2 x float> @fptrunc_v2f32_v2f64(<2 x double>
> %vf1) {
> ; P9: # %bb.0:
> ; P9-NEXT: xsrsp f0, v2
> ; P9-NEXT: xscvdpspn vs0, f0
> -; P9-NEXT: xxsldwi v3, vs0, vs0, 1
> +; P9-NEXT: xxsldwi v3, vs0, vs0, 3
> ; P9-NEXT: xxswapd vs0, v2
> ; P9-NEXT: xsrsp f0, f0
> ; P9-NEXT: xscvdpspn vs0, f0
> -; P9-NEXT: xxsldwi v2, vs0, vs0, 1
> -; P9-NEXT: vmrglw v2, v3, v2
> +; P9-NEXT: xxsldwi v2, vs0, vs0, 3
> +; P9-NEXT: vmrghw v2, v3, v2
> ; P9-NEXT: blr
> %res = call <2 x float>
> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(
> <2 x double> %vf1,
>
> diff --git a/llvm/test/CodeGen/PowerPC/load-and-splat.ll
> b/llvm/test/CodeGen/PowerPC/load-and-splat.ll
> index f411712ba3fa..26da1fdaefef 100644
> --- a/llvm/test/CodeGen/PowerPC/load-and-splat.ll
> +++ b/llvm/test/CodeGen/PowerPC/load-and-splat.ll
> @@ -40,8 +40,7 @@ define dso_local void @test2(<4 x float>* nocapture %c,
> float* nocapture readonl
> ; P8: # %bb.0: # %entry
> ; P8-NEXT: addi r4, r4, 12
> ; P8-NEXT: lfiwzx f0, 0, r4
> -; P8-NEXT: xxswapd vs0, f0
> -; P8-NEXT: xxspltw v2, vs0, 3
> +; P8-NEXT: xxspltw v2, vs0, 1
> ; P8-NEXT: stvx v2, 0, r3
> ; P8-NEXT: blr
> entry:
> @@ -65,8 +64,7 @@ define dso_local void @test3(<4 x i32>* nocapture %c,
> i32* nocapture readonly %a
> ; P8: # %bb.0: # %entry
> ; P8-NEXT: addi r4, r4, 12
> ; P8-NEXT: lfiwzx f0, 0, r4
> -; P8-NEXT: xxswapd vs0, f0
> -; P8-NEXT: xxspltw v2, vs0, 3
> +; P8-NEXT: xxspltw v2, vs0, 1
> ; P8-NEXT: stvx v2, 0, r3
> ; P8-NEXT: blr
> entry:
> @@ -110,8 +108,7 @@ define <16 x i8> @unadjusted_lxvwsx(i32* %s, i32* %t) {
> ; P8-LABEL: unadjusted_lxvwsx:
> ; P8: # %bb.0: # %entry
> ; P8-NEXT: lfiwzx f0, 0, r3
> -; P8-NEXT: xxswapd vs0, f0
> -; P8-NEXT: xxspltw v2, vs0, 3
> +; P8-NEXT: xxspltw v2, vs0, 1
> ; P8-NEXT: blr
> entry:
> %0 = bitcast i32* %s to <4 x i8>*
> @@ -131,8 +128,7 @@ define <16 x i8> @adjusted_lxvwsx(i64* %s, i64* %t) {
> ; P8: # %bb.0: # %entry
> ; P8-NEXT: ld r3, 0(r3)
> ; P8-NEXT: mtfprd f0, r3
> -; P8-NEXT: xxswapd v2, vs0
> -; P8-NEXT: xxspltw v2, v2, 2
> +; P8-NEXT: xxspltw v2, vs0, 0
> ; P8-NEXT: blr
> entry:
> %0 = bitcast i64* %s to <8 x i8>*
>
> diff --git a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
> b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
> index 409978549c36..a03ab5f9519e 100644
> --- a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
> +++ b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
> @@ -9,8 +9,7 @@ define <16 x i8> @test(i32* %s, i32* %t) {
> ; CHECK-LE-LABEL: test:
> ; CHECK-LE: # %bb.0: # %entry
> ; CHECK-LE-NEXT: lfiwzx f0, 0, r3
> -; CHECK-LE-NEXT: xxswapd vs0, f0
> -; CHECK-LE-NEXT: xxspltw v2, vs0, 3
> +; CHECK-LE-NEXT: xxspltw v2, vs0, 1
> ; CHECK-LE-NEXT: blr
>
> ; CHECK-LABEL: test:
>
> diff --git a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
> b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
> index e1f0e827b9f6..dffa0fb98fc0 100644
> --- a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
> +++ b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
> @@ -21,8 +21,8 @@ entry:
> ; CHECK: sldi r3, r3, 56
> ; CHECK: mtvsrd v2, r3
> ; CHECK-LE-LABEL: buildc
> -; CHECK-LE: mtfprd f0, r3
> -; CHECK-LE: xxswapd v2, vs0
> +; CHECK-LE: mtvsrd v2, r3
> +; CHECK-LE: vspltb v2, v2, 7
> }
>
> ; Function Attrs: norecurse nounwind readnone
> @@ -35,8 +35,8 @@ entry:
> ; CHECK: sldi r3, r3, 48
> ; CHECK: mtvsrd v2, r3
> ; CHECK-LE-LABEL: builds
> -; CHECK-LE: mtfprd f0, r3
> -; CHECK-LE: xxswapd v2, vs0
> +; CHECK-LE: mtvsrd v2, r3
> +; CHECK-LE: vsplth v2, v2, 3
> }
>
> ; Function Attrs: norecurse nounwind readnone
>
> diff --git a/llvm/test/CodeGen/PowerPC/pr25080.ll
> b/llvm/test/CodeGen/PowerPC/pr25080.ll
> index 7a2fb76fd453..f87cb5b940ca 100644
> --- a/llvm/test/CodeGen/PowerPC/pr25080.ll
> +++ b/llvm/test/CodeGen/PowerPC/pr25080.ll
> @@ -17,41 +17,33 @@ define <8 x i16> @pr25080(<8 x i32> %a) {
> ; LE-NEXT: mfvsrwz 3, 34
> ; LE-NEXT: xxsldwi 1, 34, 34, 1
> ; LE-NEXT: mfvsrwz 4, 35
> -; LE-NEXT: xxsldwi 4, 34, 34, 3
> -; LE-NEXT: mtfprd 2, 3
> +; LE-NEXT: xxsldwi 2, 34, 34, 3
> +; LE-NEXT: mtvsrd 36, 3
> ; LE-NEXT: mffprwz 3, 0
> ; LE-NEXT: xxswapd 0, 35
> -; LE-NEXT: mtfprd 3, 4
> -; LE-NEXT: xxsldwi 5, 35, 35, 1
> +; LE-NEXT: mtvsrd 37, 4
> ; LE-NEXT: mffprwz 4, 1
> -; LE-NEXT: xxsldwi 7, 35, 35, 3
> -; LE-NEXT: mtfprd 1, 3
> -; LE-NEXT: xxswapd 33, 3
> -; LE-NEXT: mffprwz 3, 4
> -; LE-NEXT: mtfprd 4, 4
> -; LE-NEXT: xxswapd 34, 1
> +; LE-NEXT: xxsldwi 1, 35, 35, 1
> +; LE-NEXT: mtvsrd 34, 3
> +; LE-NEXT: mffprwz 3, 2
> +; LE-NEXT: mtvsrd 32, 4
> ; LE-NEXT: mffprwz 4, 0
> -; LE-NEXT: mtfprd 0, 3
> -; LE-NEXT: xxswapd 35, 4
> -; LE-NEXT: mffprwz 3, 5
> -; LE-NEXT: mtfprd 6, 4
> -; LE-NEXT: xxswapd 36, 0
> -; LE-NEXT: mtfprd 1, 3
> -; LE-NEXT: mffprwz 3, 7
> -; LE-NEXT: xxswapd 37, 6
> -; LE-NEXT: vmrglh 2, 3, 2
> -; LE-NEXT: xxswapd 35, 2
> -; LE-NEXT: mtfprd 2, 3
> -; LE-NEXT: xxswapd 32, 1
> +; LE-NEXT: xxsldwi 0, 35, 35, 3
> +; LE-NEXT: mtvsrd 33, 3
> +; LE-NEXT: mffprwz 3, 1
> +; LE-NEXT: mtvsrd 38, 4
> +; LE-NEXT: mtvsrd 35, 3
> +; LE-NEXT: mffprwz 3, 0
> +; LE-NEXT: vmrghh 2, 0, 2
> +; LE-NEXT: mtvsrd 32, 3
> ; LE-NEXT: addis 3, 2, .LCPI0_1 at toc@ha
> +; LE-NEXT: vmrghh 4, 1, 4
> ; LE-NEXT: addi 3, 3, .LCPI0_1 at toc@l
> -; LE-NEXT: xxswapd 38, 2
> -; LE-NEXT: vmrglh 3, 4, 3
> -; LE-NEXT: vmrglh 4, 0, 5
> -; LE-NEXT: vmrglh 5, 6, 1
> -; LE-NEXT: vmrglw 2, 3, 2
> -; LE-NEXT: vmrglw 3, 5, 4
> +; LE-NEXT: vmrghh 3, 3, 6
> +; LE-NEXT: vmrghh 5, 0, 5
> +; LE-NEXT: vmrglw 2, 4, 2
> ; LE-NEXT: vspltish 4, 15
> +; LE-NEXT: vmrglw 3, 5, 3
> ; LE-NEXT: xxmrgld 34, 35, 34
> ; LE-NEXT: lvx 3, 0, 3
> ; LE-NEXT: xxlor 34, 34, 35
>
> diff --git a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
> b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
> index 4c10c3813fb5..d3bfb910fc9f 100644
> --- a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
> +++ b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
> @@ -58,12 +58,11 @@ L.LB38_2452:
>
> ; CHECK-LABEL: @aercalc_
> ; CHECK: lfs
> -; CHECK: xxspltd
> +; CHECK: xxswapd
> ; CHECK: stxvd2x
> ; CHECK-NOT: xxswapd
>
> ; CHECK-P9-LABEL: @aercalc_
> ; CHECK-P9: lfs
> -; CHECK-P9: xxspltd
> ; CHECK-P9: stxv
> ; CHECK-P9-NOT: xxswapd
>
> diff --git a/llvm/test/CodeGen/PowerPC/pr38087.ll
> b/llvm/test/CodeGen/PowerPC/pr38087.ll
> index e05a3d2b97aa..49b3d39bc18c 100644
> --- a/llvm/test/CodeGen/PowerPC/pr38087.ll
> +++ b/llvm/test/CodeGen/PowerPC/pr38087.ll
> @@ -11,9 +11,8 @@ declare { i32, i1 } @llvm.usub.with.overflow.i32(i32,
> i32) #0
> define void @draw_llvm_vs_variant0(<4 x float> %x) {
> ; CHECK-LABEL: draw_llvm_vs_variant0:
> ; CHECK: # %bb.0: # %entry
> -; CHECK-NEXT: lfd f0, 0(r3)
> -; CHECK-NEXT: xxswapd v3, f0
> -; CHECK-NEXT: vmrglh v3, v3, v3
> +; CHECK-NEXT: lxsd v3, 0(r3)
> +; CHECK-NEXT: vmrghh v3, v3, v3
> ; CHECK-NEXT: vextsh2w v3, v3
> ; CHECK-NEXT: xvcvsxwsp vs0, v3
> ; CHECK-NEXT: xxspltw vs0, vs0, 2
>
> diff --git a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
> b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
> index 4c9137d86124..6584cb74bdb5 100644
> --- a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
> +++ b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
> @@ -11,34 +11,31 @@
> define signext i32 @test_pre_inc_disable_1(i8* nocapture readonly %pix1,
> i32 signext %i_stride_pix1, i8* nocapture readonly %pix2) {
> ; CHECK-LABEL: test_pre_inc_disable_1:
> ; CHECK: # %bb.0: # %entry
> -; CHECK-NEXT: lfd f0, 0(r5)
> +; CHECK-NEXT: lxsd v5, 0(r5)
> ; CHECK-NEXT: addis r5, r2, .LCPI0_0 at toc@ha
> ; CHECK-NEXT: addi r5, r5, .LCPI0_0 at toc@l
> ; CHECK-NEXT: lxvx v2, 0, r5
> ; CHECK-NEXT: addis r5, r2, .LCPI0_1 at toc@ha
> ; CHECK-NEXT: addi r5, r5, .LCPI0_1 at toc@l
> ; CHECK-NEXT: lxvx v4, 0, r5
> -; CHECK-NEXT: xxswapd v5, f0
> -; CHECK-NEXT: xxlxor v3, v3, v3
> ; CHECK-NEXT: li r5, 4
> +; CHECK-NEXT: xxlxor v3, v3, v3
> ; CHECK-NEXT: vperm v0, v3, v5, v2
> ; CHECK-NEXT: mtctr r5
> ; CHECK-NEXT: li r5, 0
> -; CHECK-NEXT: vperm v1, v5, v3, v4
> +; CHECK-NEXT: vperm v1, v3, v5, v4
> ; CHECK-NEXT: li r6, 0
> ; CHECK-NEXT: xvnegsp v5, v0
> ; CHECK-NEXT: xvnegsp v0, v1
> ; CHECK-NEXT: .p2align 4
> ; CHECK-NEXT: .LBB0_1: # %for.cond1.preheader
> ; CHECK-NEXT: #
> -; CHECK-NEXT: lfd f0, 0(r3)
> -; CHECK-NEXT: xxswapd v1, f0
> -; CHECK-NEXT: lfdx f0, r3, r4
> -; CHECK-NEXT: vperm v6, v1, v3, v4
> +; CHECK-NEXT: lxsd v1, 0(r3)
> +; CHECK-NEXT: vperm v6, v3, v1, v4
> ; CHECK-NEXT: vperm v1, v3, v1, v2
> ; CHECK-NEXT: xvnegsp v1, v1
> -; CHECK-NEXT: add r7, r3, r4
> ; CHECK-NEXT: xvnegsp v6, v6
> +; CHECK-NEXT: add r7, r3, r4
> ; CHECK-NEXT: vabsduw v1, v1, v5
> ; CHECK-NEXT: vabsduw v6, v6, v0
> ; CHECK-NEXT: vadduwm v1, v6, v1
> @@ -46,15 +43,14 @@ define signext i32 @test_pre_inc_disable_1(i8*
> nocapture readonly %pix1, i32 sig
> ; CHECK-NEXT: vadduwm v1, v1, v6
> ; CHECK-NEXT: xxspltw v6, v1, 2
> ; CHECK-NEXT: vadduwm v1, v1, v6
> -; CHECK-NEXT: xxswapd v6, f0
> +; CHECK-NEXT: lxsdx v6, r3, r4
> ; CHECK-NEXT: vextuwrx r3, r5, v1
> -; CHECK-NEXT: vperm v7, v6, v3, v4
> +; CHECK-NEXT: vperm v7, v3, v6, v4
> ; CHECK-NEXT: vperm v6, v3, v6, v2
> -; CHECK-NEXT: add r6, r3, r6
> -; CHECK-NEXT: add r3, r7, r4
> ; CHECK-NEXT: xvnegsp v6, v6
> ; CHECK-NEXT: xvnegsp v1, v7
> ; CHECK-NEXT: vabsduw v6, v6, v5
> +; CHECK-NEXT: add r6, r3, r6
> ; CHECK-NEXT: vabsduw v1, v1, v0
> ; CHECK-NEXT: vadduwm v1, v1, v6
> ; CHECK-NEXT: xxswapd v6, v1
> @@ -62,6 +58,7 @@ define signext i32 @test_pre_inc_disable_1(i8* nocapture
> readonly %pix1, i32 sig
> ; CHECK-NEXT: xxspltw v6, v1, 2
> ; CHECK-NEXT: vadduwm v1, v1, v6
> ; CHECK-NEXT: vextuwrx r8, r5, v1
> +; CHECK-NEXT: add r3, r7, r4
> ; CHECK-NEXT: add r6, r8, r6
> ; CHECK-NEXT: bdnz .LBB0_1
> ; CHECK-NEXT: # %bb.2: # %for.cond.cleanup
> @@ -181,29 +178,27 @@ for.cond.cleanup: ;
> preds = %for.cond1.preheader
> define signext i32 @test_pre_inc_disable_2(i8* nocapture readonly %pix1,
> i8* nocapture readonly %pix2) {
> ; CHECK-LABEL: test_pre_inc_disable_2:
> ; CHECK: # %bb.0: # %entry
> -; CHECK-NEXT: lfd f0, 0(r3)
> +; CHECK-NEXT: lxsd v2, 0(r3)
> ; CHECK-NEXT: addis r3, r2, .LCPI1_0 at toc@ha
> ; CHECK-NEXT: addi r3, r3, .LCPI1_0 at toc@l
> ; CHECK-NEXT: lxvx v4, 0, r3
> ; CHECK-NEXT: addis r3, r2, .LCPI1_1 at toc@ha
> -; CHECK-NEXT: xxswapd v2, f0
> -; CHECK-NEXT: lfd f0, 0(r4)
> ; CHECK-NEXT: addi r3, r3, .LCPI1_1 at toc@l
> -; CHECK-NEXT: xxlxor v3, v3, v3
> ; CHECK-NEXT: lxvx v0, 0, r3
> -; CHECK-NEXT: xxswapd v1, f0
> -; CHECK-NEXT: vperm v5, v2, v3, v4
> +; CHECK-NEXT: lxsd v1, 0(r4)
> +; CHECK-NEXT: xxlxor v3, v3, v3
> +; CHECK-NEXT: vperm v5, v3, v2, v4
> ; CHECK-NEXT: vperm v2, v3, v2, v0
> ; CHECK-NEXT: vperm v0, v3, v1, v0
> -; CHECK-NEXT: vperm v3, v1, v3, v4
> +; CHECK-NEXT: vperm v3, v3, v1, v4
> ; CHECK-NEXT: vabsduw v2, v2, v0
> ; CHECK-NEXT: vabsduw v3, v5, v3
> ; CHECK-NEXT: vadduwm v2, v3, v2
> ; CHECK-NEXT: xxswapd v3, v2
> -; CHECK-NEXT: li r3, 0
> ; CHECK-NEXT: vadduwm v2, v2, v3
> ; CHECK-NEXT: xxspltw v3, v2, 2
> ; CHECK-NEXT: vadduwm v2, v2, v3
> +; CHECK-NEXT: li r3, 0
> ; CHECK-NEXT: vextuwrx r3, r3, v2
> ; CHECK-NEXT: extsw r3, r3
> ; CHECK-NEXT: blr
> @@ -286,16 +281,14 @@ define void @test32(i8* nocapture readonly %pix2,
> i32 signext %i_pix2) {
> ; CHECK-LABEL: test32:
> ; CHECK: # %bb.0: # %entry
> ; CHECK-NEXT: add r5, r3, r4
> -; CHECK-NEXT: lfiwzx f0, r3, r4
> +; CHECK-NEXT: lxsiwzx v2, r3, r4
> ; CHECK-NEXT: addis r3, r2, .LCPI2_0 at toc@ha
> ; CHECK-NEXT: addi r3, r3, .LCPI2_0 at toc@l
> ; CHECK-NEXT: lxvx v4, 0, r3
> ; CHECK-NEXT: li r3, 4
> -; CHECK-NEXT: xxswapd v2, f0
> -; CHECK-NEXT: lfiwzx f0, r5, r3
> +; CHECK-NEXT: lxsiwzx v5, r5, r3
> ; CHECK-NEXT: xxlxor v3, v3, v3
> ; CHECK-NEXT: vperm v2, v2, v3, v4
> -; CHECK-NEXT: xxswapd v5, f0
> ; CHECK-NEXT: vperm v3, v5, v3, v4
> ; CHECK-NEXT: vspltisw v4, 8
> ; CHECK-NEXT: vnegw v3, v3
> @@ -361,16 +354,15 @@ define void @test16(i16* nocapture readonly %sums,
> i32 signext %delta, i32 signe
> ; CHECK-NEXT: lxsihzx v2, r6, r7
> ; CHECK-NEXT: lxsihzx v4, r3, r4
> ; CHECK-NEXT: li r6, 0
> -; CHECK-NEXT: mtfprd f0, r6
> +; CHECK-NEXT: mtvsrd v3, r6
> ; CHECK-NEXT: vsplth v4, v4, 3
> -; CHECK-NEXT: xxswapd v3, vs0
> ; CHECK-NEXT: vsplth v2, v2, 3
> ; CHECK-NEXT: addis r3, r2, .LCPI3_0 at toc@ha
> ; CHECK-NEXT: addi r3, r3, .LCPI3_0 at toc@l
> -; CHECK-NEXT: vmrglh v2, v3, v2
> -; CHECK-NEXT: vmrglh v3, v3, v4
> -; CHECK-NEXT: xxlxor v4, v4, v4
> -; CHECK-NEXT: vmrglw v3, v3, v4
> +; CHECK-NEXT: vmrghh v4, v3, v4
> +; CHECK-NEXT: vmrghh v2, v3, v2
> +; CHECK-NEXT: vsplth v3, v3, 3
> +; CHECK-NEXT: vmrglw v3, v4, v3
> ; CHECK-NEXT: lxvx v4, 0, r3
> ; CHECK-NEXT: li r3, 0
> ; CHECK-NEXT: vperm v2, v2, v3, v4
> @@ -446,18 +438,17 @@ define void @test8(i8* nocapture readonly %sums, i32
> signext %delta, i32 signext
> ; CHECK-NEXT: add r6, r3, r4
> ; CHECK-NEXT: lxsibzx v2, r3, r4
> ; CHECK-NEXT: li r3, 0
> -; CHECK-NEXT: mtfprd f0, r3
> +; CHECK-NEXT: mtvsrd v3, r3
> ; CHECK-NEXT: li r3, 8
> ; CHECK-NEXT: lxsibzx v5, r6, r3
> -; CHECK-NEXT: xxswapd v3, vs0
> -; CHECK-NEXT: vspltb v4, v3, 15
> -; CHECK-NEXT: vspltb v2, v2, 7
> -; CHECK-NEXT: vmrglb v2, v3, v2
> ; CHECK-NEXT: addis r3, r2, .LCPI4_0 at toc@ha
> ; CHECK-NEXT: addi r3, r3, .LCPI4_0 at toc@l
> +; CHECK-NEXT: vspltb v2, v2, 7
> +; CHECK-NEXT: vmrghb v2, v3, v2
> +; CHECK-NEXT: vspltb v4, v3, 7
> ; CHECK-NEXT: vspltb v5, v5, 7
> ; CHECK-NEXT: vmrglh v2, v2, v4
> -; CHECK-NEXT: vmrglb v3, v3, v5
> +; CHECK-NEXT: vmrghb v3, v3, v5
> ; CHECK-NEXT: vmrglw v2, v2, v4
> ; CHECK-NEXT: vmrglh v3, v3, v4
> ; CHECK-NEXT: vmrglw v3, v4, v3
>
> diff --git a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
> b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
> index 099611a7b5e3..50b864980d98 100644
> --- a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
> +++ b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
> @@ -53,8 +53,7 @@ define <4 x float> @foof(float* nocapture readonly %a)
> #0 {
> ; CHECK-LABEL: foof:
> ; CHECK: # %bb.0: # %entry
> ; CHECK-NEXT: lfiwzx f0, 0, r3
> -; CHECK-NEXT: xxswapd vs0, f0
> -; CHECK-NEXT: xxspltw v2, vs0, 3
> +; CHECK-NEXT: xxspltw v2, vs0, 1
> ; CHECK-NEXT: blr
> entry:
> %0 = load float, float* %a, align 4
> @@ -68,8 +67,7 @@ define <4 x float> @foofx(float* nocapture readonly %a,
> i64 %idx) #0 {
> ; CHECK: # %bb.0: # %entry
> ; CHECK-NEXT: sldi r4, r4, 2
> ; CHECK-NEXT: lfiwzx f0, r3, r4
> -; CHECK-NEXT: xxswapd vs0, f0
> -; CHECK-NEXT: xxspltw v2, vs0, 3
> +; CHECK-NEXT: xxspltw v2, vs0, 1
> ; CHECK-NEXT: blr
> entry:
> %p = getelementptr float, float* %a, i64 %idx
>
> diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
> b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
> index b43e2c8b97af..c12f7f9a9f05 100644
> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
> @@ -13,8 +13,7 @@ define <2 x i64> @s2v_test1(i64* nocapture readonly
> %int64, <2 x i64> %vec) {
> ; P9LE-LABEL: s2v_test1:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfd f0, 0(r3)
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test1:
> @@ -33,8 +32,7 @@ define <2 x i64> @s2v_test2(i64* nocapture readonly
> %int64, <2 x i64> %vec) {
> ; P9LE-LABEL: s2v_test2:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfd f0, 8(r3)
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test2:
> @@ -55,8 +53,7 @@ define <2 x i64> @s2v_test3(i64* nocapture readonly
> %int64, <2 x i64> %vec, i32
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: sldi r4, r7, 3
> ; P9LE-NEXT: lfdx f0, r3, r4
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test3
> @@ -78,8 +75,7 @@ define <2 x i64> @s2v_test4(i64* nocapture readonly
> %int64, <2 x i64> %vec) {
> ; P9LE-LABEL: s2v_test4:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfd f0, 8(r3)
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test4:
> @@ -99,8 +95,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i64*
> nocapture readonly %ptr1) {
> ; P9LE-LABEL: s2v_test5:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfd f0, 0(r5)
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test5:
> @@ -119,8 +114,7 @@ define <2 x double> @s2v_test_f1(double* nocapture
> readonly %f64, <2 x double> %
> ; P9LE-LABEL: s2v_test_f1:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfd f0, 0(r3)
> -; P9LE-NEXT: xxswapd vs0, f0
> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test_f1:
> @@ -132,8 +126,7 @@ define <2 x double> @s2v_test_f1(double* nocapture
> readonly %f64, <2 x double> %
> ; P8LE-LABEL: s2v_test_f1:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfdx f0, 0, r3
> -; P8LE-NEXT: xxspltd vs0, vs0, 0
> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f1:
> @@ -152,8 +145,7 @@ define <2 x double> @s2v_test_f2(double* nocapture
> readonly %f64, <2 x double> %
> ; P9LE-LABEL: s2v_test_f2:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfd f0, 8(r3)
> -; P9LE-NEXT: xxswapd vs0, f0
> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test_f2:
> @@ -165,8 +157,7 @@ define <2 x double> @s2v_test_f2(double* nocapture
> readonly %f64, <2 x double> %
> ; P8LE-LABEL: s2v_test_f2:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfd f0, 8(r3)
> -; P8LE-NEXT: xxspltd vs0, vs0, 0
> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f2:
> @@ -187,8 +178,7 @@ define <2 x double> @s2v_test_f3(double* nocapture
> readonly %f64, <2 x double> %
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: sldi r4, r7, 3
> ; P9LE-NEXT: lfdx f0, r3, r4
> -; P9LE-NEXT: xxswapd vs0, f0
> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test_f3:
> @@ -202,8 +192,7 @@ define <2 x double> @s2v_test_f3(double* nocapture
> readonly %f64, <2 x double> %
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: sldi r4, r7, 3
> ; P8LE-NEXT: lfdx f0, r3, r4
> -; P8LE-NEXT: xxspltd vs0, vs0, 0
> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f3:
> @@ -225,8 +214,7 @@ define <2 x double> @s2v_test_f4(double* nocapture
> readonly %f64, <2 x double> %
> ; P9LE-LABEL: s2v_test_f4:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfd f0, 8(r3)
> -; P9LE-NEXT: xxswapd vs0, f0
> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test_f4:
> @@ -238,8 +226,7 @@ define <2 x double> @s2v_test_f4(double* nocapture
> readonly %f64, <2 x double> %
> ; P8LE-LABEL: s2v_test_f4:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfd f0, 8(r3)
> -; P8LE-NEXT: xxspltd vs0, vs0, 0
> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f4:
> @@ -259,8 +246,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec,
> double* nocapture readonly %
> ; P9LE-LABEL: s2v_test_f5:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfd f0, 0(r5)
> -; P9LE-NEXT: xxswapd vs0, f0
> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test_f5:
> @@ -272,8 +258,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec,
> double* nocapture readonly %
> ; P8LE-LABEL: s2v_test_f5:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfdx f0, 0, r5
> -; P8LE-NEXT: xxspltd vs0, vs0, 0
> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f5:
>
> diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
> b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
> index 83691b52575d..f4572c359942 100644
> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
> @@ -12,8 +12,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly
> %int32, <2 x i64> %vec) {
> ; P9LE-LABEL: s2v_test1:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfiwax f0, 0, r3
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test1:
> @@ -25,8 +24,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly
> %int32, <2 x i64> %vec) {
> ; P8LE-LABEL: s2v_test1:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfiwax f0, 0, r3
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test1:
> @@ -47,8 +45,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly
> %int32, <2 x i64> %vec) {
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: addi r3, r3, 4
> ; P9LE-NEXT: lfiwax f0, 0, r3
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test2:
> @@ -62,8 +59,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly
> %int32, <2 x i64> %vec) {
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: addi r3, r3, 4
> ; P8LE-NEXT: lfiwax f0, 0, r3
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test2:
> @@ -86,8 +82,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly
> %int32, <2 x i64> %vec, i32
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: sldi r4, r7, 2
> ; P9LE-NEXT: lfiwax f0, r3, r4
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test3:
> @@ -101,8 +96,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly
> %int32, <2 x i64> %vec, i32
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: sldi r4, r7, 2
> ; P8LE-NEXT: lfiwax f0, r3, r4
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test3:
> @@ -126,8 +120,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly
> %int32, <2 x i64> %vec) {
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: addi r3, r3, 4
> ; P9LE-NEXT: lfiwax f0, 0, r3
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test4:
> @@ -141,8 +134,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly
> %int32, <2 x i64> %vec) {
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: addi r3, r3, 4
> ; P8LE-NEXT: lfiwax f0, 0, r3
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test4:
> @@ -164,8 +156,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i32*
> nocapture readonly %ptr1) {
> ; P9LE-LABEL: s2v_test5:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfiwax f0, 0, r5
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P9LE-NEXT: xxmrghd v2, v2, vs0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test5:
> @@ -177,8 +168,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i32*
> nocapture readonly %ptr1) {
> ; P8LE-LABEL: s2v_test5:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfiwax f0, 0, r5
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
> +; P8LE-NEXT: xxmrghd v2, v2, vs0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test5:
> @@ -198,8 +188,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly
> %ptr) {
> ; P9LE-LABEL: s2v_test6:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfiwax f0, 0, r3
> -; P9LE-NEXT: xxswapd v2, f0
> -; P9LE-NEXT: xxspltd v2, v2, 1
> +; P9LE-NEXT: xxspltd v2, vs0, 0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test6:
> @@ -211,8 +200,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly
> %ptr) {
> ; P8LE-LABEL: s2v_test6:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfiwax f0, 0, r3
> -; P8LE-NEXT: xxswapd v2, f0
> -; P8LE-NEXT: xxspltd v2, v2, 1
> +; P8LE-NEXT: xxspltd v2, vs0, 0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test6:
> @@ -233,8 +221,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly
> %ptr) {
> ; P9LE-LABEL: s2v_test7:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: lfiwax f0, 0, r3
> -; P9LE-NEXT: xxswapd v2, f0
> -; P9LE-NEXT: xxspltd v2, v2, 1
> +; P9LE-NEXT: xxspltd v2, vs0, 0
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test7:
> @@ -246,8 +233,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly
> %ptr) {
> ; P8LE-LABEL: s2v_test7:
> ; P8LE: # %bb.0: # %entry
> ; P8LE-NEXT: lfiwax f0, 0, r3
> -; P8LE-NEXT: xxswapd v2, f0
> -; P8LE-NEXT: xxspltd v2, v2, 1
> +; P8LE-NEXT: xxspltd v2, vs0, 0
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test7:
>
> diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
> b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
> index 2261d75c6619..3dc34533420c 100644
> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
> @@ -11,12 +11,11 @@
> define <4 x i32> @s2v_test1(i32* nocapture readonly %int32, <4 x i32>
> %vec) {
> ; P8LE-LABEL: s2v_test1:
> ; P8LE: # %bb.0: # %entry
> -; P8LE-NEXT: lfiwzx f0, 0, r3
> ; P8LE-NEXT: addis r4, r2, .LCPI0_0 at toc@ha
> -; P8LE-NEXT: addi r3, r4, .LCPI0_0 at toc@l
> -; P8LE-NEXT: lvx v3, 0, r3
> -; P8LE-NEXT: xxswapd v4, f0
> -; P8LE-NEXT: vperm v2, v4, v2, v3
> +; P8LE-NEXT: lxsiwzx v4, 0, r3
> +; P8LE-NEXT: addi r4, r4, .LCPI0_0 at toc@l
> +; P8LE-NEXT: lvx v3, 0, r4
> +; P8LE-NEXT: vperm v2, v2, v4, v3
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test1:
> @@ -36,13 +35,12 @@ entry:
> define <4 x i32> @s2v_test2(i32* nocapture readonly %int32, <4 x i32>
> %vec) {
> ; P8LE-LABEL: s2v_test2:
> ; P8LE: # %bb.0: # %entry
> -; P8LE-NEXT: addi r3, r3, 4
> ; P8LE-NEXT: addis r4, r2, .LCPI1_0 at toc@ha
> -; P8LE-NEXT: lfiwzx f0, 0, r3
> -; P8LE-NEXT: addi r3, r4, .LCPI1_0 at toc@l
> -; P8LE-NEXT: lvx v3, 0, r3
> -; P8LE-NEXT: xxswapd v4, f0
> -; P8LE-NEXT: vperm v2, v4, v2, v3
> +; P8LE-NEXT: addi r3, r3, 4
> +; P8LE-NEXT: addi r4, r4, .LCPI1_0 at toc@l
> +; P8LE-NEXT: lxsiwzx v4, 0, r3
> +; P8LE-NEXT: lvx v3, 0, r4
> +; P8LE-NEXT: vperm v2, v2, v4, v3
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test2:
> @@ -64,13 +62,12 @@ entry:
> define <4 x i32> @s2v_test3(i32* nocapture readonly %int32, <4 x i32>
> %vec, i32 signext %Idx) {
> ; P8LE-LABEL: s2v_test3:
> ; P8LE: # %bb.0: # %entry
> -; P8LE-NEXT: sldi r5, r7, 2
> ; P8LE-NEXT: addis r4, r2, .LCPI2_0 at toc@ha
> -; P8LE-NEXT: lfiwzx f0, r3, r5
> -; P8LE-NEXT: addi r3, r4, .LCPI2_0 at toc@l
> -; P8LE-NEXT: lvx v4, 0, r3
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: vperm v2, v3, v2, v4
> +; P8LE-NEXT: sldi r5, r7, 2
> +; P8LE-NEXT: addi r4, r4, .LCPI2_0 at toc@l
> +; P8LE-NEXT: lxsiwzx v3, r3, r5
> +; P8LE-NEXT: lvx v4, 0, r4
> +; P8LE-NEXT: vperm v2, v2, v3, v4
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test3:
> @@ -93,13 +90,12 @@ entry:
> define <4 x i32> @s2v_test4(i32* nocapture readonly %int32, <4 x i32>
> %vec) {
> ; P8LE-LABEL: s2v_test4:
> ; P8LE: # %bb.0: # %entry
> -; P8LE-NEXT: addi r3, r3, 4
> ; P8LE-NEXT: addis r4, r2, .LCPI3_0 at toc@ha
> -; P8LE-NEXT: lfiwzx f0, 0, r3
> -; P8LE-NEXT: addi r3, r4, .LCPI3_0 at toc@l
> -; P8LE-NEXT: lvx v3, 0, r3
> -; P8LE-NEXT: xxswapd v4, f0
> -; P8LE-NEXT: vperm v2, v4, v2, v3
> +; P8LE-NEXT: addi r3, r3, 4
> +; P8LE-NEXT: addi r4, r4, .LCPI3_0 at toc@l
> +; P8LE-NEXT: lxsiwzx v4, 0, r3
> +; P8LE-NEXT: lvx v3, 0, r4
> +; P8LE-NEXT: vperm v2, v2, v4, v3
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test4:
> @@ -121,12 +117,11 @@ entry:
> define <4 x i32> @s2v_test5(<4 x i32> %vec, i32* nocapture readonly
> %ptr1) {
> ; P8LE-LABEL: s2v_test5:
> ; P8LE: # %bb.0: # %entry
> -; P8LE-NEXT: lfiwzx f0, 0, r5
> ; P8LE-NEXT: addis r3, r2, .LCPI4_0 at toc@ha
> +; P8LE-NEXT: lxsiwzx v4, 0, r5
> ; P8LE-NEXT: addi r3, r3, .LCPI4_0 at toc@l
> ; P8LE-NEXT: lvx v3, 0, r3
> -; P8LE-NEXT: xxswapd v4, f0
> -; P8LE-NEXT: vperm v2, v4, v2, v3
> +; P8LE-NEXT: vperm v2, v2, v4, v3
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test5:
> @@ -146,12 +141,11 @@ entry:
> define <4 x float> @s2v_test_f1(float* nocapture readonly %f64, <4 x
> float> %vec) {
> ; P8LE-LABEL: s2v_test_f1:
> ; P8LE: # %bb.0: # %entry
> -; P8LE-NEXT: lfiwzx f0, 0, r3
> ; P8LE-NEXT: addis r4, r2, .LCPI5_0 at toc@ha
> -; P8LE-NEXT: addi r3, r4, .LCPI5_0 at toc@l
> -; P8LE-NEXT: lvx v3, 0, r3
> -; P8LE-NEXT: xxswapd v4, f0
> -; P8LE-NEXT: vperm v2, v4, v2, v3
> +; P8LE-NEXT: lxsiwzx v4, 0, r3
> +; P8LE-NEXT: addi r4, r4, .LCPI5_0 at toc@l
> +; P8LE-NEXT: lvx v3, 0, r4
> +; P8LE-NEXT: vperm v2, v2, v4, v3
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f1:
> @@ -172,10 +166,9 @@ define <2 x float> @s2v_test_f2(float* nocapture
> readonly %f64, <2 x float> %vec
> ; P9LE-LABEL: s2v_test_f2:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: addi r3, r3, 4
> -; P9LE-DAG: xxspltw v2, v2, 2
> -; P9LE-DAG: lfiwzx f0, 0, r3
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: vmrglw v2, v2, v3
> +; P9LE-NEXT: lxsiwzx v3, 0, r3
> +; P9LE-NEXT: vmrglw v2, v2, v2
> +; P9LE-NEXT: vmrghw v2, v2, v3
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test_f2:
> @@ -189,11 +182,10 @@ define <2 x float> @s2v_test_f2(float* nocapture
> readonly %f64, <2 x float> %vec
>
> ; P8LE-LABEL: s2v_test_f2:
> ; P8LE: # %bb.0: # %entry
> +; P8LE-NEXT: vmrglw v2, v2, v2
> ; P8LE-NEXT: addi r3, r3, 4
> -; P8LE-NEXT: xxspltw v2, v2, 2
> -; P8LE-NEXT: lfiwzx f0, 0, r3
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: vmrglw v2, v2, v3
> +; P8LE-NEXT: lxsiwzx v3, 0, r3
> +; P8LE-NEXT: vmrghw v2, v2, v3
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f2:
> @@ -216,10 +208,9 @@ define <2 x float> @s2v_test_f3(float* nocapture
> readonly %f64, <2 x float> %vec
> ; P9LE-LABEL: s2v_test_f3:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: sldi r4, r7, 2
> -; P9LE-NEXT: lfiwzx f0, r3, r4
> -; P9LE-DAG: xxspltw v2, v2, 2
> -; P9LE-DAG: xxswapd v3, f0
> -; P9LE-NEXT: vmrglw v2, v2, v3
> +; P9LE-NEXT: lxsiwzx v3, r3, r4
> +; P9LE-NEXT: vmrglw v2, v2, v2
> +; P9LE-NEXT: vmrghw v2, v2, v3
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test_f3:
> @@ -233,11 +224,10 @@ define <2 x float> @s2v_test_f3(float* nocapture
> readonly %f64, <2 x float> %vec
>
> ; P8LE-LABEL: s2v_test_f3:
> ; P8LE: # %bb.0: # %entry
> +; P8LE-NEXT: vmrglw v2, v2, v2
> ; P8LE-NEXT: sldi r4, r7, 2
> -; P8LE-NEXT: xxspltw v2, v2, 2
> -; P8LE-NEXT: lfiwzx f0, r3, r4
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: vmrglw v2, v2, v3
> +; P8LE-NEXT: lxsiwzx v3, r3, r4
> +; P8LE-NEXT: vmrghw v2, v2, v3
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f3:
> @@ -261,10 +251,9 @@ define <2 x float> @s2v_test_f4(float* nocapture
> readonly %f64, <2 x float> %vec
> ; P9LE-LABEL: s2v_test_f4:
> ; P9LE: # %bb.0: # %entry
> ; P9LE-NEXT: addi r3, r3, 4
> -; P9LE-NEXT: lfiwzx f0, 0, r3
> -; P9LE-DAG: xxspltw v2, v2, 2
> -; P9LE-DAG: xxswapd v3, f0
> -; P9LE-NEXT: vmrglw v2, v2, v3
> +; P9LE-NEXT: lxsiwzx v3, 0, r3
> +; P9LE-NEXT: vmrglw v2, v2, v2
> +; P9LE-NEXT: vmrghw v2, v2, v3
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test_f4:
> @@ -278,11 +267,10 @@ define <2 x float> @s2v_test_f4(float* nocapture
> readonly %f64, <2 x float> %vec
>
> ; P8LE-LABEL: s2v_test_f4:
> ; P8LE: # %bb.0: # %entry
> +; P8LE-NEXT: vmrglw v2, v2, v2
> ; P8LE-NEXT: addi r3, r3, 4
> -; P8LE-NEXT: xxspltw v2, v2, 2
> -; P8LE-NEXT: lfiwzx f0, 0, r3
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: vmrglw v2, v2, v3
> +; P8LE-NEXT: lxsiwzx v3, 0, r3
> +; P8LE-NEXT: vmrghw v2, v2, v3
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f4:
> @@ -304,10 +292,9 @@ entry:
> define <2 x float> @s2v_test_f5(<2 x float> %vec, float* nocapture
> readonly %ptr1) {
> ; P9LE-LABEL: s2v_test_f5:
> ; P9LE: # %bb.0: # %entry
> -; P9LE-NEXT: lfiwzx f0, 0, r5
> -; P9LE-NEXT: xxspltw v2, v2, 2
> -; P9LE-NEXT: xxswapd v3, f0
> -; P9LE-NEXT: vmrglw v2, v2, v3
> +; P9LE-NEXT: lxsiwzx v3, 0, r5
> +; P9LE-NEXT: vmrglw v2, v2, v2
> +; P9LE-NEXT: vmrghw v2, v2, v3
> ; P9LE-NEXT: blr
>
> ; P9BE-LABEL: s2v_test_f5:
> @@ -320,10 +307,9 @@ define <2 x float> @s2v_test_f5(<2 x float> %vec,
> float* nocapture readonly %ptr
>
> ; P8LE-LABEL: s2v_test_f5:
> ; P8LE: # %bb.0: # %entry
> -; P8LE-NEXT: lfiwzx f0, 0, r5
> -; P8LE-NEXT: xxspltw v2, v2, 2
> -; P8LE-NEXT: xxswapd v3, f0
> -; P8LE-NEXT: vmrglw v2, v2, v3
> +; P8LE-NEXT: vmrglw v2, v2, v2
> +; P8LE-NEXT: lxsiwzx v3, 0, r5
> +; P8LE-NEXT: vmrghw v2, v2, v3
> ; P8LE-NEXT: blr
>
> ; P8BE-LABEL: s2v_test_f5:
>
> diff --git a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
> b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
> index 935630745f47..097ba07a5b1e 100644
> --- a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
> +++ b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
> @@ -13,60 +13,56 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 0
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, -21386
> -; P9LE-NEXT: ori r5, r5, 37253
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r5, r4, r5
> -; P9LE-NEXT: add r4, r5, r4
> +; P9LE-NEXT: lis r4, -21386
> +; P9LE-NEXT: ori r4, r4, 37253
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r4, r3, r4
> +; P9LE-NEXT: add r4, r4, r3
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 6
> ; P9LE-NEXT: add r4, r4, r5
> -; P9LE-NEXT: lis r5, 31710
> ; P9LE-NEXT: mulli r4, r4, 95
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r4, 31710
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: ori r5, r5, 63421
> -; P9LE-NEXT: mulhw r5, r4, r5
> -; P9LE-NEXT: sub r4, r5, r4
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: ori r4, r4, 63421
> +; P9LE-NEXT: mulhw r4, r3, r4
> +; P9LE-NEXT: sub r4, r4, r3
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 6
> ; P9LE-NEXT: add r4, r4, r5
> -; P9LE-NEXT: lis r5, 21399
> ; P9LE-NEXT: mulli r4, r4, -124
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r4, 21399
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: ori r5, r5, 33437
> -; P9LE-NEXT: mulhw r4, r4, r5
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: ori r4, r4, 33437
> +; P9LE-NEXT: mulhw r4, r3, r4
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 5
> ; P9LE-NEXT: add r4, r4, r5
> -; P9LE-NEXT: lis r5, -16728
> ; P9LE-NEXT: mulli r4, r4, 98
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: ori r5, r5, 63249
> -; P9LE-NEXT: mulhw r4, r4, r5
> +; P9LE-NEXT: lis r4, -16728
> +; P9LE-NEXT: ori r4, r4, 63249
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r4, r3, r4
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 8
> ; P9LE-NEXT: add r4, r4, r5
> ; P9LE-NEXT: mulli r4, r4, -1003
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: vmrglh v2, v2, v4
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: vmrghh v2, v2, v4
> ; P9LE-NEXT: vmrglw v2, v2, v3
> ; P9LE-NEXT: blr
> ;
> @@ -135,58 +131,54 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) {
> ; P8LE: # %bb.0:
> ; P8LE-NEXT: xxswapd vs0, v2
> ; P8LE-NEXT: lis r3, 21399
> -; P8LE-NEXT: lis r9, -21386
> -; P8LE-NEXT: lis r11, 31710
> ; P8LE-NEXT: lis r8, -16728
> +; P8LE-NEXT: lis r9, -21386
> +; P8LE-NEXT: lis r10, 31710
> ; P8LE-NEXT: ori r3, r3, 33437
> -; P8LE-NEXT: ori r9, r9, 37253
> ; P8LE-NEXT: ori r8, r8, 63249
> +; P8LE-NEXT: ori r9, r9, 37253
> +; P8LE-NEXT: ori r10, r10, 63421
> ; P8LE-NEXT: mffprd r4, f0
> ; P8LE-NEXT: rldicl r5, r4, 32, 48
> -; P8LE-NEXT: clrldi r7, r4, 48
> ; P8LE-NEXT: rldicl r6, r4, 16, 48
> +; P8LE-NEXT: clrldi r7, r4, 48
> +; P8LE-NEXT: extsh r5, r5
> +; P8LE-NEXT: extsh r6, r6
> ; P8LE-NEXT: rldicl r4, r4, 48, 48
> -; P8LE-NEXT: extsh r10, r5
> -; P8LE-NEXT: extsh r0, r7
> -; P8LE-NEXT: mulhw r3, r10, r3
> -; P8LE-NEXT: ori r10, r11, 63421
> -; P8LE-NEXT: extsh r11, r4
> -; P8LE-NEXT: extsh r12, r6
> -; P8LE-NEXT: mulhw r9, r0, r9
> -; P8LE-NEXT: mulhw r10, r11, r10
> -; P8LE-NEXT: mulhw r8, r12, r8
> -; P8LE-NEXT: srwi r12, r3, 31
> +; P8LE-NEXT: extsh r7, r7
> +; P8LE-NEXT: mulhw r3, r5, r3
> +; P8LE-NEXT: extsh r4, r4
> +; P8LE-NEXT: mulhw r8, r6, r8
> +; P8LE-NEXT: mulhw r9, r7, r9
> +; P8LE-NEXT: mulhw r10, r4, r10
> +; P8LE-NEXT: srwi r11, r3, 31
> ; P8LE-NEXT: srawi r3, r3, 5
> -; P8LE-NEXT: add r9, r9, r0
> -; P8LE-NEXT: sub r10, r10, r11
> -; P8LE-NEXT: add r3, r3, r12
> +; P8LE-NEXT: add r3, r3, r11
> +; P8LE-NEXT: srwi r11, r8, 31
> +; P8LE-NEXT: add r9, r9, r7
> +; P8LE-NEXT: srawi r8, r8, 8
> +; P8LE-NEXT: sub r10, r10, r4
> +; P8LE-NEXT: add r8, r8, r11
> ; P8LE-NEXT: srwi r11, r9, 31
> ; P8LE-NEXT: srawi r9, r9, 6
> -; P8LE-NEXT: srwi r12, r8, 31
> -; P8LE-NEXT: srawi r8, r8, 8
> +; P8LE-NEXT: mulli r3, r3, 98
> ; P8LE-NEXT: add r9, r9, r11
> ; P8LE-NEXT: srwi r11, r10, 31
> ; P8LE-NEXT: srawi r10, r10, 6
> -; P8LE-NEXT: add r8, r8, r12
> -; P8LE-NEXT: mulli r3, r3, 98
> -; P8LE-NEXT: add r10, r10, r11
> ; P8LE-NEXT: mulli r8, r8, -1003
> +; P8LE-NEXT: add r10, r10, r11
> ; P8LE-NEXT: mulli r9, r9, 95
> ; P8LE-NEXT: mulli r10, r10, -124
> ; P8LE-NEXT: sub r3, r5, r3
> +; P8LE-NEXT: mtvsrd v2, r3
> ; P8LE-NEXT: sub r5, r6, r8
> -; P8LE-NEXT: mtfprd f0, r3
> ; P8LE-NEXT: sub r3, r7, r9
> +; P8LE-NEXT: mtvsrd v3, r5
> ; P8LE-NEXT: sub r4, r4, r10
> -; P8LE-NEXT: mtfprd f1, r5
> -; P8LE-NEXT: mtfprd f2, r3
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: mtfprd f3, r4
> -; P8LE-NEXT: xxswapd v3, vs1
> -; P8LE-NEXT: xxswapd v4, vs2
> -; P8LE-NEXT: xxswapd v5, vs3
> -; P8LE-NEXT: vmrglh v2, v3, v2
> -; P8LE-NEXT: vmrglh v3, v5, v4
> +; P8LE-NEXT: mtvsrd v4, r3
> +; P8LE-NEXT: mtvsrd v5, r4
> +; P8LE-NEXT: vmrghh v2, v3, v2
> +; P8LE-NEXT: vmrghh v3, v5, v4
> ; P8LE-NEXT: vmrglw v2, v2, v3
> ; P8LE-NEXT: blr
> ;
> @@ -256,56 +248,52 @@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 0
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, -21386
> -; P9LE-NEXT: ori r5, r5, 37253
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r6, r4, r5
> -; P9LE-NEXT: add r4, r6, r4
> -; P9LE-NEXT: srwi r6, r4, 31
> -; P9LE-NEXT: srawi r4, r4, 6
> -; P9LE-NEXT: add r4, r4, r6
> -; P9LE-NEXT: mulli r4, r4, 95
> -; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r4, -21386
> +; P9LE-NEXT: ori r4, r4, 37253
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r5, r3, r4
> +; P9LE-NEXT: add r5, r5, r3
> +; P9LE-NEXT: srwi r6, r5, 31
> +; P9LE-NEXT: srawi r5, r5, 6
> +; P9LE-NEXT: add r5, r5, r6
> +; P9LE-NEXT: mulli r5, r5, 95
> +; P9LE-NEXT: sub r3, r3, r5
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r6, r4, r5
> -; P9LE-NEXT: add r4, r6, r4
> -; P9LE-NEXT: srwi r6, r4, 31
> -; P9LE-NEXT: srawi r4, r4, 6
> -; P9LE-NEXT: add r4, r4, r6
> -; P9LE-NEXT: mulli r4, r4, 95
> -; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r5, r3, r4
> +; P9LE-NEXT: add r5, r5, r3
> +; P9LE-NEXT: srwi r6, r5, 31
> +; P9LE-NEXT: srawi r5, r5, 6
> +; P9LE-NEXT: add r5, r5, r6
> +; P9LE-NEXT: mulli r5, r5, 95
> +; P9LE-NEXT: sub r3, r3, r5
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r6, r4, r5
> -; P9LE-NEXT: add r4, r6, r4
> -; P9LE-NEXT: srwi r6, r4, 31
> -; P9LE-NEXT: srawi r4, r4, 6
> -; P9LE-NEXT: add r4, r4, r6
> -; P9LE-NEXT: mulli r4, r4, 95
> -; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r5, r3, r4
> +; P9LE-NEXT: add r5, r5, r3
> +; P9LE-NEXT: srwi r6, r5, 31
> +; P9LE-NEXT: srawi r5, r5, 6
> +; P9LE-NEXT: add r5, r5, r6
> +; P9LE-NEXT: mulli r5, r5, 95
> +; P9LE-NEXT: sub r3, r3, r5
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r5, r4, r5
> -; P9LE-NEXT: add r4, r5, r4
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r4, r3, r4
> +; P9LE-NEXT: add r4, r4, r3
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 6
> ; P9LE-NEXT: add r4, r4, r5
> ; P9LE-NEXT: mulli r4, r4, 95
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: vmrglh v2, v2, v4
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: vmrghh v2, v2, v4
> ; P9LE-NEXT: vmrglw v2, v2, v3
> ; P9LE-NEXT: blr
> ;
> @@ -370,56 +358,50 @@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) {
> ; P8LE: # %bb.0:
> ; P8LE-NEXT: xxswapd vs0, v2
> ; P8LE-NEXT: lis r3, -21386
> -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill
> ; P8LE-NEXT: ori r3, r3, 37253
> ; P8LE-NEXT: mffprd r4, f0
> ; P8LE-NEXT: clrldi r5, r4, 48
> ; P8LE-NEXT: rldicl r6, r4, 48, 48
> -; P8LE-NEXT: extsh r8, r5
> +; P8LE-NEXT: extsh r5, r5
> ; P8LE-NEXT: rldicl r7, r4, 32, 48
> -; P8LE-NEXT: extsh r9, r6
> -; P8LE-NEXT: mulhw r10, r8, r3
> +; P8LE-NEXT: extsh r6, r6
> +; P8LE-NEXT: mulhw r8, r5, r3
> ; P8LE-NEXT: rldicl r4, r4, 16, 48
> -; P8LE-NEXT: extsh r11, r7
> -; P8LE-NEXT: mulhw r12, r9, r3
> -; P8LE-NEXT: extsh r0, r4
> -; P8LE-NEXT: mulhw r30, r11, r3
> -; P8LE-NEXT: mulhw r3, r0, r3
> -; P8LE-NEXT: add r8, r10, r8
> -; P8LE-NEXT: add r9, r12, r9
> -; P8LE-NEXT: srwi r10, r8, 31
> +; P8LE-NEXT: extsh r7, r7
> +; P8LE-NEXT: mulhw r9, r6, r3
> +; P8LE-NEXT: extsh r4, r4
> +; P8LE-NEXT: mulhw r10, r7, r3
> +; P8LE-NEXT: mulhw r3, r4, r3
> +; P8LE-NEXT: add r8, r8, r5
> +; P8LE-NEXT: add r9, r9, r6
> +; P8LE-NEXT: srwi r11, r8, 31
> ; P8LE-NEXT: srawi r8, r8, 6
> -; P8LE-NEXT: add r11, r30, r11
> -; P8LE-NEXT: add r3, r3, r0
> -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
> -; P8LE-NEXT: add r8, r8, r10
> -; P8LE-NEXT: srwi r10, r9, 31
> +; P8LE-NEXT: add r10, r10, r7
> +; P8LE-NEXT: add r3, r3, r4
> +; P8LE-NEXT: add r8, r8, r11
> +; P8LE-NEXT: srwi r11, r9, 31
> ; P8LE-NEXT: srawi r9, r9, 6
> ; P8LE-NEXT: mulli r8, r8, 95
> -; P8LE-NEXT: add r9, r9, r10
> -; P8LE-NEXT: srwi r10, r11, 31
> -; P8LE-NEXT: srawi r11, r11, 6
> +; P8LE-NEXT: add r9, r9, r11
> +; P8LE-NEXT: srwi r11, r10, 31
> +; P8LE-NEXT: srawi r10, r10, 6
> ; P8LE-NEXT: mulli r9, r9, 95
> -; P8LE-NEXT: add r10, r11, r10
> +; P8LE-NEXT: add r10, r10, r11
> ; P8LE-NEXT: srwi r11, r3, 31
> ; P8LE-NEXT: srawi r3, r3, 6
> ; P8LE-NEXT: mulli r10, r10, 95
> ; P8LE-NEXT: sub r5, r5, r8
> ; P8LE-NEXT: add r3, r3, r11
> -; P8LE-NEXT: mtfprd f0, r5
> +; P8LE-NEXT: mtvsrd v2, r5
> ; P8LE-NEXT: mulli r3, r3, 95
> ; P8LE-NEXT: sub r6, r6, r9
> -; P8LE-NEXT: mtfprd f1, r6
> -; P8LE-NEXT: xxswapd v2, vs0
> +; P8LE-NEXT: mtvsrd v3, r6
> ; P8LE-NEXT: sub r5, r7, r10
> -; P8LE-NEXT: mtfprd f2, r5
> -; P8LE-NEXT: xxswapd v3, vs1
> +; P8LE-NEXT: mtvsrd v4, r5
> ; P8LE-NEXT: sub r3, r4, r3
> -; P8LE-NEXT: mtfprd f3, r3
> -; P8LE-NEXT: xxswapd v4, vs2
> -; P8LE-NEXT: vmrglh v2, v3, v2
> -; P8LE-NEXT: xxswapd v5, vs3
> -; P8LE-NEXT: vmrglh v3, v5, v4
> +; P8LE-NEXT: vmrghh v2, v3, v2
> +; P8LE-NEXT: mtvsrd v5, r3
> +; P8LE-NEXT: vmrghh v3, v5, v4
> ; P8LE-NEXT: vmrglw v2, v3, v2
> ; P8LE-NEXT: blr
> ;
> @@ -487,67 +469,59 @@ define <4 x i16> @combine_srem_sdiv(<4 x i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 0
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, -21386
> -; P9LE-NEXT: ori r5, r5, 37253
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r6, r4, r5
> -; P9LE-NEXT: add r4, r6, r4
> -; P9LE-NEXT: srwi r6, r4, 31
> -; P9LE-NEXT: srawi r4, r4, 6
> -; P9LE-NEXT: add r4, r4, r6
> -; P9LE-NEXT: mulli r6, r4, 95
> +; P9LE-NEXT: lis r4, -21386
> +; P9LE-NEXT: ori r4, r4, 37253
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r5, r3, r4
> +; P9LE-NEXT: add r5, r5, r3
> +; P9LE-NEXT: srwi r6, r5, 31
> +; P9LE-NEXT: srawi r5, r5, 6
> +; P9LE-NEXT: add r5, r5, r6
> +; P9LE-NEXT: mulli r6, r5, 95
> ; P9LE-NEXT: sub r3, r3, r6
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> ; P9LE-NEXT: extsh r6, r3
> -; P9LE-NEXT: mulhw r7, r6, r5
> +; P9LE-NEXT: mulhw r7, r6, r4
> ; P9LE-NEXT: add r6, r7, r6
> ; P9LE-NEXT: srwi r7, r6, 31
> ; P9LE-NEXT: srawi r6, r6, 6
> ; P9LE-NEXT: add r6, r6, r7
> ; P9LE-NEXT: mulli r7, r6, 95
> ; P9LE-NEXT: sub r3, r3, r7
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> ; P9LE-NEXT: extsh r7, r3
> -; P9LE-NEXT: mulhw r8, r7, r5
> +; P9LE-NEXT: mulhw r8, r7, r4
> ; P9LE-NEXT: add r7, r8, r7
> ; P9LE-NEXT: srwi r8, r7, 31
> ; P9LE-NEXT: srawi r7, r7, 6
> ; P9LE-NEXT: add r7, r7, r8
> ; P9LE-NEXT: mulli r8, r7, 95
> ; P9LE-NEXT: sub r3, r3, r8
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> ; P9LE-NEXT: extsh r8, r3
> -; P9LE-NEXT: mulhw r5, r8, r5
> -; P9LE-NEXT: add r5, r5, r8
> -; P9LE-NEXT: srwi r8, r5, 31
> -; P9LE-NEXT: srawi r5, r5, 6
> -; P9LE-NEXT: add r5, r5, r8
> -; P9LE-NEXT: mulli r8, r5, 95
> +; P9LE-NEXT: mulhw r4, r8, r4
> +; P9LE-NEXT: add r4, r4, r8
> +; P9LE-NEXT: srwi r8, r4, 31
> +; P9LE-NEXT: srawi r4, r4, 6
> +; P9LE-NEXT: add r4, r4, r8
> +; P9LE-NEXT: mulli r8, r4, 95
> ; P9LE-NEXT: sub r3, r3, r8
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: mtfprd f0, r4
> -; P9LE-NEXT: vmrglh v2, v2, v4
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: vmrghh v2, v2, v4
> +; P9LE-NEXT: mtvsrd v4, r6
> ; P9LE-NEXT: vmrglw v2, v2, v3
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r6
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r7
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r5
> -; P9LE-NEXT: xxswapd v5, vs0
> -; P9LE-NEXT: vmrglh v4, v5, v4
> +; P9LE-NEXT: mtvsrd v3, r5
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: mtvsrd v4, r7
> +; P9LE-NEXT: mtvsrd v5, r4
> +; P9LE-NEXT: vmrghh v4, v5, v4
> ; P9LE-NEXT: vmrglw v3, v4, v3
> ; P9LE-NEXT: vadduhm v2, v2, v3
> ; P9LE-NEXT: blr
> @@ -624,69 +598,59 @@ define <4 x i16> @combine_srem_sdiv(<4 x i16> %x) {
> ; P8LE-LABEL: combine_srem_sdiv:
> ; P8LE: # %bb.0:
> ; P8LE-NEXT: xxswapd vs0, v2
> -; P8LE-NEXT: lis r4, -21386
> -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill
> -; P8LE-NEXT: ori r4, r4, 37253
> -; P8LE-NEXT: mffprd r5, f0
> -; P8LE-NEXT: clrldi r3, r5, 48
> -; P8LE-NEXT: rldicl r6, r5, 48, 48
> -; P8LE-NEXT: rldicl r7, r5, 32, 48
> -; P8LE-NEXT: extsh r8, r3
> -; P8LE-NEXT: extsh r9, r6
> -; P8LE-NEXT: extsh r10, r7
> -; P8LE-NEXT: mulhw r11, r8, r4
> -; P8LE-NEXT: rldicl r5, r5, 16, 48
> -; P8LE-NEXT: mulhw r12, r9, r4
> -; P8LE-NEXT: mulhw r0, r10, r4
> -; P8LE-NEXT: extsh r30, r5
> -; P8LE-NEXT: mulhw r4, r30, r4
> +; P8LE-NEXT: lis r3, -21386
> +; P8LE-NEXT: ori r3, r3, 37253
> +; P8LE-NEXT: mffprd r4, f0
> +; P8LE-NEXT: clrldi r5, r4, 48
> +; P8LE-NEXT: rldicl r6, r4, 48, 48
> +; P8LE-NEXT: rldicl r7, r4, 32, 48
> +; P8LE-NEXT: extsh r5, r5
> +; P8LE-NEXT: extsh r8, r6
> +; P8LE-NEXT: extsh r9, r7
> +; P8LE-NEXT: mulhw r10, r5, r3
> +; P8LE-NEXT: mulhw r11, r8, r3
> +; P8LE-NEXT: rldicl r4, r4, 16, 48
> +; P8LE-NEXT: mulhw r12, r9, r3
> +; P8LE-NEXT: extsh r0, r4
> +; P8LE-NEXT: mulhw r3, r0, r3
> +; P8LE-NEXT: add r10, r10, r5
> ; P8LE-NEXT: add r8, r11, r8
> +; P8LE-NEXT: srwi r11, r10, 31
> ; P8LE-NEXT: add r9, r12, r9
> -; P8LE-NEXT: srwi r11, r8, 31
> -; P8LE-NEXT: add r10, r0, r10
> -; P8LE-NEXT: srawi r8, r8, 6
> -; P8LE-NEXT: srawi r12, r9, 6
> +; P8LE-NEXT: srawi r10, r10, 6
> +; P8LE-NEXT: srawi r12, r8, 6
> +; P8LE-NEXT: srwi r8, r8, 31
> +; P8LE-NEXT: add r10, r10, r11
> +; P8LE-NEXT: add r3, r3, r0
> +; P8LE-NEXT: srawi r11, r9, 6
> ; P8LE-NEXT: srwi r9, r9, 31
> -; P8LE-NEXT: add r8, r8, r11
> -; P8LE-NEXT: add r4, r4, r30
> -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
> -; P8LE-NEXT: srawi r11, r10, 6
> -; P8LE-NEXT: srwi r10, r10, 31
> -; P8LE-NEXT: add r9, r12, r9
> -; P8LE-NEXT: mtfprd f0, r8
> -; P8LE-NEXT: mulli r12, r8, 95
> -; P8LE-NEXT: add r10, r11, r10
> -; P8LE-NEXT: srwi r8, r4, 31
> -; P8LE-NEXT: mtfprd f1, r9
> -; P8LE-NEXT: srawi r4, r4, 6
> -; P8LE-NEXT: mulli r11, r9, 95
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: mtfprd f2, r10
> -; P8LE-NEXT: mulli r9, r10, 95
> -; P8LE-NEXT: add r4, r4, r8
> -; P8LE-NEXT: xxswapd v3, vs1
> -; P8LE-NEXT: mtfprd f3, r4
> -; P8LE-NEXT: mulli r4, r4, 95
> -; P8LE-NEXT: xxswapd v1, vs2
> -; P8LE-NEXT: sub r3, r3, r12
> -; P8LE-NEXT: mtfprd f0, r3
> -; P8LE-NEXT: sub r6, r6, r11
> -; P8LE-NEXT: xxswapd v6, vs3
> -; P8LE-NEXT: sub r3, r7, r9
> -; P8LE-NEXT: mtfprd f1, r6
> -; P8LE-NEXT: mtfprd f4, r3
> -; P8LE-NEXT: sub r3, r5, r4
> -; P8LE-NEXT: mtfprd f5, r3
> -; P8LE-NEXT: xxswapd v4, vs1
> -; P8LE-NEXT: vmrglh v2, v3, v2
> -; P8LE-NEXT: xxswapd v3, vs0
> -; P8LE-NEXT: xxswapd v5, vs4
> -; P8LE-NEXT: xxswapd v0, vs5
> -; P8LE-NEXT: vmrglh v3, v4, v3
> -; P8LE-NEXT: vmrglh v4, v0, v5
> -; P8LE-NEXT: vmrglh v5, v6, v1
> -; P8LE-NEXT: vmrglw v3, v4, v3
> -; P8LE-NEXT: vmrglw v2, v5, v2
> +; P8LE-NEXT: add r8, r12, r8
> +; P8LE-NEXT: mtvsrd v2, r10
> +; P8LE-NEXT: mulli r12, r10, 95
> +; P8LE-NEXT: add r9, r11, r9
> +; P8LE-NEXT: srwi r11, r3, 31
> +; P8LE-NEXT: mtvsrd v3, r8
> +; P8LE-NEXT: srawi r3, r3, 6
> +; P8LE-NEXT: mulli r10, r8, 95
> +; P8LE-NEXT: mtvsrd v4, r9
> +; P8LE-NEXT: add r3, r3, r11
> +; P8LE-NEXT: mulli r8, r9, 95
> +; P8LE-NEXT: vmrghh v2, v3, v2
> +; P8LE-NEXT: mulli r9, r3, 95
> +; P8LE-NEXT: sub r5, r5, r12
> +; P8LE-NEXT: sub r6, r6, r10
> +; P8LE-NEXT: mtvsrd v3, r5
> +; P8LE-NEXT: mtvsrd v5, r6
> +; P8LE-NEXT: sub r5, r7, r8
> +; P8LE-NEXT: sub r4, r4, r9
> +; P8LE-NEXT: mtvsrd v0, r5
> +; P8LE-NEXT: mtvsrd v1, r4
> +; P8LE-NEXT: vmrghh v3, v5, v3
> +; P8LE-NEXT: mtvsrd v5, r3
> +; P8LE-NEXT: vmrghh v0, v1, v0
> +; P8LE-NEXT: vmrghh v4, v5, v4
> +; P8LE-NEXT: vmrglw v3, v0, v3
> +; P8LE-NEXT: vmrglw v2, v4, v2
> ; P8LE-NEXT: vadduhm v2, v3, v2
> ; P8LE-NEXT: blr
> ;
> @@ -767,47 +731,43 @@ define <4 x i16> @dont_fold_srem_power_of_two(<4 x
> i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 0
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: srawi r4, r4, 6
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: srawi r4, r3, 6
> ; P9LE-NEXT: addze r4, r4
> ; P9LE-NEXT: slwi r4, r4, 6
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: srawi r4, r4, 5
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: srawi r4, r3, 5
> ; P9LE-NEXT: addze r4, r4
> ; P9LE-NEXT: slwi r4, r4, 5
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r4, -21386
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, -21386
> -; P9LE-NEXT: ori r5, r5, 37253
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r5, r4, r5
> -; P9LE-NEXT: add r4, r5, r4
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: ori r4, r4, 37253
> +; P9LE-NEXT: mulhw r4, r3, r4
> +; P9LE-NEXT: add r4, r4, r3
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 6
> ; P9LE-NEXT: add r4, r4, r5
> ; P9LE-NEXT: mulli r4, r4, 95
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: srawi r4, r4, 3
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: srawi r4, r3, 3
> ; P9LE-NEXT: addze r4, r4
> ; P9LE-NEXT: slwi r4, r4, 3
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: vmrglh v2, v4, v2
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: vmrghh v2, v4, v2
> ; P9LE-NEXT: vmrglw v2, v2, v3
> ; P9LE-NEXT: blr
> ;
> @@ -866,42 +826,38 @@ define <4 x i16> @dont_fold_srem_power_of_two(<4 x
> i16> %x) {
> ; P8LE-NEXT: ori r3, r3, 37253
> ; P8LE-NEXT: mffprd r4, f0
> ; P8LE-NEXT: rldicl r5, r4, 16, 48
> -; P8LE-NEXT: clrldi r7, r4, 48
> -; P8LE-NEXT: extsh r6, r5
> -; P8LE-NEXT: extsh r8, r7
> -; P8LE-NEXT: mulhw r3, r6, r3
> -; P8LE-NEXT: rldicl r9, r4, 48, 48
> -; P8LE-NEXT: srawi r8, r8, 6
> -; P8LE-NEXT: extsh r10, r9
> +; P8LE-NEXT: clrldi r6, r4, 48
> +; P8LE-NEXT: extsh r5, r5
> +; P8LE-NEXT: extsh r6, r6
> +; P8LE-NEXT: mulhw r3, r5, r3
> +; P8LE-NEXT: rldicl r7, r4, 48, 48
> +; P8LE-NEXT: srawi r8, r6, 6
> +; P8LE-NEXT: extsh r7, r7
> ; P8LE-NEXT: addze r8, r8
> ; P8LE-NEXT: rldicl r4, r4, 32, 48
> -; P8LE-NEXT: srawi r10, r10, 5
> +; P8LE-NEXT: srawi r9, r7, 5
> +; P8LE-NEXT: extsh r4, r4
> ; P8LE-NEXT: slwi r8, r8, 6
> -; P8LE-NEXT: add r3, r3, r6
> -; P8LE-NEXT: addze r6, r10
> -; P8LE-NEXT: sub r7, r7, r8
> +; P8LE-NEXT: add r3, r3, r5
> +; P8LE-NEXT: addze r9, r9
> +; P8LE-NEXT: sub r6, r6, r8
> ; P8LE-NEXT: srwi r10, r3, 31
> ; P8LE-NEXT: srawi r3, r3, 6
> -; P8LE-NEXT: mtfprd f0, r7
> -; P8LE-NEXT: slwi r6, r6, 5
> +; P8LE-NEXT: slwi r8, r9, 5
> +; P8LE-NEXT: mtvsrd v2, r6
> ; P8LE-NEXT: add r3, r3, r10
> -; P8LE-NEXT: extsh r10, r4
> -; P8LE-NEXT: sub r6, r9, r6
> +; P8LE-NEXT: srawi r9, r4, 3
> +; P8LE-NEXT: sub r6, r7, r8
> ; P8LE-NEXT: mulli r3, r3, 95
> -; P8LE-NEXT: srawi r8, r10, 3
> -; P8LE-NEXT: mtfprd f1, r6
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: addze r7, r8
> -; P8LE-NEXT: xxswapd v3, vs1
> +; P8LE-NEXT: addze r7, r9
> +; P8LE-NEXT: mtvsrd v3, r6
> +; P8LE-NEXT: vmrghh v2, v3, v2
> ; P8LE-NEXT: sub r3, r5, r3
> ; P8LE-NEXT: slwi r5, r7, 3
> ; P8LE-NEXT: sub r4, r4, r5
> -; P8LE-NEXT: mtfprd f2, r3
> -; P8LE-NEXT: mtfprd f3, r4
> -; P8LE-NEXT: xxswapd v4, vs2
> -; P8LE-NEXT: vmrglh v2, v3, v2
> -; P8LE-NEXT: xxswapd v5, vs3
> -; P8LE-NEXT: vmrglh v3, v4, v5
> +; P8LE-NEXT: mtvsrd v4, r3
> +; P8LE-NEXT: mtvsrd v5, r4
> +; P8LE-NEXT: vmrghh v3, v4, v5
> ; P8LE-NEXT: vmrglw v2, v3, v2
> ; P8LE-NEXT: blr
> ;
> @@ -959,48 +915,46 @@ define <4 x i16> @dont_fold_srem_one(<4 x i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, -14230
> -; P9LE-NEXT: ori r5, r5, 30865
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r5, r4, r5
> -; P9LE-NEXT: add r4, r5, r4
> +; P9LE-NEXT: lis r4, -14230
> +; P9LE-NEXT: ori r4, r4, 30865
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r4, r3, r4
> +; P9LE-NEXT: add r4, r4, r3
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 9
> ; P9LE-NEXT: add r4, r4, r5
> -; P9LE-NEXT: lis r5, -19946
> ; P9LE-NEXT: mulli r4, r4, 654
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r4, -19946
> +; P9LE-NEXT: mtvsrd v3, r3
> +; P9LE-NEXT: li r3, 0
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 4
> -; P9LE-NEXT: ori r5, r5, 17097
> -; P9LE-NEXT: xxlxor v3, v3, v3
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r5, r4, r5
> -; P9LE-NEXT: add r4, r5, r4
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: ori r4, r4, 17097
> +; P9LE-NEXT: mulhw r4, r3, r4
> +; P9LE-NEXT: add r4, r4, r3
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 4
> ; P9LE-NEXT: add r4, r4, r5
> -; P9LE-NEXT: lis r5, 24749
> ; P9LE-NEXT: mulli r4, r4, 23
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: vmrghh v3, v3, v4
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: ori r5, r5, 47143
> -; P9LE-NEXT: mulhw r4, r4, r5
> +; P9LE-NEXT: lis r4, 24749
> +; P9LE-NEXT: ori r4, r4, 47143
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r4, r3, r4
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 11
> ; P9LE-NEXT: add r4, r4, r5
> ; P9LE-NEXT: mulli r4, r4, 5423
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: vmrglh v2, v2, v4
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: vmrghh v2, v2, v4
> ; P9LE-NEXT: vmrglw v2, v2, v3
> ; P9LE-NEXT: blr
> ;
> @@ -1058,49 +1012,47 @@ define <4 x i16> @dont_fold_srem_one(<4 x i16> %x)
> {
> ; P8LE-LABEL: dont_fold_srem_one:
> ; P8LE: # %bb.0:
> ; P8LE-NEXT: xxswapd vs0, v2
> -; P8LE-NEXT: lis r3, 24749
> -; P8LE-NEXT: lis r7, -19946
> -; P8LE-NEXT: lis r9, -14230
> -; P8LE-NEXT: xxlxor v5, v5, v5
> -; P8LE-NEXT: ori r3, r3, 47143
> -; P8LE-NEXT: ori r7, r7, 17097
> -; P8LE-NEXT: mffprd r4, f0
> -; P8LE-NEXT: rldicl r5, r4, 16, 48
> -; P8LE-NEXT: rldicl r6, r4, 32, 48
> -; P8LE-NEXT: rldicl r4, r4, 48, 48
> -; P8LE-NEXT: extsh r8, r5
> -; P8LE-NEXT: extsh r10, r6
> -; P8LE-NEXT: mulhw r3, r8, r3
> -; P8LE-NEXT: ori r8, r9, 30865
> -; P8LE-NEXT: extsh r9, r4
> -; P8LE-NEXT: mulhw r7, r10, r7
> -; P8LE-NEXT: mulhw r8, r9, r8
> -; P8LE-NEXT: add r7, r7, r10
> -; P8LE-NEXT: srwi r10, r3, 31
> -; P8LE-NEXT: add r8, r8, r9
> -; P8LE-NEXT: srawi r3, r3, 11
> -; P8LE-NEXT: srwi r9, r7, 31
> -; P8LE-NEXT: srawi r7, r7, 4
> -; P8LE-NEXT: add r3, r3, r10
> -; P8LE-NEXT: add r7, r7, r9
> +; P8LE-NEXT: lis r5, 24749
> +; P8LE-NEXT: lis r6, -19946
> +; P8LE-NEXT: lis r8, -14230
> +; P8LE-NEXT: ori r5, r5, 47143
> +; P8LE-NEXT: ori r6, r6, 17097
> +; P8LE-NEXT: ori r8, r8, 30865
> +; P8LE-NEXT: mffprd r3, f0
> +; P8LE-NEXT: rldicl r4, r3, 16, 48
> +; P8LE-NEXT: rldicl r7, r3, 32, 48
> +; P8LE-NEXT: rldicl r3, r3, 48, 48
> +; P8LE-NEXT: extsh r4, r4
> +; P8LE-NEXT: extsh r7, r7
> +; P8LE-NEXT: extsh r3, r3
> +; P8LE-NEXT: mulhw r5, r4, r5
> +; P8LE-NEXT: mulhw r6, r7, r6
> +; P8LE-NEXT: mulhw r8, r3, r8
> +; P8LE-NEXT: srwi r9, r5, 31
> +; P8LE-NEXT: srawi r5, r5, 11
> +; P8LE-NEXT: add r6, r6, r7
> +; P8LE-NEXT: add r8, r8, r3
> +; P8LE-NEXT: add r5, r5, r9
> +; P8LE-NEXT: srwi r9, r6, 31
> +; P8LE-NEXT: srawi r6, r6, 4
> +; P8LE-NEXT: add r6, r6, r9
> ; P8LE-NEXT: srwi r9, r8, 31
> ; P8LE-NEXT: srawi r8, r8, 9
> -; P8LE-NEXT: mulli r3, r3, 5423
> +; P8LE-NEXT: mulli r5, r5, 5423
> ; P8LE-NEXT: add r8, r8, r9
> -; P8LE-NEXT: mulli r7, r7, 23
> +; P8LE-NEXT: mulli r6, r6, 23
> +; P8LE-NEXT: li r9, 0
> ; P8LE-NEXT: mulli r8, r8, 654
> -; P8LE-NEXT: sub r3, r5, r3
> -; P8LE-NEXT: mtfprd f0, r3
> -; P8LE-NEXT: sub r3, r6, r7
> -; P8LE-NEXT: sub r4, r4, r8
> -; P8LE-NEXT: mtfprd f1, r3
> -; P8LE-NEXT: mtfprd f2, r4
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: xxswapd v3, vs1
> -; P8LE-NEXT: xxswapd v4, vs2
> -; P8LE-NEXT: vmrglh v2, v2, v3
> -; P8LE-NEXT: vmrglh v3, v4, v5
> -; P8LE-NEXT: vmrglw v2, v2, v3
> +; P8LE-NEXT: mtvsrd v2, r9
> +; P8LE-NEXT: sub r4, r4, r5
> +; P8LE-NEXT: sub r5, r7, r6
> +; P8LE-NEXT: mtvsrd v3, r4
> +; P8LE-NEXT: sub r3, r3, r8
> +; P8LE-NEXT: mtvsrd v4, r5
> +; P8LE-NEXT: mtvsrd v5, r3
> +; P8LE-NEXT: vmrghh v3, v3, v4
> +; P8LE-NEXT: vmrghh v2, v5, v2
> +; P8LE-NEXT: vmrglw v2, v3, v2
> ; P8LE-NEXT: blr
> ;
> ; P8BE-LABEL: dont_fold_srem_one:
> @@ -1161,43 +1113,41 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x
> i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, -19946
> -; P9LE-NEXT: ori r5, r5, 17097
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: mulhw r5, r4, r5
> -; P9LE-NEXT: add r4, r5, r4
> +; P9LE-NEXT: lis r4, -19946
> +; P9LE-NEXT: ori r4, r4, 17097
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: mulhw r4, r3, r4
> +; P9LE-NEXT: add r4, r4, r3
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 4
> ; P9LE-NEXT: add r4, r4, r5
> -; P9LE-NEXT: lis r5, 24749
> ; P9LE-NEXT: mulli r4, r4, 23
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r4, 24749
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: ori r5, r5, 47143
> -; P9LE-NEXT: mulhw r4, r4, r5
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: ori r4, r4, 47143
> +; P9LE-NEXT: mulhw r4, r3, r4
> ; P9LE-NEXT: srwi r5, r4, 31
> ; P9LE-NEXT: srawi r4, r4, 11
> ; P9LE-NEXT: add r4, r4, r5
> ; P9LE-NEXT: mulli r4, r4, 5423
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: extsh r4, r3
> -; P9LE-NEXT: srawi r4, r4, 15
> +; P9LE-NEXT: extsh r3, r3
> +; P9LE-NEXT: srawi r4, r3, 15
> ; P9LE-NEXT: addze r4, r4
> ; P9LE-NEXT: slwi r4, r4, 15
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxlxor v4, v4, v4
> -; P9LE-NEXT: vmrglh v2, v2, v4
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: li r3, 0
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: mtvsrd v4, r3
> +; P9LE-NEXT: vmrghh v2, v2, v4
> ; P9LE-NEXT: vmrglw v2, v3, v2
> ; P9LE-NEXT: blr
> ;
> @@ -1252,42 +1202,40 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x
> i16> %x) {
> ; P8LE-NEXT: xxswapd vs0, v2
> ; P8LE-NEXT: lis r4, 24749
> ; P8LE-NEXT: lis r5, -19946
> -; P8LE-NEXT: xxlxor v5, v5, v5
> ; P8LE-NEXT: ori r4, r4, 47143
> ; P8LE-NEXT: ori r5, r5, 17097
> ; P8LE-NEXT: mffprd r3, f0
> ; P8LE-NEXT: rldicl r6, r3, 16, 48
> ; P8LE-NEXT: rldicl r7, r3, 32, 48
> -; P8LE-NEXT: extsh r8, r6
> -; P8LE-NEXT: extsh r9, r7
> -; P8LE-NEXT: mulhw r4, r8, r4
> -; P8LE-NEXT: mulhw r5, r9, r5
> +; P8LE-NEXT: extsh r6, r6
> +; P8LE-NEXT: extsh r7, r7
> +; P8LE-NEXT: mulhw r4, r6, r4
> +; P8LE-NEXT: mulhw r5, r7, r5
> ; P8LE-NEXT: rldicl r3, r3, 48, 48
> +; P8LE-NEXT: extsh r3, r3
> ; P8LE-NEXT: srwi r8, r4, 31
> ; P8LE-NEXT: srawi r4, r4, 11
> -; P8LE-NEXT: add r5, r5, r9
> +; P8LE-NEXT: add r5, r5, r7
> ; P8LE-NEXT: add r4, r4, r8
> ; P8LE-NEXT: srwi r8, r5, 31
> ; P8LE-NEXT: srawi r5, r5, 4
> ; P8LE-NEXT: mulli r4, r4, 5423
> ; P8LE-NEXT: add r5, r5, r8
> -; P8LE-NEXT: extsh r8, r3
> +; P8LE-NEXT: srawi r9, r3, 15
> +; P8LE-NEXT: li r8, 0
> ; P8LE-NEXT: mulli r5, r5, 23
> -; P8LE-NEXT: srawi r8, r8, 15
> +; P8LE-NEXT: mtvsrd v2, r8
> ; P8LE-NEXT: sub r4, r6, r4
> -; P8LE-NEXT: addze r6, r8
> -; P8LE-NEXT: mtfprd f0, r4
> -; P8LE-NEXT: slwi r4, r6, 15
> +; P8LE-NEXT: addze r6, r9
> +; P8LE-NEXT: slwi r6, r6, 15
> +; P8LE-NEXT: mtvsrd v3, r4
> ; P8LE-NEXT: sub r5, r7, r5
> -; P8LE-NEXT: sub r3, r3, r4
> -; P8LE-NEXT: mtfprd f1, r5
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: mtfprd f2, r3
> -; P8LE-NEXT: xxswapd v3, vs1
> -; P8LE-NEXT: xxswapd v4, vs2
> -; P8LE-NEXT: vmrglh v2, v2, v3
> -; P8LE-NEXT: vmrglh v3, v4, v5
> -; P8LE-NEXT: vmrglw v2, v2, v3
> +; P8LE-NEXT: sub r3, r3, r6
> +; P8LE-NEXT: mtvsrd v4, r5
> +; P8LE-NEXT: mtvsrd v5, r3
> +; P8LE-NEXT: vmrghh v3, v3, v4
> +; P8LE-NEXT: vmrghh v2, v5, v2
> +; P8LE-NEXT: vmrglw v2, v3, v2
> ; P8LE-NEXT: blr
> ;
> ; P8BE-LABEL: dont_fold_urem_i16_smax:
>
> diff --git a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
> b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
> index 323397202c00..95f0fc25f2dd 100644
> --- a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
> +++ b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
> @@ -15,10 +15,10 @@ entry:
> }
>
> ; CHECK-LABEL: @bar0
> +; CHECK-DAG: xxswapd 1, 1
> ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
> -; CHECK-DAG: xxspltd [[REG2:[0-9]+]]
> -; CHECK: xxpermdi [[REG3:[0-9]+]], [[REG2]], [[REG1]], 1
> -; CHECK: stxvd2x [[REG3]]
> +; CHECK: xxmrgld [[REG2:[0-9]+]], 1, [[REG1]]
> +; CHECK: stxvd2x [[REG2]]
> ; CHECK-NOT: xxswapd
>
> define void @bar1(double %y) {
> @@ -30,10 +30,10 @@ entry:
> }
>
> ; CHECK-LABEL: @bar1
> +; CHECK-DAG: xxswapd 1, 1
> ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
> -; CHECK-DAG: xxspltd [[REG2:[0-9]+]]
> -; CHECK: xxmrghd [[REG3:[0-9]+]], [[REG1]], [[REG2]]
> -; CHECK: stxvd2x [[REG3]]
> +; CHECK: xxpermdi [[REG2:[0-9]+]], [[REG1]], 1, 1
> +; CHECK: stxvd2x [[REG2]]
> ; CHECK-NOT: xxswapd
>
> define void @baz0() {
>
> diff --git a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
> b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
> index 23738eaa95a7..4437e6799269 100644
> --- a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
> +++ b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
> @@ -27,7 +27,7 @@ define void @bar0() {
> ; CHECK: ld r3, .LC0 at toc@l(r3)
> ; CHECK: addis r3, r2, .LC2 at toc@ha
> ; CHECK: ld r3, .LC2 at toc@l(r3)
> -; CHECK: xxpermdi vs0, vs0, vs1, 1
> +; CHECK: xxmrgld vs0, vs0, vs1
> ; CHECK: stxvd2x vs0, 0, r3
> ; CHECK: blr
> ;
> @@ -38,7 +38,7 @@ define void @bar0() {
> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC1 at toc@ha
> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC2 at toc@ha
> ; CHECK-P9-NOVECTOR: ld r3, .LC2 at toc@l(r3)
> -; CHECK-P9-NOVECTOR: xxpermdi vs0, vs1, vs0, 1
> +; CHECK-P9-NOVECTOR: xxmrgld vs0, vs1, vs0
> ; CHECK-P9-NOVECTOR: stxvd2x vs0, 0, r3
> ; CHECK-P9-NOVECTOR: blr
> ;
> @@ -72,7 +72,7 @@ define void @bar1() {
> ; CHECK: ld r3, .LC0 at toc@l(r3)
> ; CHECK: addis r3, r2, .LC2 at toc@ha
> ; CHECK: ld r3, .LC2 at toc@l(r3)
> -; CHECK: xxmrghd vs0, vs1, vs0
> +; CHECK: xxpermdi vs0, vs1, vs0, 1
> ; CHECK: stxvd2x vs0, 0, r3
> ; CHECK: blr
> ;
> @@ -83,7 +83,7 @@ define void @bar1() {
> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC1 at toc@ha
> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC2 at toc@ha
> ; CHECK-P9-NOVECTOR: ld r3, .LC2 at toc@l(r3)
> -; CHECK-P9-NOVECTOR: xxmrghd vs0, vs0, vs1
> +; CHECK-P9-NOVECTOR: xxpermdi vs0, vs0, vs1, 1
> ; CHECK-P9-NOVECTOR: stxvd2x vs0, 0, r3
> ; CHECK-P9-NOVECTOR: blr
> ;
>
> diff --git a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
> b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
> index d853a420dcd8..4bb3730aa043 100644
> --- a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
> +++ b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
> @@ -13,53 +13,50 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, 21399
> -; P9LE-NEXT: ori r5, r5, 33437
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r4, r4, r5
> -; P9LE-NEXT: lis r5, 16727
> -; P9LE-NEXT: ori r5, r5, 2287
> +; P9LE-NEXT: lis r4, 21399
> +; P9LE-NEXT: ori r4, r4, 33437
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: mulhwu r4, r3, r4
> ; P9LE-NEXT: srwi r4, r4, 5
> ; P9LE-NEXT: mulli r4, r4, 98
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r4, 16727
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r4, r4, r5
> -; P9LE-NEXT: lis r5, 8456
> -; P9LE-NEXT: ori r5, r5, 16913
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: ori r4, r4, 2287
> +; P9LE-NEXT: mulhwu r4, r3, r4
> ; P9LE-NEXT: srwi r4, r4, 8
> ; P9LE-NEXT: mulli r4, r4, 1003
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: rlwinm r4, r3, 30, 18, 31
> -; P9LE-NEXT: mulhwu r4, r4, r5
> -; P9LE-NEXT: lis r5, 22765
> -; P9LE-NEXT: ori r5, r5, 8969
> -; P9LE-NEXT: srwi r4, r4, 2
> -; P9LE-NEXT: mulli r4, r4, 124
> -; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r5, 8456
> +; P9LE-NEXT: ori r5, r5, 16913
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: clrlwi r4, r3, 16
> +; P9LE-NEXT: rlwinm r3, r3, 30, 18, 31
> +; P9LE-NEXT: mulhwu r3, r3, r5
> +; P9LE-NEXT: srwi r3, r3, 2
> +; P9LE-NEXT: mulli r3, r3, 124
> +; P9LE-NEXT: sub r3, r4, r3
> +; P9LE-NEXT: lis r4, 22765
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 0
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r5, r4, r5
> -; P9LE-NEXT: sub r4, r4, r5
> -; P9LE-NEXT: srwi r4, r4, 1
> -; P9LE-NEXT: add r4, r4, r5
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: ori r4, r4, 8969
> +; P9LE-NEXT: mulhwu r4, r3, r4
> +; P9LE-NEXT: sub r5, r3, r4
> +; P9LE-NEXT: srwi r5, r5, 1
> +; P9LE-NEXT: add r4, r5, r4
> ; P9LE-NEXT: srwi r4, r4, 6
> ; P9LE-NEXT: mulli r4, r4, 95
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: vmrglh v2, v4, v2
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: vmrghh v2, v4, v2
> ; P9LE-NEXT: vmrglw v2, v3, v2
> ; P9LE-NEXT: blr
> ;
> @@ -123,50 +120,47 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) {
> ; P8LE-NEXT: xxswapd vs0, v2
> ; P8LE-NEXT: lis r3, 22765
> ; P8LE-NEXT: lis r7, 21399
> -; P8LE-NEXT: lis r10, 16727
> +; P8LE-NEXT: lis r9, 16727
> +; P8LE-NEXT: lis r10, 8456
> ; P8LE-NEXT: ori r3, r3, 8969
> ; P8LE-NEXT: ori r7, r7, 33437
> -; P8LE-NEXT: ori r10, r10, 2287
> +; P8LE-NEXT: ori r9, r9, 2287
> +; P8LE-NEXT: ori r10, r10, 16913
> ; P8LE-NEXT: mffprd r4, f0
> ; P8LE-NEXT: clrldi r6, r4, 48
> ; P8LE-NEXT: rldicl r5, r4, 32, 48
> -; P8LE-NEXT: clrlwi r9, r6, 16
> +; P8LE-NEXT: clrlwi r6, r6, 16
> ; P8LE-NEXT: rldicl r8, r4, 16, 48
> -; P8LE-NEXT: clrlwi r11, r5, 16
> -; P8LE-NEXT: mulhwu r3, r9, r3
> -; P8LE-NEXT: clrlwi r12, r8, 16
> -; P8LE-NEXT: mulhwu r7, r11, r7
> -; P8LE-NEXT: lis r11, 8456
> +; P8LE-NEXT: clrlwi r5, r5, 16
> +; P8LE-NEXT: mulhwu r3, r6, r3
> ; P8LE-NEXT: rldicl r4, r4, 48, 48
> -; P8LE-NEXT: mulhwu r10, r12, r10
> -; P8LE-NEXT: ori r11, r11, 16913
> -; P8LE-NEXT: rlwinm r12, r4, 30, 18, 31
> -; P8LE-NEXT: mulhwu r11, r12, r11
> -; P8LE-NEXT: sub r9, r9, r3
> -; P8LE-NEXT: srwi r9, r9, 1
> +; P8LE-NEXT: clrlwi r8, r8, 16
> +; P8LE-NEXT: rlwinm r11, r4, 30, 18, 31
> +; P8LE-NEXT: mulhwu r7, r5, r7
> +; P8LE-NEXT: clrlwi r4, r4, 16
> +; P8LE-NEXT: mulhwu r9, r8, r9
> +; P8LE-NEXT: mulhwu r10, r11, r10
> +; P8LE-NEXT: sub r11, r6, r3
> +; P8LE-NEXT: srwi r11, r11, 1
> ; P8LE-NEXT: srwi r7, r7, 5
> -; P8LE-NEXT: add r3, r9, r3
> -; P8LE-NEXT: srwi r9, r10, 8
> +; P8LE-NEXT: add r3, r11, r3
> +; P8LE-NEXT: srwi r9, r9, 8
> +; P8LE-NEXT: srwi r10, r10, 2
> ; P8LE-NEXT: srwi r3, r3, 6
> ; P8LE-NEXT: mulli r7, r7, 98
> -; P8LE-NEXT: srwi r10, r11, 2
> ; P8LE-NEXT: mulli r9, r9, 1003
> ; P8LE-NEXT: mulli r3, r3, 95
> ; P8LE-NEXT: mulli r10, r10, 124
> ; P8LE-NEXT: sub r5, r5, r7
> ; P8LE-NEXT: sub r7, r8, r9
> -; P8LE-NEXT: mtfprd f0, r5
> ; P8LE-NEXT: sub r3, r6, r3
> +; P8LE-NEXT: mtvsrd v2, r5
> ; P8LE-NEXT: sub r4, r4, r10
> -; P8LE-NEXT: mtfprd f1, r7
> -; P8LE-NEXT: mtfprd f2, r3
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: mtfprd f3, r4
> -; P8LE-NEXT: xxswapd v3, vs1
> -; P8LE-NEXT: xxswapd v4, vs2
> -; P8LE-NEXT: xxswapd v5, vs3
> -; P8LE-NEXT: vmrglh v2, v3, v2
> -; P8LE-NEXT: vmrglh v3, v5, v4
> +; P8LE-NEXT: mtvsrd v3, r7
> +; P8LE-NEXT: mtvsrd v4, r3
> +; P8LE-NEXT: mtvsrd v5, r4
> +; P8LE-NEXT: vmrghh v2, v3, v2
> +; P8LE-NEXT: vmrghh v3, v5, v4
> ; P8LE-NEXT: vmrglw v2, v2, v3
> ; P8LE-NEXT: blr
> ;
> @@ -230,56 +224,52 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 0
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, 22765
> -; P9LE-NEXT: ori r5, r5, 8969
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r6, r4, r5
> -; P9LE-NEXT: sub r4, r4, r6
> -; P9LE-NEXT: srwi r4, r4, 1
> -; P9LE-NEXT: add r4, r4, r6
> -; P9LE-NEXT: srwi r4, r4, 6
> -; P9LE-NEXT: mulli r4, r4, 95
> -; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r4, 22765
> +; P9LE-NEXT: ori r4, r4, 8969
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: mulhwu r5, r3, r4
> +; P9LE-NEXT: sub r6, r3, r5
> +; P9LE-NEXT: srwi r6, r6, 1
> +; P9LE-NEXT: add r5, r6, r5
> +; P9LE-NEXT: srwi r5, r5, 6
> +; P9LE-NEXT: mulli r5, r5, 95
> +; P9LE-NEXT: sub r3, r3, r5
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r6, r4, r5
> -; P9LE-NEXT: sub r4, r4, r6
> -; P9LE-NEXT: srwi r4, r4, 1
> -; P9LE-NEXT: add r4, r4, r6
> -; P9LE-NEXT: srwi r4, r4, 6
> -; P9LE-NEXT: mulli r4, r4, 95
> -; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: mulhwu r5, r3, r4
> +; P9LE-NEXT: sub r6, r3, r5
> +; P9LE-NEXT: srwi r6, r6, 1
> +; P9LE-NEXT: add r5, r6, r5
> +; P9LE-NEXT: srwi r5, r5, 6
> +; P9LE-NEXT: mulli r5, r5, 95
> +; P9LE-NEXT: sub r3, r3, r5
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r6, r4, r5
> -; P9LE-NEXT: sub r4, r4, r6
> -; P9LE-NEXT: srwi r4, r4, 1
> -; P9LE-NEXT: add r4, r4, r6
> -; P9LE-NEXT: srwi r4, r4, 6
> -; P9LE-NEXT: mulli r4, r4, 95
> -; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: mulhwu r5, r3, r4
> +; P9LE-NEXT: sub r6, r3, r5
> +; P9LE-NEXT: srwi r6, r6, 1
> +; P9LE-NEXT: add r5, r6, r5
> +; P9LE-NEXT: srwi r5, r5, 6
> +; P9LE-NEXT: mulli r5, r5, 95
> +; P9LE-NEXT: sub r3, r3, r5
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r5, r4, r5
> -; P9LE-NEXT: sub r4, r4, r5
> -; P9LE-NEXT: srwi r4, r4, 1
> -; P9LE-NEXT: add r4, r4, r5
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: mulhwu r4, r3, r4
> +; P9LE-NEXT: sub r5, r3, r4
> +; P9LE-NEXT: srwi r5, r5, 1
> +; P9LE-NEXT: add r4, r5, r4
> ; P9LE-NEXT: srwi r4, r4, 6
> ; P9LE-NEXT: mulli r4, r4, 95
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: vmrglh v2, v2, v4
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: vmrghh v2, v2, v4
> ; P9LE-NEXT: vmrglw v2, v2, v3
> ; P9LE-NEXT: blr
> ;
> @@ -344,36 +334,34 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) {
> ; P8LE: # %bb.0:
> ; P8LE-NEXT: xxswapd vs0, v2
> ; P8LE-NEXT: lis r3, 22765
> -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill
> ; P8LE-NEXT: ori r3, r3, 8969
> ; P8LE-NEXT: mffprd r4, f0
> ; P8LE-NEXT: clrldi r5, r4, 48
> ; P8LE-NEXT: rldicl r6, r4, 48, 48
> -; P8LE-NEXT: clrlwi r8, r5, 16
> +; P8LE-NEXT: clrlwi r5, r5, 16
> ; P8LE-NEXT: rldicl r7, r4, 32, 48
> -; P8LE-NEXT: clrlwi r9, r6, 16
> +; P8LE-NEXT: clrlwi r6, r6, 16
> +; P8LE-NEXT: mulhwu r8, r5, r3
> ; P8LE-NEXT: rldicl r4, r4, 16, 48
> -; P8LE-NEXT: mulhwu r10, r8, r3
> -; P8LE-NEXT: clrlwi r11, r7, 16
> -; P8LE-NEXT: clrlwi r0, r4, 16
> -; P8LE-NEXT: mulhwu r12, r9, r3
> -; P8LE-NEXT: mulhwu r30, r11, r3
> -; P8LE-NEXT: mulhwu r3, r0, r3
> -; P8LE-NEXT: sub r8, r8, r10
> -; P8LE-NEXT: srwi r8, r8, 1
> -; P8LE-NEXT: sub r9, r9, r12
> -; P8LE-NEXT: add r8, r8, r10
> -; P8LE-NEXT: sub r10, r11, r30
> -; P8LE-NEXT: sub r11, r0, r3
> -; P8LE-NEXT: srwi r9, r9, 1
> -; P8LE-NEXT: srwi r10, r10, 1
> +; P8LE-NEXT: clrlwi r7, r7, 16
> +; P8LE-NEXT: mulhwu r9, r6, r3
> +; P8LE-NEXT: clrlwi r4, r4, 16
> +; P8LE-NEXT: mulhwu r10, r7, r3
> +; P8LE-NEXT: mulhwu r3, r4, r3
> +; P8LE-NEXT: sub r11, r5, r8
> +; P8LE-NEXT: sub r12, r6, r9
> +; P8LE-NEXT: srwi r11, r11, 1
> +; P8LE-NEXT: add r8, r11, r8
> +; P8LE-NEXT: sub r11, r7, r10
> +; P8LE-NEXT: srwi r12, r12, 1
> +; P8LE-NEXT: add r9, r12, r9
> +; P8LE-NEXT: sub r12, r4, r3
> ; P8LE-NEXT: srwi r11, r11, 1
> -; P8LE-NEXT: add r9, r9, r12
> ; P8LE-NEXT: srwi r8, r8, 6
> -; P8LE-NEXT: add r10, r10, r30
> -; P8LE-NEXT: add r3, r11, r3
> +; P8LE-NEXT: add r10, r11, r10
> +; P8LE-NEXT: srwi r11, r12, 1
> ; P8LE-NEXT: srwi r9, r9, 6
> -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
> +; P8LE-NEXT: add r3, r11, r3
> ; P8LE-NEXT: mulli r8, r8, 95
> ; P8LE-NEXT: srwi r10, r10, 6
> ; P8LE-NEXT: srwi r3, r3, 6
> @@ -382,18 +370,14 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) {
> ; P8LE-NEXT: mulli r3, r3, 95
> ; P8LE-NEXT: sub r5, r5, r8
> ; P8LE-NEXT: sub r6, r6, r9
> -; P8LE-NEXT: mtfprd f0, r5
> +; P8LE-NEXT: mtvsrd v2, r5
> ; P8LE-NEXT: sub r5, r7, r10
> ; P8LE-NEXT: sub r3, r4, r3
> -; P8LE-NEXT: mtfprd f1, r6
> -; P8LE-NEXT: mtfprd f2, r5
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: mtfprd f3, r3
> -; P8LE-NEXT: xxswapd v3, vs1
> -; P8LE-NEXT: xxswapd v4, vs2
> -; P8LE-NEXT: xxswapd v5, vs3
> -; P8LE-NEXT: vmrglh v2, v3, v2
> -; P8LE-NEXT: vmrglh v3, v5, v4
> +; P8LE-NEXT: mtvsrd v3, r6
> +; P8LE-NEXT: mtvsrd v4, r5
> +; P8LE-NEXT: mtvsrd v5, r3
> +; P8LE-NEXT: vmrghh v2, v3, v2
> +; P8LE-NEXT: vmrghh v3, v5, v4
> ; P8LE-NEXT: vmrglw v2, v3, v2
> ; P8LE-NEXT: blr
> ;
> @@ -461,67 +445,59 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 0
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, 22765
> -; P9LE-NEXT: ori r5, r5, 8969
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r6, r4, r5
> -; P9LE-NEXT: sub r4, r4, r6
> -; P9LE-NEXT: srwi r4, r4, 1
> -; P9LE-NEXT: add r4, r4, r6
> -; P9LE-NEXT: srwi r4, r4, 6
> -; P9LE-NEXT: mulli r6, r4, 95
> +; P9LE-NEXT: lis r4, 22765
> +; P9LE-NEXT: ori r4, r4, 8969
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: mulhwu r5, r3, r4
> +; P9LE-NEXT: sub r6, r3, r5
> +; P9LE-NEXT: srwi r6, r6, 1
> +; P9LE-NEXT: add r5, r6, r5
> +; P9LE-NEXT: srwi r5, r5, 6
> +; P9LE-NEXT: mulli r6, r5, 95
> ; P9LE-NEXT: sub r3, r3, r6
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> ; P9LE-NEXT: clrlwi r6, r3, 16
> -; P9LE-NEXT: mulhwu r7, r6, r5
> +; P9LE-NEXT: mulhwu r7, r6, r4
> ; P9LE-NEXT: sub r6, r6, r7
> ; P9LE-NEXT: srwi r6, r6, 1
> ; P9LE-NEXT: add r6, r6, r7
> ; P9LE-NEXT: srwi r6, r6, 6
> ; P9LE-NEXT: mulli r7, r6, 95
> ; P9LE-NEXT: sub r3, r3, r7
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> ; P9LE-NEXT: clrlwi r7, r3, 16
> -; P9LE-NEXT: mulhwu r8, r7, r5
> +; P9LE-NEXT: mulhwu r8, r7, r4
> ; P9LE-NEXT: sub r7, r7, r8
> ; P9LE-NEXT: srwi r7, r7, 1
> ; P9LE-NEXT: add r7, r7, r8
> ; P9LE-NEXT: srwi r7, r7, 6
> ; P9LE-NEXT: mulli r8, r7, 95
> ; P9LE-NEXT: sub r3, r3, r8
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> ; P9LE-NEXT: clrlwi r8, r3, 16
> -; P9LE-NEXT: mulhwu r5, r8, r5
> -; P9LE-NEXT: sub r8, r8, r5
> +; P9LE-NEXT: mulhwu r4, r8, r4
> +; P9LE-NEXT: sub r8, r8, r4
> ; P9LE-NEXT: srwi r8, r8, 1
> -; P9LE-NEXT: add r5, r8, r5
> -; P9LE-NEXT: srwi r5, r5, 6
> -; P9LE-NEXT: mulli r8, r5, 95
> +; P9LE-NEXT: add r4, r8, r4
> +; P9LE-NEXT: srwi r4, r4, 6
> +; P9LE-NEXT: mulli r8, r4, 95
> ; P9LE-NEXT: sub r3, r3, r8
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: mtfprd f0, r4
> -; P9LE-NEXT: vmrglh v2, v2, v4
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: vmrghh v2, v2, v4
> +; P9LE-NEXT: mtvsrd v4, r6
> ; P9LE-NEXT: vmrglw v2, v2, v3
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r6
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r7
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r5
> -; P9LE-NEXT: xxswapd v5, vs0
> -; P9LE-NEXT: vmrglh v4, v5, v4
> +; P9LE-NEXT: mtvsrd v3, r5
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: mtvsrd v4, r7
> +; P9LE-NEXT: mtvsrd v5, r4
> +; P9LE-NEXT: vmrghh v4, v5, v4
> ; P9LE-NEXT: vmrglw v3, v4, v3
> ; P9LE-NEXT: vadduhm v2, v2, v3
> ; P9LE-NEXT: blr
> @@ -598,69 +574,61 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) {
> ; P8LE-LABEL: combine_urem_udiv:
> ; P8LE: # %bb.0:
> ; P8LE-NEXT: xxswapd vs0, v2
> -; P8LE-NEXT: lis r4, 22765
> +; P8LE-NEXT: lis r3, 22765
> ; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill
> -; P8LE-NEXT: ori r4, r4, 8969
> -; P8LE-NEXT: mffprd r5, f0
> -; P8LE-NEXT: clrldi r3, r5, 48
> -; P8LE-NEXT: rldicl r6, r5, 48, 48
> -; P8LE-NEXT: clrlwi r8, r3, 16
> -; P8LE-NEXT: rldicl r7, r5, 32, 48
> -; P8LE-NEXT: clrlwi r9, r6, 16
> -; P8LE-NEXT: mulhwu r10, r8, r4
> -; P8LE-NEXT: clrlwi r11, r7, 16
> -; P8LE-NEXT: rldicl r5, r5, 16, 48
> -; P8LE-NEXT: mulhwu r12, r9, r4
> -; P8LE-NEXT: mulhwu r0, r11, r4
> -; P8LE-NEXT: clrlwi r30, r5, 16
> -; P8LE-NEXT: mulhwu r4, r30, r4
> -; P8LE-NEXT: sub r8, r8, r10
> +; P8LE-NEXT: ori r3, r3, 8969
> +; P8LE-NEXT: mffprd r4, f0
> +; P8LE-NEXT: clrldi r5, r4, 48
> +; P8LE-NEXT: rldicl r6, r4, 48, 48
> +; P8LE-NEXT: clrlwi r5, r5, 16
> +; P8LE-NEXT: clrlwi r8, r6, 16
> +; P8LE-NEXT: rldicl r7, r4, 32, 48
> +; P8LE-NEXT: rldicl r4, r4, 16, 48
> +; P8LE-NEXT: mulhwu r9, r5, r3
> +; P8LE-NEXT: mulhwu r11, r8, r3
> +; P8LE-NEXT: clrlwi r10, r7, 16
> +; P8LE-NEXT: clrlwi r12, r4, 16
> +; P8LE-NEXT: mulhwu r0, r10, r3
> +; P8LE-NEXT: mulhwu r3, r12, r3
> +; P8LE-NEXT: sub r30, r5, r9
> +; P8LE-NEXT: sub r8, r8, r11
> +; P8LE-NEXT: srwi r30, r30, 1
> ; P8LE-NEXT: srwi r8, r8, 1
> -; P8LE-NEXT: sub r9, r9, r12
> -; P8LE-NEXT: add r8, r8, r10
> -; P8LE-NEXT: sub r10, r11, r0
> -; P8LE-NEXT: srwi r9, r9, 1
> +; P8LE-NEXT: sub r10, r10, r0
> +; P8LE-NEXT: add r9, r30, r9
> +; P8LE-NEXT: add r8, r8, r11
> +; P8LE-NEXT: sub r11, r12, r3
> ; P8LE-NEXT: srwi r10, r10, 1
> -; P8LE-NEXT: sub r11, r30, r4
> -; P8LE-NEXT: add r9, r9, r12
> -; P8LE-NEXT: srwi r8, r8, 6
> ; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
> -; P8LE-NEXT: add r10, r10, r0
> -; P8LE-NEXT: srwi r11, r11, 1
> ; P8LE-NEXT: srwi r9, r9, 6
> -; P8LE-NEXT: mtfprd f0, r8
> -; P8LE-NEXT: mulli r12, r8, 95
> +; P8LE-NEXT: srwi r11, r11, 1
> +; P8LE-NEXT: srwi r8, r8, 6
> +; P8LE-NEXT: add r10, r10, r0
> +; P8LE-NEXT: mulli r12, r9, 95
> +; P8LE-NEXT: add r3, r11, r3
> +; P8LE-NEXT: mtvsrd v2, r9
> ; P8LE-NEXT: srwi r10, r10, 6
> -; P8LE-NEXT: add r4, r11, r4
> -; P8LE-NEXT: mtfprd f1, r9
> -; P8LE-NEXT: mulli r8, r9, 95
> -; P8LE-NEXT: mulli r9, r10, 95
> -; P8LE-NEXT: srwi r4, r4, 6
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: mtfprd f2, r10
> -; P8LE-NEXT: mtfprd f3, r4
> -; P8LE-NEXT: mulli r4, r4, 95
> -; P8LE-NEXT: xxswapd v3, vs1
> -; P8LE-NEXT: xxswapd v1, vs2
> -; P8LE-NEXT: sub r3, r3, r12
> -; P8LE-NEXT: xxswapd v6, vs3
> -; P8LE-NEXT: mtfprd f0, r3
> -; P8LE-NEXT: sub r3, r7, r9
> -; P8LE-NEXT: sub r6, r6, r8
> -; P8LE-NEXT: mtfprd f4, r3
> -; P8LE-NEXT: sub r3, r5, r4
> -; P8LE-NEXT: mtfprd f1, r6
> -; P8LE-NEXT: mtfprd f5, r3
> -; P8LE-NEXT: xxswapd v5, vs4
> -; P8LE-NEXT: vmrglh v2, v3, v2
> -; P8LE-NEXT: xxswapd v3, vs0
> -; P8LE-NEXT: xxswapd v4, vs1
> -; P8LE-NEXT: xxswapd v0, vs5
> -; P8LE-NEXT: vmrglh v3, v4, v3
> -; P8LE-NEXT: vmrglh v4, v0, v5
> -; P8LE-NEXT: vmrglh v5, v6, v1
> -; P8LE-NEXT: vmrglw v3, v4, v3
> -; P8LE-NEXT: vmrglw v2, v5, v2
> +; P8LE-NEXT: mulli r9, r8, 95
> +; P8LE-NEXT: srwi r3, r3, 6
> +; P8LE-NEXT: mtvsrd v3, r8
> +; P8LE-NEXT: mulli r8, r10, 95
> +; P8LE-NEXT: mtvsrd v4, r10
> +; P8LE-NEXT: mulli r10, r3, 95
> +; P8LE-NEXT: vmrghh v2, v3, v2
> +; P8LE-NEXT: sub r5, r5, r12
> +; P8LE-NEXT: sub r6, r6, r9
> +; P8LE-NEXT: mtvsrd v3, r5
> +; P8LE-NEXT: mtvsrd v5, r6
> +; P8LE-NEXT: sub r5, r7, r8
> +; P8LE-NEXT: sub r4, r4, r10
> +; P8LE-NEXT: mtvsrd v0, r5
> +; P8LE-NEXT: mtvsrd v1, r4
> +; P8LE-NEXT: vmrghh v3, v5, v3
> +; P8LE-NEXT: mtvsrd v5, r3
> +; P8LE-NEXT: vmrghh v0, v1, v0
> +; P8LE-NEXT: vmrghh v4, v5, v4
> +; P8LE-NEXT: vmrglw v3, v0, v3
> +; P8LE-NEXT: vmrglw v2, v4, v2
> ; P8LE-NEXT: vadduhm v2, v3, v2
> ; P8LE-NEXT: blr
> ;
> @@ -742,34 +710,30 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x
> i16> %x) {
> ; P9LE-NEXT: li r3, 0
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> ; P9LE-NEXT: clrlwi r3, r3, 26
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> ; P9LE-NEXT: clrlwi r3, r3, 27
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, 22765
> -; P9LE-NEXT: ori r5, r5, 8969
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r5, r4, r5
> -; P9LE-NEXT: sub r4, r4, r5
> -; P9LE-NEXT: srwi r4, r4, 1
> -; P9LE-NEXT: add r4, r4, r5
> +; P9LE-NEXT: lis r4, 22765
> +; P9LE-NEXT: ori r4, r4, 8969
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: mulhwu r4, r3, r4
> +; P9LE-NEXT: sub r5, r3, r4
> +; P9LE-NEXT: srwi r5, r5, 1
> +; P9LE-NEXT: add r4, r5, r4
> ; P9LE-NEXT: srwi r4, r4, 6
> ; P9LE-NEXT: mulli r4, r4, 95
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> ; P9LE-NEXT: clrlwi r3, r3, 29
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: vmrglh v2, v4, v2
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: vmrghh v2, v4, v2
> ; P9LE-NEXT: vmrglw v2, v2, v3
> ; P9LE-NEXT: blr
> ;
> @@ -817,9 +781,9 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x
> i16> %x) {
> ; P8LE-NEXT: mffprd r4, f0
> ; P8LE-NEXT: rldicl r5, r4, 16, 48
> ; P8LE-NEXT: rldicl r7, r4, 48, 48
> -; P8LE-NEXT: clrlwi r6, r5, 16
> -; P8LE-NEXT: mulhwu r3, r6, r3
> -; P8LE-NEXT: sub r6, r6, r3
> +; P8LE-NEXT: clrlwi r5, r5, 16
> +; P8LE-NEXT: mulhwu r3, r5, r3
> +; P8LE-NEXT: sub r6, r5, r3
> ; P8LE-NEXT: srwi r6, r6, 1
> ; P8LE-NEXT: add r3, r6, r3
> ; P8LE-NEXT: clrldi r6, r4, 48
> @@ -827,19 +791,15 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x
> i16> %x) {
> ; P8LE-NEXT: clrlwi r6, r6, 26
> ; P8LE-NEXT: mulli r3, r3, 95
> ; P8LE-NEXT: rldicl r4, r4, 32, 48
> -; P8LE-NEXT: mtfprd f0, r6
> +; P8LE-NEXT: mtvsrd v2, r6
> ; P8LE-NEXT: clrlwi r6, r7, 27
> ; P8LE-NEXT: clrlwi r4, r4, 29
> -; P8LE-NEXT: mtfprd f1, r6
> -; P8LE-NEXT: mtfprd f3, r4
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: xxswapd v3, vs1
> +; P8LE-NEXT: mtvsrd v3, r6
> +; P8LE-NEXT: mtvsrd v5, r4
> +; P8LE-NEXT: vmrghh v2, v3, v2
> ; P8LE-NEXT: sub r3, r5, r3
> -; P8LE-NEXT: xxswapd v5, vs3
> -; P8LE-NEXT: mtfprd f2, r3
> -; P8LE-NEXT: vmrglh v2, v3, v2
> -; P8LE-NEXT: xxswapd v4, vs2
> -; P8LE-NEXT: vmrglh v3, v4, v5
> +; P8LE-NEXT: mtvsrd v4, r3
> +; P8LE-NEXT: vmrghh v3, v4, v5
> ; P8LE-NEXT: vmrglw v2, v3, v2
> ; P8LE-NEXT: blr
> ;
> @@ -885,40 +845,39 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) {
> ; P9LE: # %bb.0:
> ; P9LE-NEXT: li r3, 4
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: lis r5, -19946
> -; P9LE-NEXT: ori r5, r5, 17097
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r4, r4, r5
> -; P9LE-NEXT: lis r5, 24749
> -; P9LE-NEXT: ori r5, r5, 47143
> +; P9LE-NEXT: lis r4, -19946
> +; P9LE-NEXT: ori r4, r4, 17097
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: mulhwu r4, r3, r4
> ; P9LE-NEXT: srwi r4, r4, 4
> ; P9LE-NEXT: mulli r4, r4, 23
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: lis r4, 24749
> +; P9LE-NEXT: mtvsrd v3, r3
> ; P9LE-NEXT: li r3, 6
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: clrlwi r4, r3, 16
> -; P9LE-NEXT: mulhwu r4, r4, r5
> -; P9LE-NEXT: lis r5, -14230
> -; P9LE-NEXT: ori r5, r5, 30865
> +; P9LE-NEXT: clrlwi r3, r3, 16
> +; P9LE-NEXT: ori r4, r4, 47143
> +; P9LE-NEXT: mulhwu r4, r3, r4
> ; P9LE-NEXT: srwi r4, r4, 11
> ; P9LE-NEXT: mulli r4, r4, 5423
> ; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v3, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> +; P9LE-NEXT: mtvsrd v4, r3
> ; P9LE-NEXT: li r3, 2
> ; P9LE-NEXT: vextuhrx r3, r3, v2
> -; P9LE-NEXT: rlwinm r4, r3, 31, 17, 31
> -; P9LE-NEXT: mulhwu r4, r4, r5
> -; P9LE-NEXT: srwi r4, r4, 8
> -; P9LE-NEXT: mulli r4, r4, 654
> -; P9LE-NEXT: sub r3, r3, r4
> -; P9LE-NEXT: xxswapd v4, vs0
> -; P9LE-NEXT: mtfprd f0, r3
> -; P9LE-NEXT: xxswapd v2, vs0
> -; P9LE-NEXT: vmrglh v3, v4, v3
> -; P9LE-NEXT: xxlxor v4, v4, v4
> -; P9LE-NEXT: vmrglh v2, v2, v4
> +; P9LE-NEXT: lis r5, -14230
> +; P9LE-NEXT: ori r5, r5, 30865
> +; P9LE-NEXT: vmrghh v3, v4, v3
> +; P9LE-NEXT: clrlwi r4, r3, 16
> +; P9LE-NEXT: rlwinm r3, r3, 31, 17, 31
> +; P9LE-NEXT: mulhwu r3, r3, r5
> +; P9LE-NEXT: srwi r3, r3, 8
> +; P9LE-NEXT: mulli r3, r3, 654
> +; P9LE-NEXT: sub r3, r4, r3
> +; P9LE-NEXT: mtvsrd v2, r3
> +; P9LE-NEXT: li r3, 0
> +; P9LE-NEXT: mtvsrd v4, r3
> +; P9LE-NEXT: vmrghh v2, v2, v4
> ; P9LE-NEXT: vmrglw v2, v3, v2
> ; P9LE-NEXT: blr
> ;
> @@ -969,41 +928,40 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) {
> ; P8LE-LABEL: dont_fold_urem_one:
> ; P8LE: # %bb.0:
> ; P8LE-NEXT: xxswapd vs0, v2
> -; P8LE-NEXT: lis r3, -19946
> -; P8LE-NEXT: lis r7, 24749
> -; P8LE-NEXT: lis r9, -14230
> -; P8LE-NEXT: xxlxor v5, v5, v5
> -; P8LE-NEXT: ori r3, r3, 17097
> -; P8LE-NEXT: ori r7, r7, 47143
> -; P8LE-NEXT: ori r9, r9, 30865
> +; P8LE-NEXT: lis r3, -14230
> +; P8LE-NEXT: lis r7, -19946
> +; P8LE-NEXT: lis r9, 24749
> +; P8LE-NEXT: ori r3, r3, 30865
> +; P8LE-NEXT: ori r7, r7, 17097
> ; P8LE-NEXT: mffprd r4, f0
> -; P8LE-NEXT: rldicl r5, r4, 32, 48
> -; P8LE-NEXT: rldicl r6, r4, 16, 48
> -; P8LE-NEXT: clrlwi r8, r5, 16
> -; P8LE-NEXT: rldicl r4, r4, 48, 48
> +; P8LE-NEXT: rldicl r5, r4, 48, 48
> +; P8LE-NEXT: rldicl r6, r4, 32, 48
> +; P8LE-NEXT: rldicl r4, r4, 16, 48
> +; P8LE-NEXT: rlwinm r8, r5, 31, 17, 31
> +; P8LE-NEXT: clrlwi r6, r6, 16
> +; P8LE-NEXT: clrlwi r5, r5, 16
> ; P8LE-NEXT: mulhwu r3, r8, r3
> -; P8LE-NEXT: clrlwi r8, r6, 16
> -; P8LE-NEXT: mulhwu r7, r8, r7
> -; P8LE-NEXT: rlwinm r8, r4, 31, 17, 31
> -; P8LE-NEXT: mulhwu r8, r8, r9
> -; P8LE-NEXT: srwi r3, r3, 4
> -; P8LE-NEXT: srwi r7, r7, 11
> -; P8LE-NEXT: mulli r3, r3, 23
> -; P8LE-NEXT: srwi r8, r8, 8
> -; P8LE-NEXT: mulli r7, r7, 5423
> -; P8LE-NEXT: mulli r8, r8, 654
> +; P8LE-NEXT: ori r8, r9, 47143
> +; P8LE-NEXT: clrlwi r4, r4, 16
> +; P8LE-NEXT: li r9, 0
> +; P8LE-NEXT: mulhwu r7, r6, r7
> +; P8LE-NEXT: mulhwu r8, r4, r8
> +; P8LE-NEXT: mtvsrd v2, r9
> +; P8LE-NEXT: srwi r3, r3, 8
> +; P8LE-NEXT: srwi r7, r7, 4
> +; P8LE-NEXT: mulli r3, r3, 654
> +; P8LE-NEXT: srwi r8, r8, 11
> +; P8LE-NEXT: mulli r7, r7, 23
> +; P8LE-NEXT: mulli r8, r8, 5423
> ; P8LE-NEXT: sub r3, r5, r3
> ; P8LE-NEXT: sub r5, r6, r7
> -; P8LE-NEXT: mtfprd f0, r3
> +; P8LE-NEXT: mtvsrd v3, r3
> ; P8LE-NEXT: sub r3, r4, r8
> -; P8LE-NEXT: mtfprd f1, r5
> -; P8LE-NEXT: mtfprd f2, r3
> -; P8LE-NEXT: xxswapd v2, vs0
> -; P8LE-NEXT: xxswapd v3, vs1
> -; P8LE-NEXT: xxswapd v4, vs2
> -; P8LE-NEXT: vmrglh v2, v3, v2
> -; P8LE-NEXT: vmrglh v3, v4, v5
> -; P8LE-NEXT: vmrglw v2, v2, v3
> +; P8LE-NEXT: mtvsrd v4, r5
> +; P8LE-NEXT: mtvsrd v5, r3
> +; P8LE-NEXT: vmrghh v2, v3, v2
> +; P8LE-NEXT: vmrghh v3, v5, v4
> +; P8LE-NEXT: vmrglw v2, v3, v2
> ; P8LE-NEXT: blr
> ;
> ; P8BE-LABEL: dont_fold_urem_one:
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
> index 239b38e2ec70..48b62f57c1c9 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
> @@ -20,12 +20,10 @@ define i32 @test2elt(i64 %a.coerce) local_unnamed_addr
> #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: vmrghh v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -40,13 +38,11 @@ define i32 @test2elt(i64 %a.coerce) local_unnamed_addr
> #0 {
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs1
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: li r3, 0
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -90,20 +86,16 @@ define i64 @test4elt(<4 x float> %a)
> local_unnamed_addr #1 {
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> ; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: mtfprd f3, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs2
> -; CHECK-P8-NEXT: xxswapd v5, vs3
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglh v3, v4, v5
> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mtvsrd v5, r3
> +; CHECK-P8-NEXT: vmrghh v3, v4, v3
> +; CHECK-P8-NEXT: vmrghh v2, v2, v5
> +; CHECK-P8-NEXT: vmrglw v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -114,27 +106,23 @@ define i64 @test4elt(<4 x float> %a)
> local_unnamed_addr #1 {
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xscvspdpn f0, v2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglh v2, v4, v2
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> +; CHECK-P9-NEXT: vmrghh v2, v4, v2
> ; CHECK-P9-NEXT: vmrglw v2, v2, v3
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> @@ -180,59 +168,51 @@ define <8 x i16> @test8elt(<8 x float>* nocapture
> readonly) local_unnamed_addr #
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: lvx v2, 0, r3
> ; CHECK-P8-NEXT: li r4, 16
> -; CHECK-P8-NEXT: lvx v5, r3, r4
> -; CHECK-P8-NEXT: xxswapd vs1, v2
> +; CHECK-P8-NEXT: lvx v3, r3, r4
> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3
> -; CHECK-P8-NEXT: xscvspdpn f4, v5
> -; CHECK-P8-NEXT: xxswapd vs3, v5
> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xxswapd vs1, v2
> +; CHECK-P8-NEXT: xscvspdpn f2, v2
> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1
> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
> +; CHECK-P8-NEXT: xscvspdpn f3, v3
> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
> -; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: mffprwz r6, f1
> -; CHECK-P8-NEXT: mffprwz r5, f0
> -; CHECK-P8-NEXT: mtfprd f1, r6
> -; CHECK-P8-NEXT: mtfprd f0, r5
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xscvspdpn f0, v2
> -; CHECK-P8-NEXT: mtfprd f4, r4
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v1, vs4
> -; CHECK-P8-NEXT: vmrglh v2, v4, v3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: xxswapd v5, vs2
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: xxswapd vs0, v3
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: mffprwz r3, f2
> +; CHECK-P8-NEXT: xscvdpsxws f2, f4
> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xscvdpsxws f4, f5
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghh v2, v4, v2
> +; CHECK-P8-NEXT: mffprwz r4, f2
> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v6, vs3
> -; CHECK-P8-NEXT: xxswapd v0, vs0
> -; CHECK-P8-NEXT: vmrglh v3, v3, v4
> -; CHECK-P8-NEXT: vmrglh v4, v0, v5
> -; CHECK-P8-NEXT: vmrglh v5, v1, v6
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: vmrghh v3, v3, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mtvsrd v5, r3
> +; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: vmrghh v5, v0, v5
> +; CHECK-P8-NEXT: mtvsrd v1, r3
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
> +; CHECK-P8-NEXT: vmrghh v4, v4, v1
> +; CHECK-P8-NEXT: vmrglw v3, v4, v5
> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> ; CHECK-P8-NEXT: blr
> ;
> @@ -244,53 +224,45 @@ define <8 x i16> @test8elt(<8 x float>* nocapture
> readonly) local_unnamed_addr #
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> ; CHECK-P9-NEXT: xxswapd vs2, vs1
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xxswapd vs1, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
> ; CHECK-P9-NEXT: blr
> @@ -363,116 +335,100 @@ define void @test16elt(<16 x i16>* noalias
> nocapture sret %agg.result, <16 x flo
> ; CHECK-P8-LABEL: test16elt:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: lvx v5, 0, r4
> -; CHECK-P8-NEXT: li r6, 32
> ; CHECK-P8-NEXT: li r5, 16
> -; CHECK-P8-NEXT: lvx v2, r4, r6
> +; CHECK-P8-NEXT: li r6, 32
> ; CHECK-P8-NEXT: lvx v3, r4, r5
> +; CHECK-P8-NEXT: lvx v2, r4, r6
> ; CHECK-P8-NEXT: li r6, 48
> -; CHECK-P8-NEXT: xscvspdpn f0, v5
> -; CHECK-P8-NEXT: xxsldwi vs1, v5, v5, 3
> +; CHECK-P8-NEXT: xxsldwi vs0, v5, v5, 3
> +; CHECK-P8-NEXT: xscvspdpn f1, v5
> ; CHECK-P8-NEXT: lvx v4, r4, r6
> -; CHECK-P8-NEXT: xscvspdpn f4, v2
> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
> -; CHECK-P8-NEXT: xscvspdpn f2, v3
> ; CHECK-P8-NEXT: xxswapd vs3, v5
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: xxswapd vs8, v3
> -; CHECK-P8-NEXT: xscvspdpn f6, v4
> +; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
> ; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 3
> +; CHECK-P8-NEXT: xxswapd vs8, v3
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: xscvspdpn f3, vs3
> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xxsldwi vs10, v2, v2, 3
> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
> +; CHECK-P8-NEXT: xscvspdpn f7, vs7
> +; CHECK-P8-NEXT: xscvspdpn f8, vs8
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: xxsldwi vs9, v3, v3, 1
> +; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: xscvspdpn f2, v3
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xscvdpsxws f1, f5
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: xxsldwi vs0, v3, v3, 1
> +; CHECK-P8-NEXT: xscvspdpn f4, v2
> +; CHECK-P8-NEXT: xscvdpsxws f5, f7
> +; CHECK-P8-NEXT: xxsldwi vs7, v4, v4, 3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 3
> +; CHECK-P8-NEXT: xscvspdpn f6, v4
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xscvdpsxws f1, f8
> +; CHECK-P8-NEXT: xxswapd vs8, v4
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: mffprwz r4, f5
> +; CHECK-P8-NEXT: xxswapd vs5, v2
> ; CHECK-P8-NEXT: xscvspdpn f3, vs3
> -; CHECK-P8-NEXT: xxsldwi vs12, v2, v2, 1
> -; CHECK-P8-NEXT: xscvspdpn f8, vs8
> -; CHECK-P8-NEXT: xxswapd vs11, v2
> ; CHECK-P8-NEXT: xscvdpsxws f4, f4
> -; CHECK-P8-NEXT: xxswapd v2, v4
> +; CHECK-P8-NEXT: vmrghh v3, v0, v3
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xscvdpsxws f6, f6
> +; CHECK-P8-NEXT: xscvspdpn f1, vs5
> +; CHECK-P8-NEXT: xxsldwi vs5, v2, v2, 1
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f2
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghh v2, v5, v1
> +; CHECK-P8-NEXT: vmrghh v5, v6, v0
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mffprwz r4, f4
> +; CHECK-P8-NEXT: xscvdpsxws f2, f3
> +; CHECK-P8-NEXT: xscvspdpn f5, vs5
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> ; CHECK-P8-NEXT: xscvspdpn f7, vs7
> -; CHECK-P8-NEXT: xxsldwi vs13, v4, v4, 3
> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: xxsldwi v3, v4, v4, 1
> -; CHECK-P8-NEXT: xscvspdpn f10, vs10
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> +; CHECK-P8-NEXT: mffprwz r4, f2
> +; CHECK-P8-NEXT: xxsldwi vs2, v4, v4, 1
> +; CHECK-P8-NEXT: xscvspdpn f8, vs8
> +; CHECK-P8-NEXT: xscvdpsxws f0, f5
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xscvspdpn f1, vs2
> +; CHECK-P8-NEXT: xscvdpsxws f3, f7
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: xscvdpsxws f0, f8
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xscvspdpn f9, vs9
> -; CHECK-P8-NEXT: xscvdpsxws f6, f6
> -; CHECK-P8-NEXT: xscvspdpn f12, vs12
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> +; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: vmrghh v0, v0, v7
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: xscvspdpn f11, vs11
> -; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xscvspdpn v2, v2
> -; CHECK-P8-NEXT: xscvdpsxws f8, f8
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: xscvdpsxws f7, f7
> -; CHECK-P8-NEXT: mffprwz r6, f2
> -; CHECK-P8-NEXT: xscvspdpn f13, vs13
> -; CHECK-P8-NEXT: xscvspdpn v3, v3
> -; CHECK-P8-NEXT: xscvdpsxws f10, f10
> -; CHECK-P8-NEXT: mtfprd f4, r4
> +; CHECK-P8-NEXT: vmrghh v4, v8, v4
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xscvdpsxws f9, f9
> -; CHECK-P8-NEXT: mtfprd f2, r6
> -; CHECK-P8-NEXT: mffprwz r6, f6
> -; CHECK-P8-NEXT: xscvdpsxws f12, f12
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: xscvdpsxws f11, f11
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f6, r6
> -; CHECK-P8-NEXT: mffprwz r6, f3
> -; CHECK-P8-NEXT: xscvdpsxws v2, v2
> -; CHECK-P8-NEXT: xxswapd v9, vs6
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: mffprwz r4, f8
> -; CHECK-P8-NEXT: mtfprd f3, r6
> -; CHECK-P8-NEXT: xxswapd v0, vs5
> -; CHECK-P8-NEXT: mffprwz r6, f7
> -; CHECK-P8-NEXT: xscvdpsxws f13, f13
> -; CHECK-P8-NEXT: xxswapd v5, vs3
> -; CHECK-P8-NEXT: xscvdpsxws v3, v3
> -; CHECK-P8-NEXT: mtfprd f8, r4
> -; CHECK-P8-NEXT: mffprwz r4, f10
> -; CHECK-P8-NEXT: mtfprd f7, r6
> -; CHECK-P8-NEXT: mffprwz r6, f9
> -; CHECK-P8-NEXT: mtfprd f10, r4
> -; CHECK-P8-NEXT: mffprwz r4, f12
> -; CHECK-P8-NEXT: mtfprd f9, r6
> -; CHECK-P8-NEXT: xxswapd v6, vs10
> -; CHECK-P8-NEXT: mffprwz r6, f11
> -; CHECK-P8-NEXT: mtfprd f12, r4
> -; CHECK-P8-NEXT: xxswapd v1, vs9
> -; CHECK-P8-NEXT: mfvsrwz r4, v2
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: mtfprd f11, r6
> -; CHECK-P8-NEXT: mffprwz r6, f13
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v7, vs11
> -; CHECK-P8-NEXT: mfvsrwz r4, v3
> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
> -; CHECK-P8-NEXT: xxswapd v4, vs7
> -; CHECK-P8-NEXT: vmrglh v2, v2, v0
> -; CHECK-P8-NEXT: xxswapd v5, vs8
> -; CHECK-P8-NEXT: xxswapd v0, vs2
> -; CHECK-P8-NEXT: mtfprd f13, r6
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: vmrglh v4, v5, v4
> -; CHECK-P8-NEXT: vmrglh v5, v0, v1
> -; CHECK-P8-NEXT: xxswapd v1, vs4
> -; CHECK-P8-NEXT: vmrglh v0, v7, v6
> -; CHECK-P8-NEXT: xxswapd v6, vs12
> -; CHECK-P8-NEXT: xxswapd v7, vs13
> -; CHECK-P8-NEXT: xxswapd v10, vs1
> +; CHECK-P8-NEXT: vmrghh v1, v1, v9
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: vmrghh v7, v8, v7
> +; CHECK-P8-NEXT: vmrghh v6, v6, v9
> ; CHECK-P8-NEXT: vmrglw v2, v2, v3
> -; CHECK-P8-NEXT: vmrglh v1, v1, v6
> -; CHECK-P8-NEXT: vmrglh v6, v8, v7
> -; CHECK-P8-NEXT: vmrglh v7, v9, v10
> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
> -; CHECK-P8-NEXT: vmrglw v4, v1, v0
> -; CHECK-P8-NEXT: vmrglw v5, v7, v6
> +; CHECK-P8-NEXT: vmrglw v3, v0, v5
> +; CHECK-P8-NEXT: vmrglw v4, v1, v4
> +; CHECK-P8-NEXT: vmrglw v5, v6, v7
> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> ; CHECK-P8-NEXT: stvx v2, 0, r3
> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4
> @@ -481,118 +437,102 @@ define void @test16elt(<16 x i16>* noalias
> nocapture sret %agg.result, <16 x flo
> ;
> ; CHECK-P9-LABEL: test16elt:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: lxv vs1, 0(r4)
> -; CHECK-P9-NEXT: lxv vs3, 16(r4)
> -; CHECK-P9-NEXT: xscvspdpn f5, vs1
> -; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3
> -; CHECK-P9-NEXT: xscvspdpn f8, vs3
> -; CHECK-P9-NEXT: xxswapd vs4, vs1
> -; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> +; CHECK-P9-NEXT: lxv vs2, 0(r4)
> +; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3
> +; CHECK-P9-NEXT: xxswapd vs4, vs2
> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
> ; CHECK-P9-NEXT: xscvspdpn f4, vs4
> -; CHECK-P9-NEXT: xscvdpsxws f5, f5
> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: xscvspdpn f5, vs2
> +; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> -; CHECK-P9-NEXT: xscvdpsxws f8, f8
> -; CHECK-P9-NEXT: xxsldwi vs6, vs3, vs3, 3
> -; CHECK-P9-NEXT: xxswapd vs7, vs3
> -; CHECK-P9-NEXT: xscvspdpn f6, vs6
> -; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1
> -; CHECK-P9-NEXT: xscvspdpn f7, vs7
> -; CHECK-P9-NEXT: xscvspdpn f3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: mffprwz r5, f3
> +; CHECK-P9-NEXT: lxv vs1, 16(r4)
> +; CHECK-P9-NEXT: xxsldwi vs6, vs1, vs1, 3
> +; CHECK-P9-NEXT: xxswapd vs3, vs1
> +; CHECK-P9-NEXT: mtvsrd v2, r5
> +; CHECK-P9-NEXT: mffprwz r5, f4
> +; CHECK-P9-NEXT: xscvdpsxws f4, f5
> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
> +; CHECK-P9-NEXT: mtvsrd v3, r5
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> +; CHECK-P9-NEXT: mffprwz r5, f4
> +; CHECK-P9-NEXT: xscvspdpn f4, vs6
> +; CHECK-P9-NEXT: mtvsrd v3, r5
> +; CHECK-P9-NEXT: mffprwz r5, f2
> +; CHECK-P9-NEXT: xscvspdpn f2, vs1
> +; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> ; CHECK-P9-NEXT: xscvdpsxws f4, f4
> -; CHECK-P9-NEXT: xscvdpsxws f6, f6
> -; CHECK-P9-NEXT: mffprwz r5, f5
> -; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: xscvdpsxws f7, f7
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: mtfprd f5, r5
> -; CHECK-P9-NEXT: mffprwz r5, f8
> -; CHECK-P9-NEXT: mtfprd f8, r5
> -; CHECK-P9-NEXT: mffprwz r5, f2
> ; CHECK-P9-NEXT: lxv vs0, 32(r4)
> -; CHECK-P9-NEXT: xxsldwi vs9, vs0, vs0, 3
> -; CHECK-P9-NEXT: xxswapd vs10, vs0
> -; CHECK-P9-NEXT: xscvspdpn f9, vs9
> -; CHECK-P9-NEXT: xscvspdpn f10, vs10
> -; CHECK-P9-NEXT: xscvdpsxws f9, f9
> -; CHECK-P9-NEXT: xscvdpsxws f10, f10
> -; CHECK-P9-NEXT: mtfprd f2, r5
> +; CHECK-P9-NEXT: mtvsrd v4, r5
> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r5, f4
> -; CHECK-P9-NEXT: mtfprd f4, r5
> +; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: mtvsrd v4, r5
> +; CHECK-P9-NEXT: mffprwz r5, f3
> +; CHECK-P9-NEXT: xxsldwi vs3, vs0, vs0, 3
> +; CHECK-P9-NEXT: mtvsrd v5, r5
> +; CHECK-P9-NEXT: mffprwz r5, f2
> +; CHECK-P9-NEXT: xscvspdpn f2, vs3
> +; CHECK-P9-NEXT: vmrghh v4, v5, v4
> +; CHECK-P9-NEXT: mtvsrd v5, r5
> ; CHECK-P9-NEXT: mffprwz r5, f1
> -; CHECK-P9-NEXT: mtfprd f1, r5
> -; CHECK-P9-NEXT: mffprwz r5, f6
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> -; CHECK-P9-NEXT: xxswapd v3, vs4
> +; CHECK-P9-NEXT: xxswapd vs1, vs0
> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> +; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: vmrghh v5, v5, v0
> +; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrglw v3, v5, v4
> +; CHECK-P9-NEXT: mffprwz r5, f2
> ; CHECK-P9-NEXT: xscvspdpn f2, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: mtfprd f6, r5
> -; CHECK-P9-NEXT: mffprwz r5, f7
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> +; CHECK-P9-NEXT: mffprwz r5, f1
> ; CHECK-P9-NEXT: lxv vs1, 48(r4)
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs5
> -; CHECK-P9-NEXT: mtfprd f7, r5
> -; CHECK-P9-NEXT: mffprwz r5, f3
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs6
> -; CHECK-P9-NEXT: xxswapd v5, vs7
> -; CHECK-P9-NEXT: mtfprd f3, r5
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> -; CHECK-P9-NEXT: xxswapd v0, vs3
> -; CHECK-P9-NEXT: vmrglh v4, v5, v4
> -; CHECK-P9-NEXT: xxswapd v5, vs8
> -; CHECK-P9-NEXT: vmrglh v5, v5, v0
> +; CHECK-P9-NEXT: mtvsrd v1, r5
> +; CHECK-P9-NEXT: vmrghh v0, v1, v0
> ; CHECK-P9-NEXT: mffprwz r4, f2
> -; CHECK-P9-NEXT: mtfprd f2, r4
> -; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: vmrglw v3, v5, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xxmrgld vs2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> +; CHECK-P9-NEXT: mffprwz r4, f0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 3
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> +; CHECK-P9-NEXT: vmrghh v2, v4, v2
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrglw v2, v2, v0
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: vmrglh v2, v4, v2
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xscvspdpn f0, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> -; CHECK-P9-NEXT: mffprwz r5, f9
> -; CHECK-P9-NEXT: mtfprd f9, r5
> -; CHECK-P9-NEXT: mffprwz r5, f10
> -; CHECK-P9-NEXT: mtfprd f10, r5
> -; CHECK-P9-NEXT: xxswapd v0, vs9
> -; CHECK-P9-NEXT: xxswapd v1, vs10
> -; CHECK-P9-NEXT: vmrglh v0, v1, v0
> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
> -; CHECK-P9-NEXT: stxv vs2, 0(r3)
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r4
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
> +; CHECK-P9-NEXT: stxv vs2, 0(r3)
> ; CHECK-P9-NEXT: blr
> ;
> ; CHECK-BE-LABEL: test16elt:
> @@ -728,12 +668,10 @@ define i32 @test2elt_signed(i64 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: vmrghh v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -748,13 +686,11 @@ define i32 @test2elt_signed(i64 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs1
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: li r3, 0
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -798,20 +734,16 @@ define i64 @test4elt_signed(<4 x float> %a)
> local_unnamed_addr #1 {
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> ; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: mtfprd f3, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs2
> -; CHECK-P8-NEXT: xxswapd v5, vs3
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglh v3, v4, v5
> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mtvsrd v5, r3
> +; CHECK-P8-NEXT: vmrghh v3, v4, v3
> +; CHECK-P8-NEXT: vmrghh v2, v2, v5
> +; CHECK-P8-NEXT: vmrglw v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -822,27 +754,23 @@ define i64 @test4elt_signed(<4 x float> %a)
> local_unnamed_addr #1 {
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xscvspdpn f0, v2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglh v2, v4, v2
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> +; CHECK-P9-NEXT: vmrghh v2, v4, v2
> ; CHECK-P9-NEXT: vmrglw v2, v2, v3
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> @@ -888,59 +816,51 @@ define <8 x i16> @test8elt_signed(<8 x float>*
> nocapture readonly) local_unnamed
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: lvx v2, 0, r3
> ; CHECK-P8-NEXT: li r4, 16
> -; CHECK-P8-NEXT: lvx v5, r3, r4
> -; CHECK-P8-NEXT: xxswapd vs1, v2
> +; CHECK-P8-NEXT: lvx v3, r3, r4
> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3
> -; CHECK-P8-NEXT: xscvspdpn f4, v5
> -; CHECK-P8-NEXT: xxswapd vs3, v5
> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xxswapd vs1, v2
> +; CHECK-P8-NEXT: xscvspdpn f2, v2
> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1
> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
> +; CHECK-P8-NEXT: xscvspdpn f3, v3
> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
> -; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: mffprwz r6, f1
> -; CHECK-P8-NEXT: mffprwz r5, f0
> -; CHECK-P8-NEXT: mtfprd f1, r6
> -; CHECK-P8-NEXT: mtfprd f0, r5
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xscvspdpn f0, v2
> -; CHECK-P8-NEXT: mtfprd f4, r4
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v1, vs4
> -; CHECK-P8-NEXT: vmrglh v2, v4, v3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: xxswapd v5, vs2
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: xxswapd vs0, v3
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: mffprwz r3, f2
> +; CHECK-P8-NEXT: xscvdpsxws f2, f4
> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xscvdpsxws f4, f5
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghh v2, v4, v2
> +; CHECK-P8-NEXT: mffprwz r4, f2
> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v6, vs3
> -; CHECK-P8-NEXT: xxswapd v0, vs0
> -; CHECK-P8-NEXT: vmrglh v3, v3, v4
> -; CHECK-P8-NEXT: vmrglh v4, v0, v5
> -; CHECK-P8-NEXT: vmrglh v5, v1, v6
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: vmrghh v3, v3, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mtvsrd v5, r3
> +; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: vmrghh v5, v0, v5
> +; CHECK-P8-NEXT: mtvsrd v1, r3
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
> +; CHECK-P8-NEXT: vmrghh v4, v4, v1
> +; CHECK-P8-NEXT: vmrglw v3, v4, v5
> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> ; CHECK-P8-NEXT: blr
> ;
> @@ -952,53 +872,45 @@ define <8 x i16> @test8elt_signed(<8 x float>*
> nocapture readonly) local_unnamed
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> ; CHECK-P9-NEXT: xxswapd vs2, vs1
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xxswapd vs1, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
> ; CHECK-P9-NEXT: blr
> @@ -1071,116 +983,100 @@ define void @test16elt_signed(<16 x i16>* noalias
> nocapture sret %agg.result, <1
> ; CHECK-P8-LABEL: test16elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: lvx v5, 0, r4
> -; CHECK-P8-NEXT: li r6, 32
> ; CHECK-P8-NEXT: li r5, 16
> -; CHECK-P8-NEXT: lvx v2, r4, r6
> +; CHECK-P8-NEXT: li r6, 32
> ; CHECK-P8-NEXT: lvx v3, r4, r5
> +; CHECK-P8-NEXT: lvx v2, r4, r6
> ; CHECK-P8-NEXT: li r6, 48
> -; CHECK-P8-NEXT: xscvspdpn f0, v5
> -; CHECK-P8-NEXT: xxsldwi vs1, v5, v5, 3
> +; CHECK-P8-NEXT: xxsldwi vs0, v5, v5, 3
> +; CHECK-P8-NEXT: xscvspdpn f1, v5
> ; CHECK-P8-NEXT: lvx v4, r4, r6
> -; CHECK-P8-NEXT: xscvspdpn f4, v2
> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
> -; CHECK-P8-NEXT: xscvspdpn f2, v3
> ; CHECK-P8-NEXT: xxswapd vs3, v5
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: xxswapd vs8, v3
> -; CHECK-P8-NEXT: xscvspdpn f6, v4
> +; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
> ; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 3
> +; CHECK-P8-NEXT: xxswapd vs8, v3
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: xscvspdpn f3, vs3
> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xxsldwi vs10, v2, v2, 3
> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
> +; CHECK-P8-NEXT: xscvspdpn f7, vs7
> +; CHECK-P8-NEXT: xscvspdpn f8, vs8
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: xxsldwi vs9, v3, v3, 1
> +; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: xscvspdpn f2, v3
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xscvdpsxws f1, f5
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: xxsldwi vs0, v3, v3, 1
> +; CHECK-P8-NEXT: xscvspdpn f4, v2
> +; CHECK-P8-NEXT: xscvdpsxws f5, f7
> +; CHECK-P8-NEXT: xxsldwi vs7, v4, v4, 3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 3
> +; CHECK-P8-NEXT: xscvspdpn f6, v4
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xscvdpsxws f1, f8
> +; CHECK-P8-NEXT: xxswapd vs8, v4
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: mffprwz r4, f5
> +; CHECK-P8-NEXT: xxswapd vs5, v2
> ; CHECK-P8-NEXT: xscvspdpn f3, vs3
> -; CHECK-P8-NEXT: xxsldwi vs12, v2, v2, 1
> -; CHECK-P8-NEXT: xscvspdpn f8, vs8
> -; CHECK-P8-NEXT: xxswapd vs11, v2
> ; CHECK-P8-NEXT: xscvdpsxws f4, f4
> -; CHECK-P8-NEXT: xxswapd v2, v4
> +; CHECK-P8-NEXT: vmrghh v3, v0, v3
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xscvdpsxws f6, f6
> +; CHECK-P8-NEXT: xscvspdpn f1, vs5
> +; CHECK-P8-NEXT: xxsldwi vs5, v2, v2, 1
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f2
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghh v2, v5, v1
> +; CHECK-P8-NEXT: vmrghh v5, v6, v0
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mffprwz r4, f4
> +; CHECK-P8-NEXT: xscvdpsxws f2, f3
> +; CHECK-P8-NEXT: xscvspdpn f5, vs5
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> ; CHECK-P8-NEXT: xscvspdpn f7, vs7
> -; CHECK-P8-NEXT: xxsldwi vs13, v4, v4, 3
> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: xxsldwi v3, v4, v4, 1
> -; CHECK-P8-NEXT: xscvspdpn f10, vs10
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> +; CHECK-P8-NEXT: mffprwz r4, f2
> +; CHECK-P8-NEXT: xxsldwi vs2, v4, v4, 1
> +; CHECK-P8-NEXT: xscvspdpn f8, vs8
> +; CHECK-P8-NEXT: xscvdpsxws f0, f5
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xscvspdpn f1, vs2
> +; CHECK-P8-NEXT: xscvdpsxws f3, f7
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: xscvdpsxws f0, f8
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xscvspdpn f9, vs9
> -; CHECK-P8-NEXT: xscvdpsxws f6, f6
> -; CHECK-P8-NEXT: xscvspdpn f12, vs12
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> +; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: vmrghh v0, v0, v7
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: xscvspdpn f11, vs11
> -; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xscvspdpn v2, v2
> -; CHECK-P8-NEXT: xscvdpsxws f8, f8
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: xscvdpsxws f7, f7
> -; CHECK-P8-NEXT: mffprwz r6, f2
> -; CHECK-P8-NEXT: xscvspdpn f13, vs13
> -; CHECK-P8-NEXT: xscvspdpn v3, v3
> -; CHECK-P8-NEXT: xscvdpsxws f10, f10
> -; CHECK-P8-NEXT: mtfprd f4, r4
> +; CHECK-P8-NEXT: vmrghh v4, v8, v4
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xscvdpsxws f9, f9
> -; CHECK-P8-NEXT: mtfprd f2, r6
> -; CHECK-P8-NEXT: mffprwz r6, f6
> -; CHECK-P8-NEXT: xscvdpsxws f12, f12
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: xscvdpsxws f11, f11
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f6, r6
> -; CHECK-P8-NEXT: mffprwz r6, f3
> -; CHECK-P8-NEXT: xscvdpsxws v2, v2
> -; CHECK-P8-NEXT: xxswapd v9, vs6
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: mffprwz r4, f8
> -; CHECK-P8-NEXT: mtfprd f3, r6
> -; CHECK-P8-NEXT: xxswapd v0, vs5
> -; CHECK-P8-NEXT: mffprwz r6, f7
> -; CHECK-P8-NEXT: xscvdpsxws f13, f13
> -; CHECK-P8-NEXT: xxswapd v5, vs3
> -; CHECK-P8-NEXT: xscvdpsxws v3, v3
> -; CHECK-P8-NEXT: mtfprd f8, r4
> -; CHECK-P8-NEXT: mffprwz r4, f10
> -; CHECK-P8-NEXT: mtfprd f7, r6
> -; CHECK-P8-NEXT: mffprwz r6, f9
> -; CHECK-P8-NEXT: mtfprd f10, r4
> -; CHECK-P8-NEXT: mffprwz r4, f12
> -; CHECK-P8-NEXT: mtfprd f9, r6
> -; CHECK-P8-NEXT: xxswapd v6, vs10
> -; CHECK-P8-NEXT: mffprwz r6, f11
> -; CHECK-P8-NEXT: mtfprd f12, r4
> -; CHECK-P8-NEXT: xxswapd v1, vs9
> -; CHECK-P8-NEXT: mfvsrwz r4, v2
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: mtfprd f11, r6
> -; CHECK-P8-NEXT: mffprwz r6, f13
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v7, vs11
> -; CHECK-P8-NEXT: mfvsrwz r4, v3
> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
> -; CHECK-P8-NEXT: xxswapd v4, vs7
> -; CHECK-P8-NEXT: vmrglh v2, v2, v0
> -; CHECK-P8-NEXT: xxswapd v5, vs8
> -; CHECK-P8-NEXT: xxswapd v0, vs2
> -; CHECK-P8-NEXT: mtfprd f13, r6
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: vmrglh v4, v5, v4
> -; CHECK-P8-NEXT: vmrglh v5, v0, v1
> -; CHECK-P8-NEXT: xxswapd v1, vs4
> -; CHECK-P8-NEXT: vmrglh v0, v7, v6
> -; CHECK-P8-NEXT: xxswapd v6, vs12
> -; CHECK-P8-NEXT: xxswapd v7, vs13
> -; CHECK-P8-NEXT: xxswapd v10, vs1
> +; CHECK-P8-NEXT: vmrghh v1, v1, v9
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: vmrghh v7, v8, v7
> +; CHECK-P8-NEXT: vmrghh v6, v6, v9
> ; CHECK-P8-NEXT: vmrglw v2, v2, v3
> -; CHECK-P8-NEXT: vmrglh v1, v1, v6
> -; CHECK-P8-NEXT: vmrglh v6, v8, v7
> -; CHECK-P8-NEXT: vmrglh v7, v9, v10
> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
> -; CHECK-P8-NEXT: vmrglw v4, v1, v0
> -; CHECK-P8-NEXT: vmrglw v5, v7, v6
> +; CHECK-P8-NEXT: vmrglw v3, v0, v5
> +; CHECK-P8-NEXT: vmrglw v4, v1, v4
> +; CHECK-P8-NEXT: vmrglw v5, v6, v7
> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> ; CHECK-P8-NEXT: stvx v2, 0, r3
> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4
> @@ -1189,118 +1085,102 @@ define void @test16elt_signed(<16 x i16>*
> noalias nocapture sret %agg.result, <1
> ;
> ; CHECK-P9-LABEL: test16elt_signed:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: lxv vs1, 0(r4)
> -; CHECK-P9-NEXT: lxv vs3, 16(r4)
> -; CHECK-P9-NEXT: xscvspdpn f5, vs1
> -; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3
> -; CHECK-P9-NEXT: xscvspdpn f8, vs3
> -; CHECK-P9-NEXT: xxswapd vs4, vs1
> -; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> +; CHECK-P9-NEXT: lxv vs2, 0(r4)
> +; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3
> +; CHECK-P9-NEXT: xxswapd vs4, vs2
> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
> ; CHECK-P9-NEXT: xscvspdpn f4, vs4
> -; CHECK-P9-NEXT: xscvdpsxws f5, f5
> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: xscvspdpn f5, vs2
> +; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> -; CHECK-P9-NEXT: xscvdpsxws f8, f8
> -; CHECK-P9-NEXT: xxsldwi vs6, vs3, vs3, 3
> -; CHECK-P9-NEXT: xxswapd vs7, vs3
> -; CHECK-P9-NEXT: xscvspdpn f6, vs6
> -; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1
> -; CHECK-P9-NEXT: xscvspdpn f7, vs7
> -; CHECK-P9-NEXT: xscvspdpn f3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: mffprwz r5, f3
> +; CHECK-P9-NEXT: lxv vs1, 16(r4)
> +; CHECK-P9-NEXT: xxsldwi vs6, vs1, vs1, 3
> +; CHECK-P9-NEXT: xxswapd vs3, vs1
> +; CHECK-P9-NEXT: mtvsrd v2, r5
> +; CHECK-P9-NEXT: mffprwz r5, f4
> +; CHECK-P9-NEXT: xscvdpsxws f4, f5
> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
> +; CHECK-P9-NEXT: mtvsrd v3, r5
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> +; CHECK-P9-NEXT: mffprwz r5, f4
> +; CHECK-P9-NEXT: xscvspdpn f4, vs6
> +; CHECK-P9-NEXT: mtvsrd v3, r5
> +; CHECK-P9-NEXT: mffprwz r5, f2
> +; CHECK-P9-NEXT: xscvspdpn f2, vs1
> +; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> ; CHECK-P9-NEXT: xscvdpsxws f4, f4
> -; CHECK-P9-NEXT: xscvdpsxws f6, f6
> -; CHECK-P9-NEXT: mffprwz r5, f5
> -; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: xscvdpsxws f7, f7
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: mtfprd f5, r5
> -; CHECK-P9-NEXT: mffprwz r5, f8
> -; CHECK-P9-NEXT: mtfprd f8, r5
> -; CHECK-P9-NEXT: mffprwz r5, f2
> ; CHECK-P9-NEXT: lxv vs0, 32(r4)
> -; CHECK-P9-NEXT: xxsldwi vs9, vs0, vs0, 3
> -; CHECK-P9-NEXT: xxswapd vs10, vs0
> -; CHECK-P9-NEXT: xscvspdpn f9, vs9
> -; CHECK-P9-NEXT: xscvspdpn f10, vs10
> -; CHECK-P9-NEXT: xscvdpsxws f9, f9
> -; CHECK-P9-NEXT: xscvdpsxws f10, f10
> -; CHECK-P9-NEXT: mtfprd f2, r5
> +; CHECK-P9-NEXT: mtvsrd v4, r5
> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r5, f4
> -; CHECK-P9-NEXT: mtfprd f4, r5
> +; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: mtvsrd v4, r5
> +; CHECK-P9-NEXT: mffprwz r5, f3
> +; CHECK-P9-NEXT: xxsldwi vs3, vs0, vs0, 3
> +; CHECK-P9-NEXT: mtvsrd v5, r5
> +; CHECK-P9-NEXT: mffprwz r5, f2
> +; CHECK-P9-NEXT: xscvspdpn f2, vs3
> +; CHECK-P9-NEXT: vmrghh v4, v5, v4
> +; CHECK-P9-NEXT: mtvsrd v5, r5
> ; CHECK-P9-NEXT: mffprwz r5, f1
> -; CHECK-P9-NEXT: mtfprd f1, r5
> -; CHECK-P9-NEXT: mffprwz r5, f6
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> -; CHECK-P9-NEXT: xxswapd v3, vs4
> +; CHECK-P9-NEXT: xxswapd vs1, vs0
> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> +; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: vmrghh v5, v5, v0
> +; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrglw v3, v5, v4
> +; CHECK-P9-NEXT: mffprwz r5, f2
> ; CHECK-P9-NEXT: xscvspdpn f2, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: mtfprd f6, r5
> -; CHECK-P9-NEXT: mffprwz r5, f7
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> +; CHECK-P9-NEXT: mffprwz r5, f1
> ; CHECK-P9-NEXT: lxv vs1, 48(r4)
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs5
> -; CHECK-P9-NEXT: mtfprd f7, r5
> -; CHECK-P9-NEXT: mffprwz r5, f3
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs6
> -; CHECK-P9-NEXT: xxswapd v5, vs7
> -; CHECK-P9-NEXT: mtfprd f3, r5
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> -; CHECK-P9-NEXT: xxswapd v0, vs3
> -; CHECK-P9-NEXT: vmrglh v4, v5, v4
> -; CHECK-P9-NEXT: xxswapd v5, vs8
> -; CHECK-P9-NEXT: vmrglh v5, v5, v0
> +; CHECK-P9-NEXT: mtvsrd v1, r5
> +; CHECK-P9-NEXT: vmrghh v0, v1, v0
> ; CHECK-P9-NEXT: mffprwz r4, f2
> -; CHECK-P9-NEXT: mtfprd f2, r4
> -; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: vmrglw v3, v5, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xxmrgld vs2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> +; CHECK-P9-NEXT: mffprwz r4, f0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 3
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> +; CHECK-P9-NEXT: vmrghh v2, v4, v2
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrglw v2, v2, v0
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: vmrglh v2, v4, v2
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xscvspdpn f0, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> -; CHECK-P9-NEXT: mffprwz r5, f9
> -; CHECK-P9-NEXT: mtfprd f9, r5
> -; CHECK-P9-NEXT: mffprwz r5, f10
> -; CHECK-P9-NEXT: mtfprd f10, r5
> -; CHECK-P9-NEXT: xxswapd v0, vs9
> -; CHECK-P9-NEXT: xxswapd v1, vs10
> -; CHECK-P9-NEXT: vmrglh v0, v1, v0
> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
> -; CHECK-P9-NEXT: stxv vs2, 0(r3)
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r4
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
> +; CHECK-P9-NEXT: stxv vs2, 0(r3)
> ; CHECK-P9-NEXT: blr
> ;
> ; CHECK-BE-LABEL: test16elt_signed:
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
> index 1f95eda2b1b5..928a19f3a55c 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
> @@ -20,12 +20,10 @@ define i16 @test2elt(i64 %a.coerce) local_unnamed_addr
> #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: vmrghb v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: clrldi r3, r3, 48
> @@ -43,13 +41,11 @@ define i16 @test2elt(i64 %a.coerce) local_unnamed_addr
> #0 {
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: addi r3, r1, -2
> -; CHECK-P9-NEXT: xxswapd v2, vs1
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
> ; CHECK-P9-NEXT: lhz r3, -2(r1)
> @@ -97,20 +93,16 @@ define i32 @test4elt(<4 x float> %a)
> local_unnamed_addr #1 {
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> ; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: mtfprd f3, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs2
> -; CHECK-P8-NEXT: xxswapd v5, vs3
> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
> -; CHECK-P8-NEXT: vmrglb v3, v4, v5
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mtvsrd v5, r3
> +; CHECK-P8-NEXT: vmrghb v3, v4, v3
> +; CHECK-P8-NEXT: vmrghb v2, v2, v5
> +; CHECK-P8-NEXT: vmrglh v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -121,28 +113,24 @@ define i32 @test4elt(<4 x float> %a)
> local_unnamed_addr #1 {
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xscvspdpn f0, v2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: li r3, 0
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglb v2, v4, v2
> +; CHECK-P9-NEXT: vmrghb v2, v4, v2
> ; CHECK-P9-NEXT: vmrglh v2, v2, v3
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> @@ -189,59 +177,51 @@ define i64 @test8elt(<8 x float>* nocapture
> readonly) local_unnamed_addr #2 {
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: lvx v2, 0, r3
> ; CHECK-P8-NEXT: li r4, 16
> -; CHECK-P8-NEXT: lvx v5, r3, r4
> -; CHECK-P8-NEXT: xxswapd vs1, v2
> +; CHECK-P8-NEXT: lvx v3, r3, r4
> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3
> -; CHECK-P8-NEXT: xscvspdpn f4, v5
> -; CHECK-P8-NEXT: xxswapd vs3, v5
> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xxswapd vs1, v2
> +; CHECK-P8-NEXT: xscvspdpn f2, v2
> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1
> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
> +; CHECK-P8-NEXT: xscvspdpn f3, v3
> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
> -; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: mffprwz r6, f1
> -; CHECK-P8-NEXT: mffprwz r5, f0
> -; CHECK-P8-NEXT: mtfprd f1, r6
> -; CHECK-P8-NEXT: mtfprd f0, r5
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xscvspdpn f0, v2
> -; CHECK-P8-NEXT: mtfprd f4, r4
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v1, vs4
> -; CHECK-P8-NEXT: vmrglb v2, v4, v3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: xxswapd v5, vs2
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: xxswapd vs0, v3
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: mffprwz r3, f2
> +; CHECK-P8-NEXT: xscvdpsxws f2, f4
> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xscvdpsxws f4, f5
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
> +; CHECK-P8-NEXT: mffprwz r4, f2
> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v6, vs3
> -; CHECK-P8-NEXT: xxswapd v0, vs0
> -; CHECK-P8-NEXT: vmrglb v3, v3, v4
> -; CHECK-P8-NEXT: vmrglb v4, v0, v5
> -; CHECK-P8-NEXT: vmrglb v5, v1, v6
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: vmrghb v3, v3, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mtvsrd v5, r3
> +; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: vmrghb v5, v0, v5
> +; CHECK-P8-NEXT: mtvsrd v1, r3
> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
> +; CHECK-P8-NEXT: vmrghb v4, v4, v1
> +; CHECK-P8-NEXT: vmrglh v3, v4, v5
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> @@ -255,53 +235,45 @@ define i64 @test8elt(<8 x float>* nocapture
> readonly) local_unnamed_addr #2 {
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> ; CHECK-P9-NEXT: xxswapd vs2, vs1
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xxswapd vs1, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> @@ -376,117 +348,101 @@ entry:
> define <16 x i8> @test16elt(<16 x float>* nocapture readonly)
> local_unnamed_addr #3 {
> ; CHECK-P8-LABEL: test16elt:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: lvx v2, 0, r3
> +; CHECK-P8-NEXT: lvx v4, 0, r3
> ; CHECK-P8-NEXT: li r4, 16
> +; CHECK-P8-NEXT: li r5, 32
> ; CHECK-P8-NEXT: lvx v3, r3, r4
> -; CHECK-P8-NEXT: li r4, 32
> -; CHECK-P8-NEXT: xscvspdpn f2, v2
> -; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
> -; CHECK-P8-NEXT: xscvspdpn f4, v3
> -; CHECK-P8-NEXT: xxswapd vs1, v2
> -; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 1
> -; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
> -; CHECK-P8-NEXT: lvx v2, r3, r4
> +; CHECK-P8-NEXT: lvx v2, r3, r5
> +; CHECK-P8-NEXT: xxsldwi vs0, v4, v4, 3
> +; CHECK-P8-NEXT: xxswapd vs2, v4
> +; CHECK-P8-NEXT: xxsldwi vs4, v4, v4, 1
> +; CHECK-P8-NEXT: xscvspdpn f1, v4
> +; CHECK-P8-NEXT: xscvspdpn f3, v3
> +; CHECK-P8-NEXT: xxsldwi vs6, v3, v3, 3
> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
> -; CHECK-P8-NEXT: xxswapd vs6, v3
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 1
> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
> -; CHECK-P8-NEXT: xxsldwi vs8, v2, v2, 3
> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: xxswapd vs9, v2
> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
> -; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: xxswapd vs7, v3
> +; CHECK-P8-NEXT: xscvspdpn f2, vs2
> +; CHECK-P8-NEXT: xxsldwi vs8, v3, v3, 1
> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
> +; CHECK-P8-NEXT: xxsldwi vs9, v2, v2, 3
> ; CHECK-P8-NEXT: xscvspdpn f6, vs6
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: mffprwz r4, f2
> ; CHECK-P8-NEXT: xscvspdpn f7, vs7
> -; CHECK-P8-NEXT: mtfprd f2, r4
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
> +; CHECK-P8-NEXT: xscvdpsxws f4, f4
> ; CHECK-P8-NEXT: xscvspdpn f8, vs8
> -; CHECK-P8-NEXT: mtfprd f4, r4
> -; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: xscvdpsxws f0, f5
> -; CHECK-P8-NEXT: xxswapd v0, vs4
> +; CHECK-P8-NEXT: xscvdpsxws f3, f3
> ; CHECK-P8-NEXT: xscvspdpn f9, vs9
> -; CHECK-P8-NEXT: mtfprd f5, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: xxswapd vs0, v2
> +; CHECK-P8-NEXT: mffprwz r5, f2
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: mtvsrd v4, r5
> +; CHECK-P8-NEXT: mffprwz r5, f4
> ; CHECK-P8-NEXT: xscvdpsxws f1, f6
> -; CHECK-P8-NEXT: xxswapd v3, vs5
> -; CHECK-P8-NEXT: mtfprd f6, r4
> -; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: vmrghb v3, v4, v3
> +; CHECK-P8-NEXT: mtvsrd v4, r5
> +; CHECK-P8-NEXT: mffprwz r5, f3
> ; CHECK-P8-NEXT: xscvdpsxws f3, f7
> -; CHECK-P8-NEXT: xxswapd v4, vs6
> -; CHECK-P8-NEXT: mtfprd f7, r4
> -; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: xscvdpsxws f0, f8
> -; CHECK-P8-NEXT: xxswapd v5, vs7
> -; CHECK-P8-NEXT: mtfprd f8, r4
> -; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xscvdpsxws f1, f9
> -; CHECK-P8-NEXT: xxswapd v1, vs8
> -; CHECK-P8-NEXT: mtfprd f9, r4
> -; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: vmrglb v3, v4, v3
> -; CHECK-P8-NEXT: xxswapd v4, vs2
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs9
> -; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: xscvspdpn f0, v2
> -; CHECK-P8-NEXT: xxswapd v7, vs3
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: vmrglb v4, v4, v5
> -; CHECK-P8-NEXT: xxswapd v5, vs5
> -; CHECK-P8-NEXT: mtfprd f1, r4
> +; CHECK-P8-NEXT: xscvdpsxws f4, f8
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: li r4, 48
> -; CHECK-P8-NEXT: lvx v9, r3, r4
> -; CHECK-P8-NEXT: vmrglb v1, v6, v1
> -; CHECK-P8-NEXT: xxswapd v8, vs1
> +; CHECK-P8-NEXT: lvx v0, r3, r4
> +; CHECK-P8-NEXT: mffprwz r3, f1
> ; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
> -; CHECK-P8-NEXT: xxsldwi vs2, v9, v9, 3
> -; CHECK-P8-NEXT: xscvspdpn f4, v9
> -; CHECK-P8-NEXT: xxswapd vs3, v9
> -; CHECK-P8-NEXT: xxsldwi vs5, v9, v9, 1
> +; CHECK-P8-NEXT: xscvspdpn f5, v2
> +; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: xxsldwi vs3, v0, v0, 3
> +; CHECK-P8-NEXT: mtvsrd v1, r3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> +; CHECK-P8-NEXT: xxswapd vs4, v0
> ; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
> +; CHECK-P8-NEXT: mtvsrd v7, r3
> +; CHECK-P8-NEXT: mffprwz r3, f0
> +; CHECK-P8-NEXT: xxsldwi vs0, v0, v0, 1
> +; CHECK-P8-NEXT: xscvspdpn f2, v0
> ; CHECK-P8-NEXT: xscvspdpn f3, vs3
> -; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
> +; CHECK-P8-NEXT: xscvdpsxws f6, f9
> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: xscvdpsxws f5, f5
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f4, r4
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xscvdpsxws f4, f4
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghb v2, v6, v1
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: mffprwz r4, f5
> +; CHECK-P8-NEXT: mtvsrd v6, r3
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: vmrghb v4, v5, v4
> +; CHECK-P8-NEXT: mtvsrd v5, r5
> +; CHECK-P8-NEXT: vmrghb v0, v6, v1
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v9, vs4
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: mtvsrd v6, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs1
> -; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: vmrglb v2, v0, v7
> -; CHECK-P8-NEXT: xxswapd v0, vs0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v7, vs2
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: vmrglb v5, v8, v5
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: xxswapd v10, vs3
> -; CHECK-P8-NEXT: vmrglb v0, v0, v6
> +; CHECK-P8-NEXT: vmrghb v5, v5, v7
> +; CHECK-P8-NEXT: vmrghb v1, v1, v6
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f4
> +; CHECK-P8-NEXT: mtvsrd v7, r3
> +; CHECK-P8-NEXT: mffprwz r3, f0
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mtvsrd v9, r3
> +; CHECK-P8-NEXT: vmrghb v7, v8, v7
> +; CHECK-P8-NEXT: vmrghb v6, v6, v9
> ; CHECK-P8-NEXT: vmrglh v3, v4, v3
> -; CHECK-P8-NEXT: vmrglb v6, v8, v7
> -; CHECK-P8-NEXT: vmrglb v7, v9, v10
> -; CHECK-P8-NEXT: vmrglh v2, v2, v1
> -; CHECK-P8-NEXT: vmrglh v4, v0, v5
> -; CHECK-P8-NEXT: vmrglh v5, v7, v6
> +; CHECK-P8-NEXT: vmrglh v2, v5, v2
> +; CHECK-P8-NEXT: vmrglh v4, v1, v0
> +; CHECK-P8-NEXT: vmrglh v5, v6, v7
> ; CHECK-P8-NEXT: vmrglw v2, v2, v3
> ; CHECK-P8-NEXT: vmrglw v3, v5, v4
> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> @@ -494,114 +450,98 @@ define <16 x i8> @test16elt(<16 x float>* nocapture
> readonly) local_unnamed_addr
> ;
> ; CHECK-P9-LABEL: test16elt:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: lxv vs2, 0(r3)
> +; CHECK-P9-NEXT: lxv vs3, 0(r3)
> +; CHECK-P9-NEXT: xxsldwi vs4, vs3, vs3, 3
> +; CHECK-P9-NEXT: xscvspdpn f4, vs4
> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: lxv vs0, 48(r3)
> +; CHECK-P9-NEXT: lxv vs1, 32(r3)
> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
> +; CHECK-P9-NEXT: mffprwz r3, f4
> +; CHECK-P9-NEXT: xxswapd vs4, vs3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> +; CHECK-P9-NEXT: xscvspdpn f4, vs4
> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: mffprwz r3, f4
> +; CHECK-P9-NEXT: xscvspdpn f4, vs3
> +; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: mffprwz r3, f4
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f3
> ; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f3, vs3
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: lxv vs0, 48(r3)
> -; CHECK-P9-NEXT: lxv vs1, 32(r3)
> -; CHECK-P9-NEXT: lxv vs4, 16(r3)
> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs3
> ; CHECK-P9-NEXT: xxswapd vs3, vs2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> ; CHECK-P9-NEXT: xscvspdpn f3, vs2
> ; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 3
> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: xxswapd vs2, vs4
> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: xscvspdpn f2, vs4
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 1
> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs2
> ; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> ; CHECK-P9-NEXT: xxswapd vs2, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xxswapd vs1, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghb v4, v5, v4
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v4, v5, v4
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> -; CHECK-P9-NEXT: xxswapd v0, vs0
> -; CHECK-P9-NEXT: vmrglb v5, v5, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r3
> +; CHECK-P9-NEXT: vmrghb v5, v5, v0
> ; CHECK-P9-NEXT: vmrglh v4, v5, v4
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
> @@ -738,12 +678,10 @@ define i16 @test2elt_signed(i64 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: vmrghb v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: clrldi r3, r3, 48
> @@ -761,13 +699,11 @@ define i16 @test2elt_signed(i64 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: addi r3, r1, -2
> -; CHECK-P9-NEXT: xxswapd v2, vs1
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
> ; CHECK-P9-NEXT: lhz r3, -2(r1)
> @@ -815,20 +751,16 @@ define i32 @test4elt_signed(<4 x float> %a)
> local_unnamed_addr #1 {
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> ; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: mtfprd f3, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs2
> -; CHECK-P8-NEXT: xxswapd v5, vs3
> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
> -; CHECK-P8-NEXT: vmrglb v3, v4, v5
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mtvsrd v5, r3
> +; CHECK-P8-NEXT: vmrghb v3, v4, v3
> +; CHECK-P8-NEXT: vmrghb v2, v2, v5
> +; CHECK-P8-NEXT: vmrglh v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -839,28 +771,24 @@ define i32 @test4elt_signed(<4 x float> %a)
> local_unnamed_addr #1 {
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xscvspdpn f0, v2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: li r3, 0
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglb v2, v4, v2
> +; CHECK-P9-NEXT: vmrghb v2, v4, v2
> ; CHECK-P9-NEXT: vmrglh v2, v2, v3
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> @@ -907,59 +835,51 @@ define i64 @test8elt_signed(<8 x float>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: lvx v2, 0, r3
> ; CHECK-P8-NEXT: li r4, 16
> -; CHECK-P8-NEXT: lvx v5, r3, r4
> -; CHECK-P8-NEXT: xxswapd vs1, v2
> +; CHECK-P8-NEXT: lvx v3, r3, r4
> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3
> -; CHECK-P8-NEXT: xscvspdpn f4, v5
> -; CHECK-P8-NEXT: xxswapd vs3, v5
> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xxswapd vs1, v2
> +; CHECK-P8-NEXT: xscvspdpn f2, v2
> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1
> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
> +; CHECK-P8-NEXT: xscvspdpn f3, v3
> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
> -; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: mffprwz r6, f1
> -; CHECK-P8-NEXT: mffprwz r5, f0
> -; CHECK-P8-NEXT: mtfprd f1, r6
> -; CHECK-P8-NEXT: mtfprd f0, r5
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xscvspdpn f0, v2
> -; CHECK-P8-NEXT: mtfprd f4, r4
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v1, vs4
> -; CHECK-P8-NEXT: vmrglb v2, v4, v3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: xxswapd v5, vs2
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mffprwz r3, f1
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: xxswapd vs0, v3
> +; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: mffprwz r3, f2
> +; CHECK-P8-NEXT: xscvdpsxws f2, f4
> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
> +; CHECK-P8-NEXT: xscvdpsxws f4, f5
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
> +; CHECK-P8-NEXT: mffprwz r4, f2
> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v6, vs3
> -; CHECK-P8-NEXT: xxswapd v0, vs0
> -; CHECK-P8-NEXT: vmrglb v3, v3, v4
> -; CHECK-P8-NEXT: vmrglb v4, v0, v5
> -; CHECK-P8-NEXT: vmrglb v5, v1, v6
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: vmrghb v3, v3, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> +; CHECK-P8-NEXT: mtvsrd v5, r3
> +; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: vmrghb v5, v0, v5
> +; CHECK-P8-NEXT: mtvsrd v1, r3
> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
> +; CHECK-P8-NEXT: vmrghb v4, v4, v1
> +; CHECK-P8-NEXT: vmrglh v3, v4, v5
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> @@ -973,53 +893,45 @@ define i64 @test8elt_signed(<8 x float>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> ; CHECK-P9-NEXT: xxswapd vs2, vs1
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xxswapd vs1, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> @@ -1094,117 +1006,101 @@ entry:
> define <16 x i8> @test16elt_signed(<16 x float>* nocapture readonly)
> local_unnamed_addr #3 {
> ; CHECK-P8-LABEL: test16elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: lvx v2, 0, r3
> +; CHECK-P8-NEXT: lvx v4, 0, r3
> ; CHECK-P8-NEXT: li r4, 16
> +; CHECK-P8-NEXT: li r5, 32
> ; CHECK-P8-NEXT: lvx v3, r3, r4
> -; CHECK-P8-NEXT: li r4, 32
> -; CHECK-P8-NEXT: xscvspdpn f2, v2
> -; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
> -; CHECK-P8-NEXT: xscvspdpn f4, v3
> -; CHECK-P8-NEXT: xxswapd vs1, v2
> -; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 1
> -; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
> -; CHECK-P8-NEXT: lvx v2, r3, r4
> +; CHECK-P8-NEXT: lvx v2, r3, r5
> +; CHECK-P8-NEXT: xxsldwi vs0, v4, v4, 3
> +; CHECK-P8-NEXT: xxswapd vs2, v4
> +; CHECK-P8-NEXT: xxsldwi vs4, v4, v4, 1
> +; CHECK-P8-NEXT: xscvspdpn f1, v4
> +; CHECK-P8-NEXT: xscvspdpn f3, v3
> +; CHECK-P8-NEXT: xxsldwi vs6, v3, v3, 3
> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
> -; CHECK-P8-NEXT: xxswapd vs6, v3
> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 1
> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
> -; CHECK-P8-NEXT: xxsldwi vs8, v2, v2, 3
> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: xxswapd vs9, v2
> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
> -; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: xxswapd vs7, v3
> +; CHECK-P8-NEXT: xscvspdpn f2, vs2
> +; CHECK-P8-NEXT: xxsldwi vs8, v3, v3, 1
> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
> +; CHECK-P8-NEXT: xxsldwi vs9, v2, v2, 3
> ; CHECK-P8-NEXT: xscvspdpn f6, vs6
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: mffprwz r4, f2
> ; CHECK-P8-NEXT: xscvspdpn f7, vs7
> -; CHECK-P8-NEXT: mtfprd f2, r4
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
> +; CHECK-P8-NEXT: xscvdpsxws f4, f4
> ; CHECK-P8-NEXT: xscvspdpn f8, vs8
> -; CHECK-P8-NEXT: mtfprd f4, r4
> -; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: xscvdpsxws f0, f5
> -; CHECK-P8-NEXT: xxswapd v0, vs4
> +; CHECK-P8-NEXT: xscvdpsxws f3, f3
> ; CHECK-P8-NEXT: xscvspdpn f9, vs9
> -; CHECK-P8-NEXT: mtfprd f5, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: xxswapd vs0, v2
> +; CHECK-P8-NEXT: mffprwz r5, f2
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f1
> +; CHECK-P8-NEXT: mtvsrd v4, r5
> +; CHECK-P8-NEXT: mffprwz r5, f4
> ; CHECK-P8-NEXT: xscvdpsxws f1, f6
> -; CHECK-P8-NEXT: xxswapd v3, vs5
> -; CHECK-P8-NEXT: mtfprd f6, r4
> -; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: vmrghb v3, v4, v3
> +; CHECK-P8-NEXT: mtvsrd v4, r5
> +; CHECK-P8-NEXT: mffprwz r5, f3
> ; CHECK-P8-NEXT: xscvdpsxws f3, f7
> -; CHECK-P8-NEXT: xxswapd v4, vs6
> -; CHECK-P8-NEXT: mtfprd f7, r4
> -; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: xscvdpsxws f0, f8
> -; CHECK-P8-NEXT: xxswapd v5, vs7
> -; CHECK-P8-NEXT: mtfprd f8, r4
> -; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xscvdpsxws f1, f9
> -; CHECK-P8-NEXT: xxswapd v1, vs8
> -; CHECK-P8-NEXT: mtfprd f9, r4
> -; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: vmrglb v3, v4, v3
> -; CHECK-P8-NEXT: xxswapd v4, vs2
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs9
> -; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: xscvspdpn f0, v2
> -; CHECK-P8-NEXT: xxswapd v7, vs3
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: vmrglb v4, v4, v5
> -; CHECK-P8-NEXT: xxswapd v5, vs5
> -; CHECK-P8-NEXT: mtfprd f1, r4
> +; CHECK-P8-NEXT: xscvdpsxws f4, f8
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: li r4, 48
> -; CHECK-P8-NEXT: lvx v9, r3, r4
> -; CHECK-P8-NEXT: vmrglb v1, v6, v1
> -; CHECK-P8-NEXT: xxswapd v8, vs1
> +; CHECK-P8-NEXT: lvx v0, r3, r4
> +; CHECK-P8-NEXT: mffprwz r3, f1
> ; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
> -; CHECK-P8-NEXT: xxsldwi vs2, v9, v9, 3
> -; CHECK-P8-NEXT: xscvspdpn f4, v9
> -; CHECK-P8-NEXT: xxswapd vs3, v9
> -; CHECK-P8-NEXT: xxsldwi vs5, v9, v9, 1
> +; CHECK-P8-NEXT: xscvspdpn f5, v2
> +; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: xxsldwi vs3, v0, v0, 3
> +; CHECK-P8-NEXT: mtvsrd v1, r3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> +; CHECK-P8-NEXT: xxswapd vs4, v0
> ; CHECK-P8-NEXT: xscvspdpn f1, vs1
> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
> +; CHECK-P8-NEXT: mtvsrd v7, r3
> +; CHECK-P8-NEXT: mffprwz r3, f0
> +; CHECK-P8-NEXT: xxsldwi vs0, v0, v0, 1
> +; CHECK-P8-NEXT: xscvspdpn f2, v0
> ; CHECK-P8-NEXT: xscvspdpn f3, vs3
> -; CHECK-P8-NEXT: xscvspdpn f5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
> +; CHECK-P8-NEXT: xscvdpsxws f6, f9
> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
> +; CHECK-P8-NEXT: xscvdpsxws f5, f5
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f4, r4
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xscvdpsxws f4, f4
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghb v2, v6, v1
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: mffprwz r4, f5
> +; CHECK-P8-NEXT: mtvsrd v6, r3
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: vmrghb v4, v5, v4
> +; CHECK-P8-NEXT: mtvsrd v5, r5
> +; CHECK-P8-NEXT: vmrghb v0, v6, v1
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: xxswapd v9, vs4
> -; CHECK-P8-NEXT: mtfprd f1, r3
> +; CHECK-P8-NEXT: mtvsrd v6, r3
> ; CHECK-P8-NEXT: mffprwz r3, f3
> -; CHECK-P8-NEXT: mtfprd f2, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs1
> -; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: vmrglb v2, v0, v7
> -; CHECK-P8-NEXT: xxswapd v0, vs0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v7, vs2
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: vmrglb v5, v8, v5
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: xxswapd v10, vs3
> -; CHECK-P8-NEXT: vmrglb v0, v0, v6
> +; CHECK-P8-NEXT: vmrghb v5, v5, v7
> +; CHECK-P8-NEXT: vmrghb v1, v1, v6
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f4
> +; CHECK-P8-NEXT: mtvsrd v7, r3
> +; CHECK-P8-NEXT: mffprwz r3, f0
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mtvsrd v9, r3
> +; CHECK-P8-NEXT: vmrghb v7, v8, v7
> +; CHECK-P8-NEXT: vmrghb v6, v6, v9
> ; CHECK-P8-NEXT: vmrglh v3, v4, v3
> -; CHECK-P8-NEXT: vmrglb v6, v8, v7
> -; CHECK-P8-NEXT: vmrglb v7, v9, v10
> -; CHECK-P8-NEXT: vmrglh v2, v2, v1
> -; CHECK-P8-NEXT: vmrglh v4, v0, v5
> -; CHECK-P8-NEXT: vmrglh v5, v7, v6
> +; CHECK-P8-NEXT: vmrglh v2, v5, v2
> +; CHECK-P8-NEXT: vmrglh v4, v1, v0
> +; CHECK-P8-NEXT: vmrglh v5, v6, v7
> ; CHECK-P8-NEXT: vmrglw v2, v2, v3
> ; CHECK-P8-NEXT: vmrglw v3, v5, v4
> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> @@ -1212,114 +1108,98 @@ define <16 x i8> @test16elt_signed(<16 x float>*
> nocapture readonly) local_unnam
> ;
> ; CHECK-P9-LABEL: test16elt_signed:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: lxv vs2, 0(r3)
> +; CHECK-P9-NEXT: lxv vs3, 0(r3)
> +; CHECK-P9-NEXT: xxsldwi vs4, vs3, vs3, 3
> +; CHECK-P9-NEXT: xscvspdpn f4, vs4
> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: lxv vs0, 48(r3)
> +; CHECK-P9-NEXT: lxv vs1, 32(r3)
> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
> +; CHECK-P9-NEXT: mffprwz r3, f4
> +; CHECK-P9-NEXT: xxswapd vs4, vs3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> +; CHECK-P9-NEXT: xscvspdpn f4, vs4
> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: mffprwz r3, f4
> +; CHECK-P9-NEXT: xscvspdpn f4, vs3
> +; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: mffprwz r3, f4
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f3
> ; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f3, vs3
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: lxv vs0, 48(r3)
> -; CHECK-P9-NEXT: lxv vs1, 32(r3)
> -; CHECK-P9-NEXT: lxv vs4, 16(r3)
> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs3
> ; CHECK-P9-NEXT: xxswapd vs3, vs2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> ; CHECK-P9-NEXT: xscvspdpn f3, vs2
> ; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 3
> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: xxswapd vs2, vs4
> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: xscvspdpn f2, vs4
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 1
> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs2
> ; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> ; CHECK-P9-NEXT: xxswapd vs2, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xxswapd vs1, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghb v4, v5, v4
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v4, v5, v4
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> -; CHECK-P9-NEXT: xxswapd v0, vs0
> -; CHECK-P9-NEXT: vmrglb v5, v5, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r3
> +; CHECK-P9-NEXT: vmrghb v5, v5, v0
> ; CHECK-P9-NEXT: vmrglh v4, v5, v4
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
> index c7d66ae784a0..dbc2774fed8c 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
> @@ -16,12 +16,10 @@ define i32 @test2elt(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglh v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: vmrghh v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -30,15 +28,13 @@ define i32 @test2elt(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9: # %bb.0: # %entry
> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: li r3, 0
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -77,18 +73,14 @@ define i64 @test4elt(<4 x double>* nocapture readonly)
> local_unnamed_addr #1 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f2
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: xxswapd v2, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs3
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xxswapd v5, vs1
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: vmrghh v2, v4, v2
> +; CHECK-P8-NEXT: vmrghh v3, v5, v3
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> @@ -102,22 +94,18 @@ define i64 @test4elt(<4 x double>* nocapture
> readonly) local_unnamed_addr #1 {
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> @@ -176,36 +164,28 @@ define <8 x i16> @test8elt(<8 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: mtfprd f4, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f6
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs4
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r4, f7
> -; CHECK-P8-NEXT: mtfprd f6, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs5
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f7, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs6
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v1, vs7
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r3, f2
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: xxswapd v0, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs2
> -; CHECK-P8-NEXT: vmrglh v2, v5, v2
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> -; CHECK-P8-NEXT: vmrglh v3, v0, v3
> -; CHECK-P8-NEXT: vmrglh v4, v6, v4
> -; CHECK-P8-NEXT: vmrglh v5, v5, v1
> +; CHECK-P8-NEXT: vmrghh v2, v0, v2
> +; CHECK-P8-NEXT: vmrghh v3, v1, v3
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: vmrghh v4, v0, v4
> +; CHECK-P8-NEXT: vmrghh v5, v1, v5
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> ; CHECK-P8-NEXT: vmrglw v3, v5, v4
> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> @@ -217,47 +197,39 @@ define <8 x i16> @test8elt(<8 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: xxswapd v2, vs4
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
> ; CHECK-P9-NEXT: blr
> @@ -321,209 +293,177 @@ entry:
> define void @test16elt(<16 x i16>* noalias nocapture sret %agg.result,
> <16 x double>* nocapture readonly) local_unnamed_addr #3 {
> ; CHECK-P8-LABEL: test16elt:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
> ; CHECK-P8-NEXT: li r5, 16
> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
> ; CHECK-P8-NEXT: li r6, 32
> +; CHECK-P8-NEXT: li r7, 48
> ; CHECK-P8-NEXT: lxvd2x vs1, r4, r5
> ; CHECK-P8-NEXT: lxvd2x vs2, r4, r6
> -; CHECK-P8-NEXT: li r6, 48
> -; CHECK-P8-NEXT: lxvd2x vs3, r4, r6
> ; CHECK-P8-NEXT: li r6, 64
> -; CHECK-P8-NEXT: xscvdpsxws f4, f0
> +; CHECK-P8-NEXT: lxvd2x vs3, r4, r7
> ; CHECK-P8-NEXT: lxvd2x vs5, r4, r6
> -; CHECK-P8-NEXT: li r6, 80
> +; CHECK-P8-NEXT: li r7, 80
> +; CHECK-P8-NEXT: li r6, 96
> +; CHECK-P8-NEXT: xscvdpsxws f4, f0
> +; CHECK-P8-NEXT: lxvd2x vs7, r4, r7
> +; CHECK-P8-NEXT: lxvd2x vs10, r4, r6
> +; CHECK-P8-NEXT: li r6, 112
> ; CHECK-P8-NEXT: xxswapd vs0, vs0
> ; CHECK-P8-NEXT: xscvdpsxws f6, f1
> -; CHECK-P8-NEXT: lxvd2x vs7, r4, r6
> -; CHECK-P8-NEXT: li r6, 96
> ; CHECK-P8-NEXT: xxswapd vs1, vs1
> ; CHECK-P8-NEXT: xscvdpsxws f8, f2
> -; CHECK-P8-NEXT: lxvd2x vs9, r4, r6
> -; CHECK-P8-NEXT: li r6, 112
> ; CHECK-P8-NEXT: xxswapd vs2, vs2
> -; CHECK-P8-NEXT: xscvdpsxws f10, f3
> -; CHECK-P8-NEXT: lxvd2x vs11, r4, r6
> +; CHECK-P8-NEXT: xscvdpsxws f9, f3
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> -; CHECK-P8-NEXT: xscvdpsxws f12, f5
> +; CHECK-P8-NEXT: xscvdpsxws f11, f5
> ; CHECK-P8-NEXT: xxswapd vs5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f13, f7
> +; CHECK-P8-NEXT: xscvdpsxws f12, f7
> ; CHECK-P8-NEXT: xxswapd vs7, vs7
> -; CHECK-P8-NEXT: xscvdpsxws v2, f9
> -; CHECK-P8-NEXT: xxswapd vs9, vs9
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: xscvdpsxws v3, f11
> -; CHECK-P8-NEXT: xxswapd vs11, vs11
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mffprwz r6, f6
> -; CHECK-P8-NEXT: mtfprd f4, r4
> +; CHECK-P8-NEXT: mffprwz r7, f4
> +; CHECK-P8-NEXT: lxvd2x vs4, r4, r6
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xscvdpsxws f13, f10
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r4, f8
> +; CHECK-P8-NEXT: xscvdpsxws f6, f4
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mffprwz r4, f9
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: mffprwz r4, f11
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs4
> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: mtfprd f6, r6
> -; CHECK-P8-NEXT: mffprwz r6, f10
> -; CHECK-P8-NEXT: mtfprd f8, r4
> -; CHECK-P8-NEXT: xxswapd v5, vs6
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> ; CHECK-P8-NEXT: mffprwz r4, f12
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: xxswapd v0, vs8
> -; CHECK-P8-NEXT: mtfprd f10, r6
> -; CHECK-P8-NEXT: mffprwz r6, f13
> -; CHECK-P8-NEXT: mtfprd f12, r4
> -; CHECK-P8-NEXT: xxswapd v1, vs10
> -; CHECK-P8-NEXT: mfvsrwz r4, v2
> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: mffprwz r4, f13
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xxswapd v6, vs12
> -; CHECK-P8-NEXT: xscvdpsxws f9, f9
> -; CHECK-P8-NEXT: mtfprd f13, r6
> -; CHECK-P8-NEXT: mfvsrwz r6, v3
> -; CHECK-P8-NEXT: mtvsrd v2, r4
> -; CHECK-P8-NEXT: xxswapd v7, vs13
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xxswapd vs6, vs10
> +; CHECK-P8-NEXT: xscvdpsxws f5, f5
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> ; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: xxswapd vs0, vs4
> +; CHECK-P8-NEXT: mtvsrd v2, r7
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f1
> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
> -; CHECK-P8-NEXT: xxswapd v2, v2
> -; CHECK-P8-NEXT: xscvdpsxws f11, f11
> -; CHECK-P8-NEXT: mtvsrd v3, r6
> -; CHECK-P8-NEXT: mffprwz r6, f1
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v3, v3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: mtfprd f1, r6
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: mtfprd f2, r4
> +; CHECK-P8-NEXT: xscvdpsxws f4, f6
> +; CHECK-P8-NEXT: vmrghh v2, v8, v2
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghh v3, v9, v3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: xxswapd v9, vs1
> -; CHECK-P8-NEXT: mffprwz r6, f3
> -; CHECK-P8-NEXT: xxswapd v10, vs2
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: mffprwz r4, f9
> -; CHECK-P8-NEXT: mtfprd f3, r6
> -; CHECK-P8-NEXT: mffprwz r6, f7
> -; CHECK-P8-NEXT: mtfprd f9, r4
> -; CHECK-P8-NEXT: mffprwz r4, f11
> -; CHECK-P8-NEXT: vmrglh v4, v8, v4
> -; CHECK-P8-NEXT: xxswapd v8, vs3
> -; CHECK-P8-NEXT: vmrglh v5, v9, v5
> -; CHECK-P8-NEXT: xxswapd v9, vs5
> -; CHECK-P8-NEXT: mtfprd f7, r6
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: vmrglh v0, v10, v0
> -; CHECK-P8-NEXT: xxswapd v10, vs7
> -; CHECK-P8-NEXT: vmrglh v1, v8, v1
> -; CHECK-P8-NEXT: xxswapd v8, vs9
> -; CHECK-P8-NEXT: vmrglh v6, v9, v6
> -; CHECK-P8-NEXT: xxswapd v9, vs0
> -; CHECK-P8-NEXT: vmrglh v7, v10, v7
> -; CHECK-P8-NEXT: vmrglh v2, v8, v2
> -; CHECK-P8-NEXT: vmrglh v3, v9, v3
> -; CHECK-P8-NEXT: vmrglw v4, v5, v4
> -; CHECK-P8-NEXT: vmrglw v5, v1, v0
> -; CHECK-P8-NEXT: vmrglw v0, v7, v6
> +; CHECK-P8-NEXT: vmrghh v4, v8, v4
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f7
> +; CHECK-P8-NEXT: vmrghh v5, v9, v5
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: mffprwz r4, f4
> +; CHECK-P8-NEXT: vmrghh v0, v8, v0
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: vmrghh v1, v9, v1
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: vmrghh v6, v8, v6
> +; CHECK-P8-NEXT: vmrghh v7, v9, v7
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: vmrglw v3, v5, v4
> +; CHECK-P8-NEXT: vmrglw v4, v1, v0
> +; CHECK-P8-NEXT: vmrglw v5, v7, v6
> +; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> +; CHECK-P8-NEXT: stvx v2, 0, r3
> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4
> -; CHECK-P8-NEXT: stvx v3, 0, r3
> -; CHECK-P8-NEXT: xxmrgld v2, v2, v0
> -; CHECK-P8-NEXT: stvx v2, r3, r5
> +; CHECK-P8-NEXT: stvx v3, r3, r5
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: test16elt:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: lxv vs4, 0(r4)
> -; CHECK-P9-NEXT: lxv vs3, 16(r4)
> -; CHECK-P9-NEXT: lxv vs2, 32(r4)
> -; CHECK-P9-NEXT: xscvdpsxws f5, f4
> -; CHECK-P9-NEXT: lxv vs1, 48(r4)
> -; CHECK-P9-NEXT: xscvdpsxws f6, f3
> -; CHECK-P9-NEXT: lxv vs0, 64(r4)
> -; CHECK-P9-NEXT: xscvdpsxws f7, f2
> -; CHECK-P9-NEXT: xscvdpsxws f8, f1
> -; CHECK-P9-NEXT: xxswapd vs4, vs4
> -; CHECK-P9-NEXT: xscvdpsxws f4, f4
> -; CHECK-P9-NEXT: mffprwz r5, f5
> -; CHECK-P9-NEXT: xscvdpsxws f9, f0
> +; CHECK-P9-NEXT: lxv vs3, 0(r4)
> +; CHECK-P9-NEXT: lxv vs2, 16(r4)
> +; CHECK-P9-NEXT: lxv vs1, 32(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
> +; CHECK-P9-NEXT: lxv vs0, 48(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f5, f2
> +; CHECK-P9-NEXT: xscvdpsxws f6, f1
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> +; CHECK-P9-NEXT: xscvdpsxws f7, f0
> +; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: mffprwz r5, f4
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: mtfprd f5, r5
> -; CHECK-P9-NEXT: mffprwz r5, f6
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: mtfprd f6, r5
> +; CHECK-P9-NEXT: mtvsrd v2, r5
> +; CHECK-P9-NEXT: mffprwz r5, f5
> +; CHECK-P9-NEXT: mtvsrd v3, r5
> +; CHECK-P9-NEXT: mffprwz r5, f6
> +; CHECK-P9-NEXT: mtvsrd v4, r5
> ; CHECK-P9-NEXT: mffprwz r5, f7
> -; CHECK-P9-NEXT: mtfprd f7, r5
> -; CHECK-P9-NEXT: mffprwz r5, f8
> -; CHECK-P9-NEXT: mtfprd f8, r5
> -; CHECK-P9-NEXT: mffprwz r5, f9
> -; CHECK-P9-NEXT: mtfprd f9, r5
> -; CHECK-P9-NEXT: mffprwz r5, f4
> -; CHECK-P9-NEXT: mtfprd f4, r5
> +; CHECK-P9-NEXT: mtvsrd v5, r5
> ; CHECK-P9-NEXT: mffprwz r5, f3
> +; CHECK-P9-NEXT: lxv vs3, 64(r4)
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs5
> -; CHECK-P9-NEXT: xxswapd v5, vs8
> -; CHECK-P9-NEXT: xxswapd v0, vs9
> -; CHECK-P9-NEXT: mtfprd f3, r5
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: mffprwz r5, f2
> -; CHECK-P9-NEXT: mtfprd f2, r5
> -; CHECK-P9-NEXT: xxswapd vs0, vs0
> -; CHECK-P9-NEXT: xscvdpsxws f0, f0
> -; CHECK-P9-NEXT: xxswapd v1, vs2
> ; CHECK-P9-NEXT: lxv vs2, 80(r4)
> -; CHECK-P9-NEXT: xxswapd v3, vs4
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs6
> -; CHECK-P9-NEXT: xxswapd v4, vs3
> -; CHECK-P9-NEXT: xscvdpsxws f3, f2
> -; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: vmrghh v2, v2, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: mffprwz r5, f1
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs7
> -; CHECK-P9-NEXT: mtfprd f1, r5
> +; CHECK-P9-NEXT: lxv vs1, 96(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
> +; CHECK-P9-NEXT: xxswapd vs3, vs3
> +; CHECK-P9-NEXT: vmrghh v3, v3, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: mffprwz r5, f0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v1
> -; CHECK-P9-NEXT: xxswapd v1, vs1
> -; CHECK-P9-NEXT: mtfprd f0, r5
> -; CHECK-P9-NEXT: vmrglh v5, v5, v1
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: xxswapd v1, vs0
> ; CHECK-P9-NEXT: lxv vs0, 112(r4)
> -; CHECK-P9-NEXT: lxv vs1, 96(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: vmrghh v4, v4, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: vmrghh v5, v5, v0
> +; CHECK-P9-NEXT: mffprwz r4, f4
> +; CHECK-P9-NEXT: vmrglw v4, v5, v4
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: mffprwz r4, f3
> -; CHECK-P9-NEXT: mtfprd f3, r4
> +; CHECK-P9-NEXT: xscvdpsxws f3, f2
> +; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: xxmrgld vs4, v4, v2
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> +; CHECK-P9-NEXT: stxv vs4, 0(r3)
> +; CHECK-P9-NEXT: mffprwz r4, f3
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: mffprwz r4, f2
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: vmrglw v3, v5, v4
> -; CHECK-P9-NEXT: xxmrgld vs4, v3, v2
> -; CHECK-P9-NEXT: xxswapd v2, vs3
> -; CHECK-P9-NEXT: vmrglh v0, v0, v1
> -; CHECK-P9-NEXT: mtfprd f2, r4
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r4, f2
> -; CHECK-P9-NEXT: mtfprd f2, r4
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: mffprwz r4, f1
> -; CHECK-P9-NEXT: mtfprd f1, r4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r4, f1
> -; CHECK-P9-NEXT: mtfprd f1, r4
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r4
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
> -; CHECK-P9-NEXT: stxv vs4, 0(r3)
> ; CHECK-P9-NEXT: blr
> ;
> ; CHECK-BE-LABEL: test16elt:
> @@ -639,12 +579,10 @@ define i32 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglh v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: vmrghh v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -653,15 +591,13 @@ define i32 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9: # %bb.0: # %entry
> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: li r3, 0
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -700,18 +636,14 @@ define i64 @test4elt_signed(<4 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f2
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: xxswapd v2, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs3
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xxswapd v5, vs1
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: vmrghh v2, v4, v2
> +; CHECK-P8-NEXT: vmrghh v3, v5, v3
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> @@ -725,22 +657,18 @@ define i64 @test4elt_signed(<4 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> @@ -799,36 +727,28 @@ define <8 x i16> @test8elt_signed(<8 x double>*
> nocapture readonly) local_unname
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: mtfprd f4, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f6
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs4
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r4, f7
> -; CHECK-P8-NEXT: mtfprd f6, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs5
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f7, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs6
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v1, vs7
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r3, f2
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: xxswapd v0, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs2
> -; CHECK-P8-NEXT: vmrglh v2, v5, v2
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> -; CHECK-P8-NEXT: vmrglh v3, v0, v3
> -; CHECK-P8-NEXT: vmrglh v4, v6, v4
> -; CHECK-P8-NEXT: vmrglh v5, v5, v1
> +; CHECK-P8-NEXT: vmrghh v2, v0, v2
> +; CHECK-P8-NEXT: vmrghh v3, v1, v3
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: vmrghh v4, v0, v4
> +; CHECK-P8-NEXT: vmrghh v5, v1, v5
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> ; CHECK-P8-NEXT: vmrglw v3, v5, v4
> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> @@ -840,47 +760,39 @@ define <8 x i16> @test8elt_signed(<8 x double>*
> nocapture readonly) local_unname
> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: xxswapd v2, vs4
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
> ; CHECK-P9-NEXT: blr
> @@ -944,209 +856,177 @@ entry:
> define void @test16elt_signed(<16 x i16>* noalias nocapture sret
> %agg.result, <16 x double>* nocapture readonly) local_unnamed_addr #3 {
> ; CHECK-P8-LABEL: test16elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
> ; CHECK-P8-NEXT: li r5, 16
> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
> ; CHECK-P8-NEXT: li r6, 32
> +; CHECK-P8-NEXT: li r7, 48
> ; CHECK-P8-NEXT: lxvd2x vs1, r4, r5
> ; CHECK-P8-NEXT: lxvd2x vs2, r4, r6
> -; CHECK-P8-NEXT: li r6, 48
> -; CHECK-P8-NEXT: lxvd2x vs3, r4, r6
> ; CHECK-P8-NEXT: li r6, 64
> -; CHECK-P8-NEXT: xscvdpsxws f4, f0
> +; CHECK-P8-NEXT: lxvd2x vs3, r4, r7
> ; CHECK-P8-NEXT: lxvd2x vs5, r4, r6
> -; CHECK-P8-NEXT: li r6, 80
> +; CHECK-P8-NEXT: li r7, 80
> +; CHECK-P8-NEXT: li r6, 96
> +; CHECK-P8-NEXT: xscvdpsxws f4, f0
> +; CHECK-P8-NEXT: lxvd2x vs7, r4, r7
> +; CHECK-P8-NEXT: lxvd2x vs10, r4, r6
> +; CHECK-P8-NEXT: li r6, 112
> ; CHECK-P8-NEXT: xxswapd vs0, vs0
> ; CHECK-P8-NEXT: xscvdpsxws f6, f1
> -; CHECK-P8-NEXT: lxvd2x vs7, r4, r6
> -; CHECK-P8-NEXT: li r6, 96
> ; CHECK-P8-NEXT: xxswapd vs1, vs1
> ; CHECK-P8-NEXT: xscvdpsxws f8, f2
> -; CHECK-P8-NEXT: lxvd2x vs9, r4, r6
> -; CHECK-P8-NEXT: li r6, 112
> ; CHECK-P8-NEXT: xxswapd vs2, vs2
> -; CHECK-P8-NEXT: xscvdpsxws f10, f3
> -; CHECK-P8-NEXT: lxvd2x vs11, r4, r6
> +; CHECK-P8-NEXT: xscvdpsxws f9, f3
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> -; CHECK-P8-NEXT: xscvdpsxws f12, f5
> +; CHECK-P8-NEXT: xscvdpsxws f11, f5
> ; CHECK-P8-NEXT: xxswapd vs5, vs5
> -; CHECK-P8-NEXT: xscvdpsxws f13, f7
> +; CHECK-P8-NEXT: xscvdpsxws f12, f7
> ; CHECK-P8-NEXT: xxswapd vs7, vs7
> -; CHECK-P8-NEXT: xscvdpsxws v2, f9
> -; CHECK-P8-NEXT: xxswapd vs9, vs9
> -; CHECK-P8-NEXT: mffprwz r4, f4
> -; CHECK-P8-NEXT: xscvdpsxws v3, f11
> -; CHECK-P8-NEXT: xxswapd vs11, vs11
> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mffprwz r6, f6
> -; CHECK-P8-NEXT: mtfprd f4, r4
> +; CHECK-P8-NEXT: mffprwz r7, f4
> +; CHECK-P8-NEXT: lxvd2x vs4, r4, r6
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xscvdpsxws f13, f10
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r4, f8
> +; CHECK-P8-NEXT: xscvdpsxws f6, f4
> +; CHECK-P8-NEXT: mtvsrd v4, r4
> +; CHECK-P8-NEXT: mffprwz r4, f9
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: mffprwz r4, f11
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs4
> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: mtfprd f6, r6
> -; CHECK-P8-NEXT: mffprwz r6, f10
> -; CHECK-P8-NEXT: mtfprd f8, r4
> -; CHECK-P8-NEXT: xxswapd v5, vs6
> +; CHECK-P8-NEXT: mtvsrd v0, r4
> ; CHECK-P8-NEXT: mffprwz r4, f12
> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: xxswapd v0, vs8
> -; CHECK-P8-NEXT: mtfprd f10, r6
> -; CHECK-P8-NEXT: mffprwz r6, f13
> -; CHECK-P8-NEXT: mtfprd f12, r4
> -; CHECK-P8-NEXT: xxswapd v1, vs10
> -; CHECK-P8-NEXT: mfvsrwz r4, v2
> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: mffprwz r4, f13
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xxswapd v6, vs12
> -; CHECK-P8-NEXT: xscvdpsxws f9, f9
> -; CHECK-P8-NEXT: mtfprd f13, r6
> -; CHECK-P8-NEXT: mfvsrwz r6, v3
> -; CHECK-P8-NEXT: mtvsrd v2, r4
> -; CHECK-P8-NEXT: xxswapd v7, vs13
> +; CHECK-P8-NEXT: mtvsrd v6, r4
> +; CHECK-P8-NEXT: mffprwz r4, f6
> +; CHECK-P8-NEXT: xxswapd vs6, vs10
> +; CHECK-P8-NEXT: xscvdpsxws f5, f5
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> ; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: xxswapd vs0, vs4
> +; CHECK-P8-NEXT: mtvsrd v2, r7
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f1
> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
> -; CHECK-P8-NEXT: xxswapd v2, v2
> -; CHECK-P8-NEXT: xscvdpsxws f11, f11
> -; CHECK-P8-NEXT: mtvsrd v3, r6
> -; CHECK-P8-NEXT: mffprwz r6, f1
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v3, v3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r4, f2
> -; CHECK-P8-NEXT: mtfprd f1, r6
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: mtfprd f2, r4
> +; CHECK-P8-NEXT: xscvdpsxws f4, f6
> +; CHECK-P8-NEXT: vmrghh v2, v8, v2
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f3
> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
> +; CHECK-P8-NEXT: vmrghh v3, v9, v3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: xxswapd v9, vs1
> -; CHECK-P8-NEXT: mffprwz r6, f3
> -; CHECK-P8-NEXT: xxswapd v10, vs2
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: mffprwz r4, f9
> -; CHECK-P8-NEXT: mtfprd f3, r6
> -; CHECK-P8-NEXT: mffprwz r6, f7
> -; CHECK-P8-NEXT: mtfprd f9, r4
> -; CHECK-P8-NEXT: mffprwz r4, f11
> -; CHECK-P8-NEXT: vmrglh v4, v8, v4
> -; CHECK-P8-NEXT: xxswapd v8, vs3
> -; CHECK-P8-NEXT: vmrglh v5, v9, v5
> -; CHECK-P8-NEXT: xxswapd v9, vs5
> -; CHECK-P8-NEXT: mtfprd f7, r6
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: vmrglh v0, v10, v0
> -; CHECK-P8-NEXT: xxswapd v10, vs7
> -; CHECK-P8-NEXT: vmrglh v1, v8, v1
> -; CHECK-P8-NEXT: xxswapd v8, vs9
> -; CHECK-P8-NEXT: vmrglh v6, v9, v6
> -; CHECK-P8-NEXT: xxswapd v9, vs0
> -; CHECK-P8-NEXT: vmrglh v7, v10, v7
> -; CHECK-P8-NEXT: vmrglh v2, v8, v2
> -; CHECK-P8-NEXT: vmrglh v3, v9, v3
> -; CHECK-P8-NEXT: vmrglw v4, v5, v4
> -; CHECK-P8-NEXT: vmrglw v5, v1, v0
> -; CHECK-P8-NEXT: vmrglw v0, v7, v6
> +; CHECK-P8-NEXT: vmrghh v4, v8, v4
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f7
> +; CHECK-P8-NEXT: vmrghh v5, v9, v5
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: mffprwz r4, f4
> +; CHECK-P8-NEXT: vmrghh v0, v8, v0
> +; CHECK-P8-NEXT: mtvsrd v8, r4
> +; CHECK-P8-NEXT: mffprwz r4, f0
> +; CHECK-P8-NEXT: vmrghh v1, v9, v1
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: vmrghh v6, v8, v6
> +; CHECK-P8-NEXT: vmrghh v7, v9, v7
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: vmrglw v3, v5, v4
> +; CHECK-P8-NEXT: vmrglw v4, v1, v0
> +; CHECK-P8-NEXT: vmrglw v5, v7, v6
> +; CHECK-P8-NEXT: xxmrgld v2, v3, v2
> +; CHECK-P8-NEXT: stvx v2, 0, r3
> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4
> -; CHECK-P8-NEXT: stvx v3, 0, r3
> -; CHECK-P8-NEXT: xxmrgld v2, v2, v0
> -; CHECK-P8-NEXT: stvx v2, r3, r5
> +; CHECK-P8-NEXT: stvx v3, r3, r5
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: test16elt_signed:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: lxv vs4, 0(r4)
> -; CHECK-P9-NEXT: lxv vs3, 16(r4)
> -; CHECK-P9-NEXT: lxv vs2, 32(r4)
> -; CHECK-P9-NEXT: xscvdpsxws f5, f4
> -; CHECK-P9-NEXT: lxv vs1, 48(r4)
> -; CHECK-P9-NEXT: xscvdpsxws f6, f3
> -; CHECK-P9-NEXT: lxv vs0, 64(r4)
> -; CHECK-P9-NEXT: xscvdpsxws f7, f2
> -; CHECK-P9-NEXT: xscvdpsxws f8, f1
> -; CHECK-P9-NEXT: xxswapd vs4, vs4
> -; CHECK-P9-NEXT: xscvdpsxws f4, f4
> -; CHECK-P9-NEXT: mffprwz r5, f5
> -; CHECK-P9-NEXT: xscvdpsxws f9, f0
> +; CHECK-P9-NEXT: lxv vs3, 0(r4)
> +; CHECK-P9-NEXT: lxv vs2, 16(r4)
> +; CHECK-P9-NEXT: lxv vs1, 32(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
> +; CHECK-P9-NEXT: lxv vs0, 48(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f5, f2
> +; CHECK-P9-NEXT: xscvdpsxws f6, f1
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> +; CHECK-P9-NEXT: xscvdpsxws f7, f0
> +; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: mffprwz r5, f4
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: mtfprd f5, r5
> -; CHECK-P9-NEXT: mffprwz r5, f6
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: mtfprd f6, r5
> +; CHECK-P9-NEXT: mtvsrd v2, r5
> +; CHECK-P9-NEXT: mffprwz r5, f5
> +; CHECK-P9-NEXT: mtvsrd v3, r5
> +; CHECK-P9-NEXT: mffprwz r5, f6
> +; CHECK-P9-NEXT: mtvsrd v4, r5
> ; CHECK-P9-NEXT: mffprwz r5, f7
> -; CHECK-P9-NEXT: mtfprd f7, r5
> -; CHECK-P9-NEXT: mffprwz r5, f8
> -; CHECK-P9-NEXT: mtfprd f8, r5
> -; CHECK-P9-NEXT: mffprwz r5, f9
> -; CHECK-P9-NEXT: mtfprd f9, r5
> -; CHECK-P9-NEXT: mffprwz r5, f4
> -; CHECK-P9-NEXT: mtfprd f4, r5
> +; CHECK-P9-NEXT: mtvsrd v5, r5
> ; CHECK-P9-NEXT: mffprwz r5, f3
> +; CHECK-P9-NEXT: lxv vs3, 64(r4)
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs5
> -; CHECK-P9-NEXT: xxswapd v5, vs8
> -; CHECK-P9-NEXT: xxswapd v0, vs9
> -; CHECK-P9-NEXT: mtfprd f3, r5
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: mffprwz r5, f2
> -; CHECK-P9-NEXT: mtfprd f2, r5
> -; CHECK-P9-NEXT: xxswapd vs0, vs0
> -; CHECK-P9-NEXT: xscvdpsxws f0, f0
> -; CHECK-P9-NEXT: xxswapd v1, vs2
> ; CHECK-P9-NEXT: lxv vs2, 80(r4)
> -; CHECK-P9-NEXT: xxswapd v3, vs4
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs6
> -; CHECK-P9-NEXT: xxswapd v4, vs3
> -; CHECK-P9-NEXT: xscvdpsxws f3, f2
> -; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: vmrghh v2, v2, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: mffprwz r5, f1
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs7
> -; CHECK-P9-NEXT: mtfprd f1, r5
> +; CHECK-P9-NEXT: lxv vs1, 96(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
> +; CHECK-P9-NEXT: xxswapd vs3, vs3
> +; CHECK-P9-NEXT: vmrghh v3, v3, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> ; CHECK-P9-NEXT: mffprwz r5, f0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v1
> -; CHECK-P9-NEXT: xxswapd v1, vs1
> -; CHECK-P9-NEXT: mtfprd f0, r5
> -; CHECK-P9-NEXT: vmrglh v5, v5, v1
> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
> -; CHECK-P9-NEXT: xxswapd v1, vs0
> ; CHECK-P9-NEXT: lxv vs0, 112(r4)
> -; CHECK-P9-NEXT: lxv vs1, 96(r4)
> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: vmrghh v4, v4, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r5
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: vmrghh v5, v5, v0
> +; CHECK-P9-NEXT: mffprwz r4, f4
> +; CHECK-P9-NEXT: vmrglw v4, v5, v4
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: mffprwz r4, f3
> -; CHECK-P9-NEXT: mtfprd f3, r4
> +; CHECK-P9-NEXT: xscvdpsxws f3, f2
> +; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: xxmrgld vs4, v4, v2
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> +; CHECK-P9-NEXT: stxv vs4, 0(r3)
> +; CHECK-P9-NEXT: mffprwz r4, f3
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: mffprwz r4, f2
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: vmrglw v3, v5, v4
> -; CHECK-P9-NEXT: xxmrgld vs4, v3, v2
> -; CHECK-P9-NEXT: xxswapd v2, vs3
> -; CHECK-P9-NEXT: vmrglh v0, v0, v1
> -; CHECK-P9-NEXT: mtfprd f2, r4
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r4, f2
> -; CHECK-P9-NEXT: mtfprd f2, r4
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r4
> ; CHECK-P9-NEXT: mffprwz r4, f1
> -; CHECK-P9-NEXT: mtfprd f1, r4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r4, f1
> -; CHECK-P9-NEXT: mtfprd f1, r4
> +; CHECK-P9-NEXT: mtvsrd v4, r4
> ; CHECK-P9-NEXT: mffprwz r4, f0
> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
> -; CHECK-P9-NEXT: mtfprd f0, r4
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r4
> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
> -; CHECK-P9-NEXT: stxv vs4, 0(r3)
> ; CHECK-P9-NEXT: blr
> ;
> ; CHECK-BE-LABEL: test16elt_signed:
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
> index 369fb3f10100..173ced964ad6 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
> @@ -16,12 +16,10 @@ define i64 @test2elt(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpuxws f1, v2
> ; CHECK-P8-NEXT: xscvdpuxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrwz v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglw v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrwz v3, r4
> +; CHECK-P8-NEXT: vmrghw v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -35,7 +33,7 @@ define i64 @test2elt(<2 x double> %a) local_unnamed_addr
> #0 {
> ; CHECK-P9-NEXT: xscvdpuxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> ; CHECK-P9-NEXT: mtvsrws v2, r3
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: vmrghw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -310,12 +308,10 @@ define i64 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrwz v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglw v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrwz v3, r4
> +; CHECK-P8-NEXT: vmrghw v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -329,7 +325,7 @@ define i64 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> ; CHECK-P9-NEXT: mtvsrws v2, r3
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: vmrghw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
> index fb13d1bd71f5..fd28d9a1afdc 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
> @@ -16,12 +16,10 @@ define i16 @test2elt(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglb v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: clrldi r3, r3, 48
> @@ -33,15 +31,13 @@ define i16 @test2elt(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9: # %bb.0: # %entry
> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: addi r3, r1, -2
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
> ; CHECK-P9-NEXT: lhz r3, -2(r1)
> @@ -84,18 +80,14 @@ define i32 @test4elt(<4 x double>* nocapture readonly)
> local_unnamed_addr #1 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f2
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: xxswapd v2, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs3
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xxswapd v5, vs1
> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
> -; CHECK-P8-NEXT: vmrglb v3, v5, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
> +; CHECK-P8-NEXT: vmrghb v3, v5, v3
> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> @@ -109,24 +101,20 @@ define i32 @test4elt(<4 x double>* nocapture
> readonly) local_unnamed_addr #1 {
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: li r3, 0
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -185,36 +173,28 @@ define i64 @test8elt(<8 x double>* nocapture
> readonly) local_unnamed_addr #1 {
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: mtfprd f4, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f6
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs4
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r4, f7
> -; CHECK-P8-NEXT: mtfprd f6, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs5
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f7, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs6
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v1, vs7
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r3, f2
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: xxswapd v0, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs2
> -; CHECK-P8-NEXT: vmrglb v2, v5, v2
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> -; CHECK-P8-NEXT: vmrglb v3, v0, v3
> -; CHECK-P8-NEXT: vmrglb v4, v6, v4
> -; CHECK-P8-NEXT: vmrglb v5, v5, v1
> +; CHECK-P8-NEXT: vmrghb v2, v0, v2
> +; CHECK-P8-NEXT: vmrghb v3, v1, v3
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: vmrghb v4, v0, v4
> +; CHECK-P8-NEXT: vmrghb v5, v1, v5
> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
> ; CHECK-P8-NEXT: vmrglh v3, v5, v4
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> @@ -228,47 +208,39 @@ define i64 @test8elt(<8 x double>* nocapture
> readonly) local_unnamed_addr #1 {
> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: xxswapd v2, vs4
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> @@ -364,79 +336,63 @@ define <16 x i8> @test16elt(<16 x double>* nocapture
> readonly) local_unnamed_add
> ; CHECK-P8-NEXT: xxswapd vs7, vs7
> ; CHECK-P8-NEXT: xscvdpsxws v2, f9
> ; CHECK-P8-NEXT: xxswapd vs9, vs9
> -; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: xscvdpsxws v3, f11
> ; CHECK-P8-NEXT: xxswapd vs11, vs11
> +; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: mffprwz r4, f6
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mtfprd f4, r3
> -; CHECK-P8-NEXT: mffprwz r3, f8
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs4
> -; CHECK-P8-NEXT: mtfprd f6, r4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mffprwz r3, f8
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: mffprwz r4, f10
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: xxswapd v5, vs6
> -; CHECK-P8-NEXT: mtfprd f8, r3
> -; CHECK-P8-NEXT: mffprwz r3, f12
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xxswapd v0, vs8
> -; CHECK-P8-NEXT: mtfprd f10, r4
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mffprwz r3, f12
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r4, f13
> ; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: xxswapd v1, vs10
> -; CHECK-P8-NEXT: mtfprd f12, r3
> -; CHECK-P8-NEXT: mfvsrwz r3, v2
> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
> -; CHECK-P8-NEXT: xxswapd v6, vs12
> -; CHECK-P8-NEXT: mtfprd f13, r4
> +; CHECK-P8-NEXT: mtvsrd v6, r3
> +; CHECK-P8-NEXT: mfvsrwz r3, v2
> +; CHECK-P8-NEXT: mtvsrd v2, r4
> ; CHECK-P8-NEXT: mfvsrwz r4, v3
> -; CHECK-P8-NEXT: mtvsrd v2, r3
> -; CHECK-P8-NEXT: xxswapd v7, vs13
> -; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: xscvdpsxws f9, f9
> -; CHECK-P8-NEXT: xxswapd v2, v2
> ; CHECK-P8-NEXT: xscvdpsxws f11, f11
> -; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> +; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v3, v3
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r3, f2
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: xxswapd v9, vs1
> +; CHECK-P8-NEXT: vmrghb v4, v8, v4
> +; CHECK-P8-NEXT: vmrghb v5, v9, v5
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r3, f5
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: xxswapd v10, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f7
> -; CHECK-P8-NEXT: mtfprd f5, r3
> +; CHECK-P8-NEXT: vmrghb v0, v8, v0
> +; CHECK-P8-NEXT: vmrghb v1, v9, v1
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r3, f9
> -; CHECK-P8-NEXT: mtfprd f7, r4
> ; CHECK-P8-NEXT: mffprwz r4, f11
> -; CHECK-P8-NEXT: vmrglb v4, v8, v4
> -; CHECK-P8-NEXT: xxswapd v8, vs3
> -; CHECK-P8-NEXT: vmrglb v5, v9, v5
> -; CHECK-P8-NEXT: xxswapd v9, vs5
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: vmrglb v0, v10, v0
> -; CHECK-P8-NEXT: xxswapd v10, vs7
> -; CHECK-P8-NEXT: vmrglb v1, v8, v1
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: vmrglb v6, v9, v6
> -; CHECK-P8-NEXT: xxswapd v9, vs1
> -; CHECK-P8-NEXT: vmrglb v7, v10, v7
> -; CHECK-P8-NEXT: vmrglb v2, v8, v2
> -; CHECK-P8-NEXT: vmrglb v3, v9, v3
> +; CHECK-P8-NEXT: vmrghb v6, v8, v6
> +; CHECK-P8-NEXT: vmrghb v2, v9, v2
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: vmrghb v3, v8, v3
> +; CHECK-P8-NEXT: vmrghb v7, v9, v7
> ; CHECK-P8-NEXT: vmrglh v4, v5, v4
> ; CHECK-P8-NEXT: vmrglh v5, v1, v0
> -; CHECK-P8-NEXT: vmrglh v0, v7, v6
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
> -; CHECK-P8-NEXT: vmrglw v2, v2, v0
> -; CHECK-P8-NEXT: xxmrgld v2, v2, v3
> +; CHECK-P8-NEXT: vmrglh v2, v2, v6
> +; CHECK-P8-NEXT: vmrglh v3, v7, v3
> +; CHECK-P8-NEXT: vmrglw v4, v5, v4
> +; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: xxmrgld v2, v2, v4
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: test16elt:
> @@ -445,94 +401,78 @@ define <16 x i8> @test16elt(<16 x double>* nocapture
> readonly) local_unnamed_add
> ; CHECK-P9-NEXT: xscvdpsxws f8, f7
> ; CHECK-P9-NEXT: xxswapd vs7, vs7
> ; CHECK-P9-NEXT: xscvdpsxws f7, f7
> +; CHECK-P9-NEXT: lxv vs6, 16(r3)
> ; CHECK-P9-NEXT: lxv vs0, 112(r3)
> ; CHECK-P9-NEXT: lxv vs1, 96(r3)
> ; CHECK-P9-NEXT: lxv vs2, 80(r3)
> ; CHECK-P9-NEXT: lxv vs3, 64(r3)
> ; CHECK-P9-NEXT: lxv vs4, 48(r3)
> ; CHECK-P9-NEXT: lxv vs5, 32(r3)
> -; CHECK-P9-NEXT: lxv vs6, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f8
> -; CHECK-P9-NEXT: mtfprd f8, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f7
> -; CHECK-P9-NEXT: xxswapd v2, vs8
> -; CHECK-P9-NEXT: mtfprd f7, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs7
> ; CHECK-P9-NEXT: xscvdpsxws f7, f6
> ; CHECK-P9-NEXT: xxswapd vs6, vs6
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f6, f6
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f7
> -; CHECK-P9-NEXT: mtfprd f7, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f6
> -; CHECK-P9-NEXT: mtfprd f6, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs6
> ; CHECK-P9-NEXT: xscvdpsxws f6, f5
> ; CHECK-P9-NEXT: xxswapd vs5, vs5
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f5, f5
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f6
> -; CHECK-P9-NEXT: mtfprd f6, r3
> -; CHECK-P9-NEXT: mffprwz r3, f5
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs7
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs6
> -; CHECK-P9-NEXT: mtfprd f5, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs5
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f5
> ; CHECK-P9-NEXT: xscvdpsxws f5, f4
> ; CHECK-P9-NEXT: xxswapd vs4, vs4
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f5
> -; CHECK-P9-NEXT: mtfprd f5, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs4
> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs5
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs4
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs3
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: vmrglh v3, v4, v3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> -; CHECK-P9-NEXT: xxswapd v0, vs0
> -; CHECK-P9-NEXT: vmrglb v5, v5, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r3
> +; CHECK-P9-NEXT: vmrghb v5, v5, v0
> ; CHECK-P9-NEXT: vmrglh v4, v5, v4
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
> @@ -649,12 +589,10 @@ define i16 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: mffprwz r3, f1
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r4, f0
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: xxswapd v3, vs1
> -; CHECK-P8-NEXT: vmrglb v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: clrldi r3, r3, 48
> @@ -666,15 +604,13 @@ define i16 @test2elt_signed(<2 x double> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9: # %bb.0: # %entry
> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs0
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: addi r3, r1, -2
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
> ; CHECK-P9-NEXT: lhz r3, -2(r1)
> @@ -717,18 +653,14 @@ define i32 @test4elt_signed(<4 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> ; CHECK-P8-NEXT: mffprwz r3, f2
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: mtfprd f3, r4
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: xxswapd v2, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs3
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: xxswapd v5, vs1
> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
> -; CHECK-P8-NEXT: vmrglb v3, v5, v4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
> +; CHECK-P8-NEXT: vmrghb v3, v5, v3
> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprwz r3, f0
> @@ -742,24 +674,20 @@ define i32 @test4elt_signed(<4 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v2, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs1
> -; CHECK-P9-NEXT: xxswapd v4, vs0
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: li r3, 0
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -818,36 +746,28 @@ define i64 @test8elt_signed(<8 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> +; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: mffprwz r4, f5
> -; CHECK-P8-NEXT: mtfprd f4, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: mffprwz r3, f6
> -; CHECK-P8-NEXT: mtfprd f5, r4
> -; CHECK-P8-NEXT: xxswapd v2, vs4
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: mffprwz r4, f7
> -; CHECK-P8-NEXT: mtfprd f6, r3
> -; CHECK-P8-NEXT: xxswapd v3, vs5
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: mffprwz r3, f0
> -; CHECK-P8-NEXT: mtfprd f7, r4
> -; CHECK-P8-NEXT: xxswapd v4, vs6
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v1, vs7
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r3, f2
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: xxswapd v0, vs1
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: xxswapd v6, vs2
> -; CHECK-P8-NEXT: vmrglb v2, v5, v2
> -; CHECK-P8-NEXT: xxswapd v5, vs0
> -; CHECK-P8-NEXT: vmrglb v3, v0, v3
> -; CHECK-P8-NEXT: vmrglb v4, v6, v4
> -; CHECK-P8-NEXT: vmrglb v5, v5, v1
> +; CHECK-P8-NEXT: vmrghb v2, v0, v2
> +; CHECK-P8-NEXT: vmrghb v3, v1, v3
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> +; CHECK-P8-NEXT: vmrghb v4, v0, v4
> +; CHECK-P8-NEXT: vmrghb v5, v1, v5
> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
> ; CHECK-P8-NEXT: vmrglh v3, v5, v4
> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
> @@ -861,47 +781,39 @@ define i64 @test8elt_signed(<8 x double>* nocapture
> readonly) local_unnamed_addr
> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: xxswapd v2, vs4
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs1
> -; CHECK-P9-NEXT: xxswapd v5, vs0
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> @@ -997,79 +909,63 @@ define <16 x i8> @test16elt_signed(<16 x double>*
> nocapture readonly) local_unna
> ; CHECK-P8-NEXT: xxswapd vs7, vs7
> ; CHECK-P8-NEXT: xscvdpsxws v2, f9
> ; CHECK-P8-NEXT: xxswapd vs9, vs9
> -; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: xscvdpsxws v3, f11
> ; CHECK-P8-NEXT: xxswapd vs11, vs11
> +; CHECK-P8-NEXT: mffprwz r3, f4
> ; CHECK-P8-NEXT: mffprwz r4, f6
> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
> -; CHECK-P8-NEXT: mtfprd f4, r3
> -; CHECK-P8-NEXT: mffprwz r3, f8
> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
> -; CHECK-P8-NEXT: xxswapd v4, vs4
> -; CHECK-P8-NEXT: mtfprd f6, r4
> +; CHECK-P8-NEXT: mtvsrd v4, r3
> +; CHECK-P8-NEXT: mffprwz r3, f8
> +; CHECK-P8-NEXT: mtvsrd v5, r4
> ; CHECK-P8-NEXT: mffprwz r4, f10
> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
> -; CHECK-P8-NEXT: xxswapd v5, vs6
> -; CHECK-P8-NEXT: mtfprd f8, r3
> -; CHECK-P8-NEXT: mffprwz r3, f12
> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
> -; CHECK-P8-NEXT: xxswapd v0, vs8
> -; CHECK-P8-NEXT: mtfprd f10, r4
> +; CHECK-P8-NEXT: mtvsrd v0, r3
> +; CHECK-P8-NEXT: mffprwz r3, f12
> +; CHECK-P8-NEXT: mtvsrd v1, r4
> ; CHECK-P8-NEXT: mffprwz r4, f13
> ; CHECK-P8-NEXT: xscvdpsxws f5, f5
> -; CHECK-P8-NEXT: xxswapd v1, vs10
> -; CHECK-P8-NEXT: mtfprd f12, r3
> -; CHECK-P8-NEXT: mfvsrwz r3, v2
> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
> -; CHECK-P8-NEXT: xxswapd v6, vs12
> -; CHECK-P8-NEXT: mtfprd f13, r4
> +; CHECK-P8-NEXT: mtvsrd v6, r3
> +; CHECK-P8-NEXT: mfvsrwz r3, v2
> +; CHECK-P8-NEXT: mtvsrd v2, r4
> ; CHECK-P8-NEXT: mfvsrwz r4, v3
> -; CHECK-P8-NEXT: mtvsrd v2, r3
> -; CHECK-P8-NEXT: xxswapd v7, vs13
> -; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: xscvdpsxws f9, f9
> -; CHECK-P8-NEXT: xxswapd v2, v2
> ; CHECK-P8-NEXT: xscvdpsxws f11, f11
> -; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> +; CHECK-P8-NEXT: mtvsrd v7, r4
> +; CHECK-P8-NEXT: mffprwz r3, f0
> ; CHECK-P8-NEXT: mffprwz r4, f1
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: xxswapd v3, v3
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r3, f2
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> ; CHECK-P8-NEXT: mffprwz r4, f3
> -; CHECK-P8-NEXT: mtfprd f2, r3
> -; CHECK-P8-NEXT: xxswapd v9, vs1
> +; CHECK-P8-NEXT: vmrghb v4, v8, v4
> +; CHECK-P8-NEXT: vmrghb v5, v9, v5
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r3, f5
> -; CHECK-P8-NEXT: mtfprd f3, r4
> -; CHECK-P8-NEXT: xxswapd v10, vs2
> ; CHECK-P8-NEXT: mffprwz r4, f7
> -; CHECK-P8-NEXT: mtfprd f5, r3
> +; CHECK-P8-NEXT: vmrghb v0, v8, v0
> +; CHECK-P8-NEXT: vmrghb v1, v9, v1
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> ; CHECK-P8-NEXT: mffprwz r3, f9
> -; CHECK-P8-NEXT: mtfprd f7, r4
> ; CHECK-P8-NEXT: mffprwz r4, f11
> -; CHECK-P8-NEXT: vmrglb v4, v8, v4
> -; CHECK-P8-NEXT: xxswapd v8, vs3
> -; CHECK-P8-NEXT: vmrglb v5, v9, v5
> -; CHECK-P8-NEXT: xxswapd v9, vs5
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: mtfprd f1, r4
> -; CHECK-P8-NEXT: vmrglb v0, v10, v0
> -; CHECK-P8-NEXT: xxswapd v10, vs7
> -; CHECK-P8-NEXT: vmrglb v1, v8, v1
> -; CHECK-P8-NEXT: xxswapd v8, vs0
> -; CHECK-P8-NEXT: vmrglb v6, v9, v6
> -; CHECK-P8-NEXT: xxswapd v9, vs1
> -; CHECK-P8-NEXT: vmrglb v7, v10, v7
> -; CHECK-P8-NEXT: vmrglb v2, v8, v2
> -; CHECK-P8-NEXT: vmrglb v3, v9, v3
> +; CHECK-P8-NEXT: vmrghb v6, v8, v6
> +; CHECK-P8-NEXT: vmrghb v2, v9, v2
> +; CHECK-P8-NEXT: mtvsrd v8, r3
> +; CHECK-P8-NEXT: mtvsrd v9, r4
> +; CHECK-P8-NEXT: vmrghb v3, v8, v3
> +; CHECK-P8-NEXT: vmrghb v7, v9, v7
> ; CHECK-P8-NEXT: vmrglh v4, v5, v4
> ; CHECK-P8-NEXT: vmrglh v5, v1, v0
> -; CHECK-P8-NEXT: vmrglh v0, v7, v6
> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
> -; CHECK-P8-NEXT: vmrglw v2, v2, v0
> -; CHECK-P8-NEXT: xxmrgld v2, v2, v3
> +; CHECK-P8-NEXT: vmrglh v2, v2, v6
> +; CHECK-P8-NEXT: vmrglh v3, v7, v3
> +; CHECK-P8-NEXT: vmrglw v4, v5, v4
> +; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: xxmrgld v2, v2, v4
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: test16elt_signed:
> @@ -1078,94 +974,78 @@ define <16 x i8> @test16elt_signed(<16 x double>*
> nocapture readonly) local_unna
> ; CHECK-P9-NEXT: xscvdpsxws f8, f7
> ; CHECK-P9-NEXT: xxswapd vs7, vs7
> ; CHECK-P9-NEXT: xscvdpsxws f7, f7
> +; CHECK-P9-NEXT: lxv vs6, 16(r3)
> ; CHECK-P9-NEXT: lxv vs0, 112(r3)
> ; CHECK-P9-NEXT: lxv vs1, 96(r3)
> ; CHECK-P9-NEXT: lxv vs2, 80(r3)
> ; CHECK-P9-NEXT: lxv vs3, 64(r3)
> ; CHECK-P9-NEXT: lxv vs4, 48(r3)
> ; CHECK-P9-NEXT: lxv vs5, 32(r3)
> -; CHECK-P9-NEXT: lxv vs6, 16(r3)
> ; CHECK-P9-NEXT: mffprwz r3, f8
> -; CHECK-P9-NEXT: mtfprd f8, r3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> ; CHECK-P9-NEXT: mffprwz r3, f7
> -; CHECK-P9-NEXT: xxswapd v2, vs8
> -; CHECK-P9-NEXT: mtfprd f7, r3
> -; CHECK-P9-NEXT: xxswapd v3, vs7
> ; CHECK-P9-NEXT: xscvdpsxws f7, f6
> ; CHECK-P9-NEXT: xxswapd vs6, vs6
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: xscvdpsxws f6, f6
> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
> ; CHECK-P9-NEXT: mffprwz r3, f7
> -; CHECK-P9-NEXT: mtfprd f7, r3
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f6
> -; CHECK-P9-NEXT: mtfprd f6, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs6
> ; CHECK-P9-NEXT: xscvdpsxws f6, f5
> ; CHECK-P9-NEXT: xxswapd vs5, vs5
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f5, f5
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f6
> -; CHECK-P9-NEXT: mtfprd f6, r3
> -; CHECK-P9-NEXT: mffprwz r3, f5
> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
> -; CHECK-P9-NEXT: xxswapd v3, vs7
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs6
> -; CHECK-P9-NEXT: mtfprd f5, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs5
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> +; CHECK-P9-NEXT: mffprwz r3, f5
> ; CHECK-P9-NEXT: xscvdpsxws f5, f4
> ; CHECK-P9-NEXT: xxswapd vs4, vs4
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f4, f4
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f5
> -; CHECK-P9-NEXT: mtfprd f5, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs4
> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
> ; CHECK-P9-NEXT: xxswapd vs3, vs3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs5
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
> ; CHECK-P9-NEXT: mffprwz r3, f4
> -; CHECK-P9-NEXT: mtfprd f4, r3
> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
> +; CHECK-P9-NEXT: mtvsrd v3, r3
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> -; CHECK-P9-NEXT: xxswapd v4, vs3
> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
> ; CHECK-P9-NEXT: xxswapd vs2, vs2
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
> ; CHECK-P9-NEXT: mffprwz r3, f3
> -; CHECK-P9-NEXT: mtfprd f3, r3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs2
> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
> ; CHECK-P9-NEXT: xxswapd vs1, vs1
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
> -; CHECK-P9-NEXT: xxswapd v3, vs4
> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
> -; CHECK-P9-NEXT: xxswapd v4, vs3
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: mffprwz r3, f2
> -; CHECK-P9-NEXT: mtfprd f2, r3
> +; CHECK-P9-NEXT: vmrglh v3, v4, v3
> +; CHECK-P9-NEXT: mtvsrd v4, r3
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: xxswapd v4, vs2
> -; CHECK-P9-NEXT: mtfprd f1, r3
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
> ; CHECK-P9-NEXT: xxswapd vs0, vs0
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
> ; CHECK-P9-NEXT: mffprwz r3, f1
> -; CHECK-P9-NEXT: mtfprd f1, r3
> +; CHECK-P9-NEXT: mtvsrd v5, r3
> ; CHECK-P9-NEXT: mffprwz r3, f0
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
> -; CHECK-P9-NEXT: xxswapd v5, vs1
> -; CHECK-P9-NEXT: xxswapd v0, vs0
> -; CHECK-P9-NEXT: vmrglb v5, v5, v0
> +; CHECK-P9-NEXT: mtvsrd v0, r3
> +; CHECK-P9-NEXT: vmrghb v5, v5, v0
> ; CHECK-P9-NEXT: vmrglh v4, v5, v4
> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
> index e51af62cb128..5ecd34941b39 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
> @@ -24,9 +24,9 @@ define i64 @test2elt(i32 %a.coerce) local_unnamed_addr
> #0 {
> ; CHECK-P8-NEXT: xscvuxdsp f1, f1
> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -43,12 +43,12 @@ define i64 @test2elt(i32 %a.coerce) local_unnamed_addr
> #0 {
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> ; CHECK-P9-NEXT: vextuhrx r3, r3, v2
> ; CHECK-P9-NEXT: clrlwi r3, r3, 16
> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
> ; CHECK-P9-NEXT: mtfprwz f0, r3
> ; CHECK-P9-NEXT: xscvuxdsp f0, f0
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -80,25 +80,17 @@ entry:
> define <4 x float> @test4elt(i64 %a.coerce) local_unnamed_addr #1 {
> ; CHECK-P8-LABEL: test4elt:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: addi r3, r4, .LCPI1_0 at toc@l
> -; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: lvx v3, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v4, v2, v3
> +; CHECK-P8-NEXT: xxlxor v2, v2, v2
> +; CHECK-P8-NEXT: mtvsrd v3, r3
> +; CHECK-P8-NEXT: vmrghh v2, v2, v3
> ; CHECK-P8-NEXT: xvcvuxwsp v2, v2
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: test4elt:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: addis r3, r2, .LCPI1_0 at toc@ha
> -; CHECK-P9-NEXT: addi r3, r3, .LCPI1_0 at toc@l
> -; CHECK-P9-NEXT: lxvx v3, 0, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: xxlxor v4, v4, v4
> -; CHECK-P9-NEXT: vperm v2, v4, v2, v3
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> +; CHECK-P9-NEXT: xxlxor v3, v3, v3
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> ; CHECK-P9-NEXT: xvcvuxwsp v2, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -121,17 +113,11 @@ entry:
> define void @test8elt(<8 x float>* noalias nocapture sret %agg.result, <8
> x i16> %a) local_unnamed_addr #2 {
> ; CHECK-P8-LABEL: test8elt:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_0 at toc@ha
> -; CHECK-P8-NEXT: addis r5, r2, .LCPI2_1 at toc@ha
> -; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_0 at toc@l
> -; CHECK-P8-NEXT: lvx v3, 0, r4
> -; CHECK-P8-NEXT: addi r4, r5, .LCPI2_1 at toc@l
> -; CHECK-P8-NEXT: lvx v5, 0, r4
> +; CHECK-P8-NEXT: xxlxor v3, v3, v3
> ; CHECK-P8-NEXT: li r4, 16
> -; CHECK-P8-NEXT: vperm v3, v4, v2, v3
> -; CHECK-P8-NEXT: vperm v2, v4, v2, v5
> -; CHECK-P8-NEXT: xvcvuxwsp v3, v3
> +; CHECK-P8-NEXT: vmrglh v4, v3, v2
> +; CHECK-P8-NEXT: vmrghh v2, v3, v2
> +; CHECK-P8-NEXT: xvcvuxwsp v3, v4
> ; CHECK-P8-NEXT: xvcvuxwsp v2, v2
> ; CHECK-P8-NEXT: stvx v3, 0, r3
> ; CHECK-P8-NEXT: stvx v2, r3, r4
> @@ -139,19 +125,13 @@ define void @test8elt(<8 x float>* noalias nocapture
> sret %agg.result, <8 x i16>
> ;
> ; CHECK-P9-LABEL: test8elt:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0 at toc@ha
> -; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0 at toc@l
> -; CHECK-P9-NEXT: lxvx v3, 0, r4
> -; CHECK-P9-NEXT: xxlxor v4, v4, v4
> -; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1 at toc@ha
> -; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1 at toc@l
> -; CHECK-P9-NEXT: vperm v3, v4, v2, v3
> -; CHECK-P9-NEXT: xvcvuxwsp vs0, v3
> -; CHECK-P9-NEXT: lxvx v3, 0, r4
> -; CHECK-P9-NEXT: vperm v2, v4, v2, v3
> -; CHECK-P9-NEXT: stxv vs0, 0(r3)
> +; CHECK-P9-NEXT: xxlxor v3, v3, v3
> +; CHECK-P9-NEXT: vmrglh v4, v3, v2
> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
> +; CHECK-P9-NEXT: xvcvuxwsp vs0, v4
> ; CHECK-P9-NEXT: xvcvuxwsp vs1, v2
> ; CHECK-P9-NEXT: stxv vs1, 16(r3)
> +; CHECK-P9-NEXT: stxv vs0, 0(r3)
> ; CHECK-P9-NEXT: blr
> ;
> ; CHECK-BE-LABEL: test8elt:
> @@ -276,9 +256,9 @@ define i64 @test2elt_signed(i32 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvsxdsp f1, f1
> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -295,12 +275,12 @@ define i64 @test2elt_signed(i32 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> ; CHECK-P9-NEXT: vextuhrx r3, r3, v2
> ; CHECK-P9-NEXT: extsh r3, r3
> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
> ; CHECK-P9-NEXT: mtfprwa f0, r3
> ; CHECK-P9-NEXT: xscvsxdsp f0, f0
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -332,11 +312,10 @@ entry:
> define <4 x float> @test4elt_signed(i64 %a.coerce) local_unnamed_addr #1 {
> ; CHECK-P8-LABEL: test4elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: mtfprd f0, r3
> +; CHECK-P8-NEXT: mtvsrd v2, r3
> ; CHECK-P8-NEXT: vspltisw v3, 8
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> +; CHECK-P8-NEXT: vmrghh v2, v2, v2
> ; CHECK-P8-NEXT: vadduwm v3, v3, v3
> -; CHECK-P8-NEXT: vmrglh v2, v2, v2
> ; CHECK-P8-NEXT: vslw v2, v2, v3
> ; CHECK-P8-NEXT: vsraw v2, v2, v3
> ; CHECK-P8-NEXT: xvcvsxwsp v2, v2
> @@ -344,9 +323,8 @@ define <4 x float> @test4elt_signed(i64 %a.coerce)
> local_unnamed_addr #1 {
> ;
> ; CHECK-P9-LABEL: test4elt_signed:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: mtfprd f0, r3
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vmrglh v2, v2, v2
> +; CHECK-P9-NEXT: mtvsrd v2, r3
> +; CHECK-P9-NEXT: vmrghh v2, v2, v2
> ; CHECK-P9-NEXT: vextsh2w v2, v2
> ; CHECK-P9-NEXT: xvcvsxwsp v2, v2
> ; CHECK-P9-NEXT: blr
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
> index faec95831816..ea8ede3af22a 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
> @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i32 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: test2elt:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r4, r2, .LCPI0_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: addi r3, r4, .LCPI0_0 at toc@l
> +; CHECK-P8-NEXT: mtvsrwz v2, r3
> +; CHECK-P8-NEXT: addi r4, r4, .LCPI0_0 at toc@l
> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: lvx v3, 0, r3
> +; CHECK-P8-NEXT: lvx v3, 0, r4
> ; CHECK-P8-NEXT: vperm v2, v4, v2, v3
> ; CHECK-P8-NEXT: xvcvuxddp v2, v2
> ; CHECK-P8-NEXT: blr
> @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture
> sret %agg.result, i64 %a.c
> ; CHECK-P8-LABEL: test4elt:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r5, r2, .LCPI1_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_1 at toc@ha
> +; CHECK-P8-NEXT: addis r6, r2, .LCPI1_1 at toc@ha
> +; CHECK-P8-NEXT: mtvsrd v2, r4
> ; CHECK-P8-NEXT: addi r5, r5, .LCPI1_0 at toc@l
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI1_1 at toc@l
> +; CHECK-P8-NEXT: addi r4, r6, .LCPI1_1 at toc@l
> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: lvx v2, 0, r5
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> +; CHECK-P8-NEXT: lvx v3, 0, r5
> ; CHECK-P8-NEXT: lvx v5, 0, r4
> ; CHECK-P8-NEXT: li r4, 16
> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2
> -; CHECK-P8-NEXT: vperm v3, v4, v3, v5
> -; CHECK-P8-NEXT: xvcvuxddp vs0, v2
> -; CHECK-P8-NEXT: xvcvuxddp vs1, v3
> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3
> +; CHECK-P8-NEXT: vperm v2, v4, v2, v5
> +; CHECK-P8-NEXT: xvcvuxddp vs0, v3
> +; CHECK-P8-NEXT: xvcvuxddp vs1, v2
> ; CHECK-P8-NEXT: xxswapd vs0, vs0
> ; CHECK-P8-NEXT: xxswapd vs1, vs1
> ; CHECK-P8-NEXT: stxvd2x vs1, r3, r4
> @@ -74,11 +72,10 @@ define void @test4elt(<4 x double>* noalias nocapture
> sret %agg.result, i64 %a.c
> ;
> ; CHECK-P9-LABEL: test4elt:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: mtfprd f0, r4
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI1_0 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI1_0 at toc@l
> ; CHECK-P9-NEXT: lxvx v3, 0, r4
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> ; CHECK-P9-NEXT: xxlxor v4, v4, v4
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI1_1 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI1_1 at toc@l
> @@ -370,14 +367,13 @@ define <2 x double> @test2elt_signed(i32 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: test2elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r4, r2, .LCPI4_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: addi r3, r4, .LCPI4_0 at toc@l
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: lvx v3, 0, r3
> +; CHECK-P8-NEXT: mtvsrwz v3, r3
> ; CHECK-P8-NEXT: addis r3, r2, .LCPI4_1 at toc@ha
> +; CHECK-P8-NEXT: addi r4, r4, .LCPI4_0 at toc@l
> ; CHECK-P8-NEXT: addi r3, r3, .LCPI4_1 at toc@l
> +; CHECK-P8-NEXT: lvx v2, 0, r4
> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v2, v2, v3
> +; CHECK-P8-NEXT: vperm v2, v3, v3, v2
> ; CHECK-P8-NEXT: xxswapd v3, vs0
> ; CHECK-P8-NEXT: vsld v2, v2, v3
> ; CHECK-P8-NEXT: vsrad v2, v2, v3
> @@ -415,17 +411,16 @@ define void @test4elt_signed(<4 x double>* noalias
> nocapture sret %agg.result, i
> ; CHECK-P8-LABEL: test4elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r5, r2, .LCPI5_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI5_2 at toc@ha
> -; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0 at toc@l
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI5_2 at toc@l
> -; CHECK-P8-NEXT: lvx v2, 0, r5
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: lvx v4, 0, r4
> +; CHECK-P8-NEXT: addis r6, r2, .LCPI5_2 at toc@ha
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_1 at toc@ha
> +; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0 at toc@l
> ; CHECK-P8-NEXT: addi r4, r4, .LCPI5_1 at toc@l
> +; CHECK-P8-NEXT: lvx v2, 0, r5
> +; CHECK-P8-NEXT: addi r5, r6, .LCPI5_2 at toc@l
> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
> ; CHECK-P8-NEXT: li r4, 16
> +; CHECK-P8-NEXT: lvx v4, 0, r5
> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2
> ; CHECK-P8-NEXT: vperm v3, v3, v3, v4
> ; CHECK-P8-NEXT: xxswapd v4, vs0
> @@ -443,14 +438,13 @@ define void @test4elt_signed(<4 x double>* noalias
> nocapture sret %agg.result, i
> ;
> ; CHECK-P9-LABEL: test4elt_signed:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: mtfprd f0, r4
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI5_0 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI5_0 at toc@l
> ; CHECK-P9-NEXT: lxvx v3, 0, r4
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vperm v3, v2, v2, v3
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI5_1 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI5_1 at toc@l
> +; CHECK-P9-NEXT: vperm v3, v2, v2, v3
> ; CHECK-P9-NEXT: vextsh2d v3, v3
> ; CHECK-P9-NEXT: xvcvsxddp vs0, v3
> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
> index 6f046f69ecca..f152c2b008ff 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
> @@ -18,9 +18,9 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr #0
> {
> ; CHECK-P8-NEXT: xscvuxdsp f0, f0
> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -30,12 +30,12 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr
> #0 {
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> ; CHECK-P9-NEXT: xscvuxdsp f0, f0
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
> ; CHECK-P9-NEXT: xxlor vs0, v2, v2
> ; CHECK-P9-NEXT: xscvuxdsp f0, f0
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -311,9 +311,9 @@ define i64 @test2elt_signed(<2 x i64> %a)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvsxdsp f0, f0
> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -323,12 +323,12 @@ define i64 @test2elt_signed(<2 x i64> %a)
> local_unnamed_addr #0 {
> ; CHECK-P9-NEXT: xxswapd vs0, v2
> ; CHECK-P9-NEXT: xscvsxdsp f0, f0
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
> ; CHECK-P9-NEXT: xxlor vs0, v2, v2
> ; CHECK-P9-NEXT: xscvsxdsp f0, f0
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
> index ce97ed67baa1..f2cb9f5f45fb 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
> @@ -24,9 +24,9 @@ define i64 @test2elt(i16 %a.coerce) local_unnamed_addr
> #0 {
> ; CHECK-P8-NEXT: xscvuxdsp f1, f1
> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -43,12 +43,12 @@ define i64 @test2elt(i16 %a.coerce) local_unnamed_addr
> #0 {
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> ; CHECK-P9-NEXT: vextubrx r3, r3, v2
> ; CHECK-P9-NEXT: clrlwi r3, r3, 24
> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
> ; CHECK-P9-NEXT: mtfprwz f0, r3
> ; CHECK-P9-NEXT: xscvuxdsp f0, f0
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -81,11 +81,10 @@ define <4 x float> @test4elt(i32 %a.coerce)
> local_unnamed_addr #1 {
> ; CHECK-P8-LABEL: test4elt:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r4, r2, .LCPI1_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: addi r3, r4, .LCPI1_0 at toc@l
> +; CHECK-P8-NEXT: mtvsrwz v2, r3
> +; CHECK-P8-NEXT: addi r4, r4, .LCPI1_0 at toc@l
> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: lvx v3, 0, r3
> +; CHECK-P8-NEXT: lvx v3, 0, r4
> ; CHECK-P8-NEXT: vperm v2, v4, v2, v3
> ; CHECK-P8-NEXT: xvcvuxwsp v2, v2
> ; CHECK-P8-NEXT: blr
> @@ -121,30 +120,28 @@ define void @test8elt(<8 x float>* noalias nocapture
> sret %agg.result, i64 %a.co
> ; CHECK-P8-LABEL: test8elt:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r5, r2, .LCPI2_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_1 at toc@ha
> +; CHECK-P8-NEXT: addis r6, r2, .LCPI2_1 at toc@ha
> +; CHECK-P8-NEXT: mtvsrd v2, r4
> ; CHECK-P8-NEXT: addi r5, r5, .LCPI2_0 at toc@l
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_1 at toc@l
> +; CHECK-P8-NEXT: addi r4, r6, .LCPI2_1 at toc@l
> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: lvx v2, 0, r5
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> +; CHECK-P8-NEXT: lvx v3, 0, r5
> ; CHECK-P8-NEXT: lvx v5, 0, r4
> ; CHECK-P8-NEXT: li r4, 16
> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2
> -; CHECK-P8-NEXT: vperm v3, v4, v3, v5
> -; CHECK-P8-NEXT: xvcvuxwsp v2, v2
> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3
> +; CHECK-P8-NEXT: vperm v2, v4, v2, v5
> ; CHECK-P8-NEXT: xvcvuxwsp v3, v3
> -; CHECK-P8-NEXT: stvx v2, 0, r3
> -; CHECK-P8-NEXT: stvx v3, r3, r4
> +; CHECK-P8-NEXT: xvcvuxwsp v2, v2
> +; CHECK-P8-NEXT: stvx v3, 0, r3
> +; CHECK-P8-NEXT: stvx v2, r3, r4
> ; CHECK-P8-NEXT: blr
> ;
> ; CHECK-P9-LABEL: test8elt:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: mtfprd f0, r4
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0 at toc@l
> ; CHECK-P9-NEXT: lxvx v3, 0, r4
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> ; CHECK-P9-NEXT: xxlxor v4, v4, v4
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1 at toc@l
> @@ -292,9 +289,9 @@ define i64 @test2elt_signed(i16 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P8-NEXT: xscvsxdsp f1, f1
> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
> ; CHECK-P8-NEXT: xxswapd vs0, v2
> ; CHECK-P8-NEXT: mffprd r3, f0
> ; CHECK-P8-NEXT: blr
> @@ -311,12 +308,12 @@ define i64 @test2elt_signed(i16 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> ; CHECK-P9-NEXT: vextubrx r3, r3, v2
> ; CHECK-P9-NEXT: extsb r3, r3
> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
> ; CHECK-P9-NEXT: mtfprwa f0, r3
> ; CHECK-P9-NEXT: xscvsxdsp f0, f0
> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
> ; CHECK-P9-NEXT: mfvsrld r3, v2
> ; CHECK-P9-NEXT: blr
> ;
> @@ -349,11 +346,10 @@ define <4 x float> @test4elt_signed(i32 %a.coerce)
> local_unnamed_addr #1 {
> ; CHECK-P8-LABEL: test4elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: addi r3, r4, .LCPI5_0 at toc@l
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: lvx v3, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v2, v2, v3
> +; CHECK-P8-NEXT: mtvsrwz v3, r3
> +; CHECK-P8-NEXT: addi r4, r4, .LCPI5_0 at toc@l
> +; CHECK-P8-NEXT: lvx v2, 0, r4
> +; CHECK-P8-NEXT: vperm v2, v3, v3, v2
> ; CHECK-P8-NEXT: vspltisw v3, 12
> ; CHECK-P8-NEXT: vadduwm v3, v3, v3
> ; CHECK-P8-NEXT: vslw v2, v2, v3
> @@ -392,15 +388,14 @@ define void @test8elt_signed(<8 x float>* noalias
> nocapture sret %agg.result, i6
> ; CHECK-P8-LABEL: test8elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r5, r2, .LCPI6_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_1 at toc@ha
> +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_1 at toc@ha
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> ; CHECK-P8-NEXT: vspltisw v5, 12
> +; CHECK-P8-NEXT: li r4, 16
> ; CHECK-P8-NEXT: addi r5, r5, .LCPI6_0 at toc@l
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_1 at toc@l
> ; CHECK-P8-NEXT: lvx v2, 0, r5
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: lvx v4, 0, r4
> -; CHECK-P8-NEXT: li r4, 16
> +; CHECK-P8-NEXT: addi r5, r6, .LCPI6_1 at toc@l
> +; CHECK-P8-NEXT: lvx v4, 0, r5
> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2
> ; CHECK-P8-NEXT: vperm v3, v3, v3, v4
> ; CHECK-P8-NEXT: vadduwm v4, v5, v5
> @@ -416,14 +411,13 @@ define void @test8elt_signed(<8 x float>* noalias
> nocapture sret %agg.result, i6
> ;
> ; CHECK-P9-LABEL: test8elt_signed:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: mtfprd f0, r4
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_0 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_0 at toc@l
> ; CHECK-P9-NEXT: lxvx v3, 0, r4
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vperm v3, v2, v2, v3
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_1 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_1 at toc@l
> +; CHECK-P9-NEXT: vperm v3, v2, v2, v3
> ; CHECK-P9-NEXT: vextsb2w v3, v3
> ; CHECK-P9-NEXT: xvcvsxwsp vs0, v3
> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>
> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
> b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
> index b4582e844f30..268fc9b7d4cc 100644
> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
> @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i16 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: test2elt:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r4, r2, .LCPI0_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: addi r3, r4, .LCPI0_0 at toc@l
> +; CHECK-P8-NEXT: mtvsrwz v2, r3
> +; CHECK-P8-NEXT: addi r4, r4, .LCPI0_0 at toc@l
> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: lvx v3, 0, r3
> +; CHECK-P8-NEXT: lvx v3, 0, r4
> ; CHECK-P8-NEXT: vperm v2, v4, v2, v3
> ; CHECK-P8-NEXT: xvcvuxddp v2, v2
> ; CHECK-P8-NEXT: blr
> @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture
> sret %agg.result, i32 %a.c
> ; CHECK-P8-LABEL: test4elt:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r5, r2, .LCPI1_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_1 at toc@ha
> +; CHECK-P8-NEXT: addis r6, r2, .LCPI1_1 at toc@ha
> +; CHECK-P8-NEXT: mtvsrwz v2, r4
> ; CHECK-P8-NEXT: addi r5, r5, .LCPI1_0 at toc@l
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI1_1 at toc@l
> +; CHECK-P8-NEXT: addi r4, r6, .LCPI1_1 at toc@l
> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: lvx v2, 0, r5
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> +; CHECK-P8-NEXT: lvx v3, 0, r5
> ; CHECK-P8-NEXT: lvx v5, 0, r4
> ; CHECK-P8-NEXT: li r4, 16
> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2
> -; CHECK-P8-NEXT: vperm v3, v4, v3, v5
> -; CHECK-P8-NEXT: xvcvuxddp vs0, v2
> -; CHECK-P8-NEXT: xvcvuxddp vs1, v3
> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3
> +; CHECK-P8-NEXT: vperm v2, v4, v2, v5
> +; CHECK-P8-NEXT: xvcvuxddp vs0, v3
> +; CHECK-P8-NEXT: xvcvuxddp vs1, v2
> ; CHECK-P8-NEXT: xxswapd vs0, vs0
> ; CHECK-P8-NEXT: xxswapd vs1, vs1
> ; CHECK-P8-NEXT: stxvd2x vs1, r3, r4
> @@ -118,33 +116,32 @@ define void @test8elt(<8 x double>* noalias
> nocapture sret %agg.result, i64 %a.c
> ; CHECK-P8-LABEL: test8elt:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r5, r2, .LCPI2_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_2 at toc@ha
> +; CHECK-P8-NEXT: addis r6, r2, .LCPI2_2 at toc@ha
> +; CHECK-P8-NEXT: mtvsrd v2, r4
> +; CHECK-P8-NEXT: addis r4, r2, .LCPI2_3 at toc@ha
> ; CHECK-P8-NEXT: addi r5, r5, .LCPI2_0 at toc@l
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_2 at toc@l
> +; CHECK-P8-NEXT: addi r4, r4, .LCPI2_3 at toc@l
> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
> -; CHECK-P8-NEXT: lvx v2, 0, r5
> -; CHECK-P8-NEXT: addis r5, r2, .LCPI2_3 at toc@ha
> -; CHECK-P8-NEXT: lvx v5, 0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_1 at toc@ha
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: addi r5, r5, .LCPI2_3 at toc@l
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_1 at toc@l
> -; CHECK-P8-NEXT: lvx v0, 0, r5
> -; CHECK-P8-NEXT: lvx v1, 0, r4
> +; CHECK-P8-NEXT: lvx v3, 0, r5
> +; CHECK-P8-NEXT: addi r5, r6, .LCPI2_2 at toc@l
> +; CHECK-P8-NEXT: lvx v0, 0, r4
> ; CHECK-P8-NEXT: li r4, 48
> +; CHECK-P8-NEXT: lvx v5, 0, r5
> +; CHECK-P8-NEXT: addis r5, r2, .LCPI2_1 at toc@ha
> +; CHECK-P8-NEXT: addi r5, r5, .LCPI2_1 at toc@l
> +; CHECK-P8-NEXT: lvx v1, 0, r5
> +; CHECK-P8-NEXT: vperm v0, v4, v2, v0
> ; CHECK-P8-NEXT: li r5, 32
> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2
> -; CHECK-P8-NEXT: vperm v5, v4, v3, v5
> -; CHECK-P8-NEXT: vperm v0, v4, v3, v0
> -; CHECK-P8-NEXT: vperm v3, v4, v3, v1
> -; CHECK-P8-NEXT: xvcvuxddp vs0, v2
> -; CHECK-P8-NEXT: xvcvuxddp vs1, v5
> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3
> +; CHECK-P8-NEXT: vperm v5, v4, v2, v5
> +; CHECK-P8-NEXT: vperm v2, v4, v2, v1
> ; CHECK-P8-NEXT: xvcvuxddp vs2, v0
> -; CHECK-P8-NEXT: xvcvuxddp vs3, v3
> +; CHECK-P8-NEXT: xvcvuxddp vs0, v3
> +; CHECK-P8-NEXT: xvcvuxddp vs1, v5
> +; CHECK-P8-NEXT: xvcvuxddp vs3, v2
> +; CHECK-P8-NEXT: xxswapd vs2, vs2
> ; CHECK-P8-NEXT: xxswapd vs0, vs0
> ; CHECK-P8-NEXT: xxswapd vs1, vs1
> -; CHECK-P8-NEXT: xxswapd vs2, vs2
> ; CHECK-P8-NEXT: xxswapd vs3, vs3
> ; CHECK-P8-NEXT: stxvd2x vs2, r3, r4
> ; CHECK-P8-NEXT: li r4, 16
> @@ -155,11 +152,10 @@ define void @test8elt(<8 x double>* noalias
> nocapture sret %agg.result, i64 %a.c
> ;
> ; CHECK-P9-LABEL: test8elt:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: mtfprd f0, r4
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0 at toc@l
> ; CHECK-P9-NEXT: lxvx v3, 0, r4
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> ; CHECK-P9-NEXT: xxlxor v4, v4, v4
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1 at toc@l
> @@ -404,14 +400,13 @@ define <2 x double> @test2elt_signed(i16 %a.coerce)
> local_unnamed_addr #0 {
> ; CHECK-P8-LABEL: test2elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r4, r2, .LCPI4_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r3
> -; CHECK-P8-NEXT: addi r3, r4, .LCPI4_0 at toc@l
> -; CHECK-P8-NEXT: xxswapd v2, vs0
> -; CHECK-P8-NEXT: lvx v3, 0, r3
> +; CHECK-P8-NEXT: mtvsrwz v3, r3
> ; CHECK-P8-NEXT: addis r3, r2, .LCPI4_1 at toc@ha
> +; CHECK-P8-NEXT: addi r4, r4, .LCPI4_0 at toc@l
> ; CHECK-P8-NEXT: addi r3, r3, .LCPI4_1 at toc@l
> +; CHECK-P8-NEXT: lvx v2, 0, r4
> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r3
> -; CHECK-P8-NEXT: vperm v2, v2, v2, v3
> +; CHECK-P8-NEXT: vperm v2, v3, v3, v2
> ; CHECK-P8-NEXT: xxswapd v3, vs0
> ; CHECK-P8-NEXT: vsld v2, v2, v3
> ; CHECK-P8-NEXT: vsrad v2, v2, v3
> @@ -449,17 +444,16 @@ define void @test4elt_signed(<4 x double>* noalias
> nocapture sret %agg.result, i
> ; CHECK-P8-LABEL: test4elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> ; CHECK-P8-NEXT: addis r5, r2, .LCPI5_0 at toc@ha
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI5_2 at toc@ha
> -; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0 at toc@l
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI5_2 at toc@l
> -; CHECK-P8-NEXT: lvx v2, 0, r5
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: lvx v4, 0, r4
> +; CHECK-P8-NEXT: addis r6, r2, .LCPI5_2 at toc@ha
> +; CHECK-P8-NEXT: mtvsrwz v3, r4
> ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_1 at toc@ha
> +; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0 at toc@l
> ; CHECK-P8-NEXT: addi r4, r4, .LCPI5_1 at toc@l
> +; CHECK-P8-NEXT: lvx v2, 0, r5
> +; CHECK-P8-NEXT: addi r5, r6, .LCPI5_2 at toc@l
> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
> ; CHECK-P8-NEXT: li r4, 16
> +; CHECK-P8-NEXT: lvx v4, 0, r5
> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2
> ; CHECK-P8-NEXT: vperm v3, v3, v3, v4
> ; CHECK-P8-NEXT: xxswapd v4, vs0
> @@ -523,26 +517,25 @@ entry:
> define void @test8elt_signed(<8 x double>* noalias nocapture sret
> %agg.result, i64 %a.coerce) local_unnamed_addr #1 {
> ; CHECK-P8-LABEL: test8elt_signed:
> ; CHECK-P8: # %bb.0: # %entry
> -; CHECK-P8-NEXT: mtfprd f0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_2 at toc@ha
> ; CHECK-P8-NEXT: addis r5, r2, .LCPI6_0 at toc@ha
> -; CHECK-P8-NEXT: addis r6, r2, .LCPI6_3 at toc@ha
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_2 at toc@l
> +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_2 at toc@ha
> +; CHECK-P8-NEXT: mtvsrd v3, r4
> +; CHECK-P8-NEXT: addis r4, r2, .LCPI6_1 at toc@ha
> ; CHECK-P8-NEXT: addi r5, r5, .LCPI6_0 at toc@l
> -; CHECK-P8-NEXT: addi r6, r6, .LCPI6_3 at toc@l
> -; CHECK-P8-NEXT: lvx v4, 0, r4
> -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_4 at toc@ha
> +; CHECK-P8-NEXT: addi r6, r6, .LCPI6_2 at toc@l
> +; CHECK-P8-NEXT: addi r4, r4, .LCPI6_1 at toc@l
> ; CHECK-P8-NEXT: lvx v2, 0, r5
> -; CHECK-P8-NEXT: xxswapd v3, vs0
> -; CHECK-P8-NEXT: lvx v5, 0, r6
> -; CHECK-P8-NEXT: addis r5, r2, .LCPI6_1 at toc@ha
> -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_4 at toc@l
> -; CHECK-P8-NEXT: addi r5, r5, .LCPI6_1 at toc@l
> -; CHECK-P8-NEXT: lvx v0, 0, r4
> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r5
> +; CHECK-P8-NEXT: addis r5, r2, .LCPI6_3 at toc@ha
> +; CHECK-P8-NEXT: lvx v4, 0, r6
> +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_4 at toc@ha
> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
> ; CHECK-P8-NEXT: li r4, 48
> -; CHECK-P8-NEXT: li r5, 32
> +; CHECK-P8-NEXT: addi r5, r5, .LCPI6_3 at toc@l
> +; CHECK-P8-NEXT: lvx v5, 0, r5
> +; CHECK-P8-NEXT: addi r5, r6, .LCPI6_4 at toc@l
> +; CHECK-P8-NEXT: lvx v0, 0, r5
> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2
> +; CHECK-P8-NEXT: li r5, 32
> ; CHECK-P8-NEXT: vperm v4, v3, v3, v4
> ; CHECK-P8-NEXT: vperm v5, v3, v3, v5
> ; CHECK-P8-NEXT: vperm v3, v3, v3, v0
> @@ -572,14 +565,13 @@ define void @test8elt_signed(<8 x double>* noalias
> nocapture sret %agg.result, i
> ;
> ; CHECK-P9-LABEL: test8elt_signed:
> ; CHECK-P9: # %bb.0: # %entry
> -; CHECK-P9-NEXT: mtfprd f0, r4
> +; CHECK-P9-NEXT: mtvsrd v2, r4
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_0 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_0 at toc@l
> ; CHECK-P9-NEXT: lxvx v3, 0, r4
> -; CHECK-P9-NEXT: xxswapd v2, vs0
> -; CHECK-P9-NEXT: vperm v3, v2, v2, v3
> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_1 at toc@ha
> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_1 at toc@l
> +; CHECK-P9-NEXT: vperm v3, v2, v2, v3
> ; CHECK-P9-NEXT: vextsb2d v3, v3
> ; CHECK-P9-NEXT: xvcvsxddp vs0, v3
> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>
> diff --git
> a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
> b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
> index 7e51f2b862ab..29955dc17f67 100644
> --- a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
> +++ b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
> @@ -82,10 +82,10 @@ define <3 x float> @constrained_vector_fdiv_v3f32() #0
> {
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: xscvdpspn 2, 2
> ; PC64LE-NEXT: xscvdpspn 0, 0
> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1
> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1
> -; PC64LE-NEXT: vmrglw 2, 3, 2
> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3
> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3
> +; PC64LE-NEXT: vmrghw 2, 3, 2
> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 3, 2, 4
> ; PC64LE-NEXT: blr
> ;
> @@ -106,12 +106,12 @@ define <3 x float> @constrained_vector_fdiv_v3f32()
> #0 {
> ; PC64LE9-NEXT: xsdivsp 2, 2, 0
> ; PC64LE9-NEXT: xsdivsp 0, 3, 0
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
> ; PC64LE9-NEXT: blr
> entry:
> @@ -359,11 +359,11 @@ define <3 x float> @constrained_vector_frem_v3f32()
> #0 {
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI7_4 at toc@l
> ; PC64LE-NEXT: lvx 4, 0, 3
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 30
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 3, 2, 4
> ; PC64LE-NEXT: addi 1, 1, 64
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -401,15 +401,15 @@ define <3 x float> @constrained_vector_frem_v3f32()
> #0 {
> ; PC64LE9-NEXT: bl fmodf
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 29
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> ; PC64LE9-NEXT: addis 3, 2, .LCPI7_4 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI7_4 at toc@l
> ; PC64LE9-NEXT: lxvx 36, 0, 3
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
> ; PC64LE9-NEXT: addi 1, 1, 64
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -710,10 +710,10 @@ define <3 x float> @constrained_vector_fmul_v3f32()
> #0 {
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: xscvdpspn 2, 2
> ; PC64LE-NEXT: xscvdpspn 0, 0
> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1
> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1
> -; PC64LE-NEXT: vmrglw 2, 3, 2
> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3
> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3
> +; PC64LE-NEXT: vmrghw 2, 3, 2
> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 3, 2, 4
> ; PC64LE-NEXT: blr
> ;
> @@ -735,11 +735,11 @@ define <3 x float> @constrained_vector_fmul_v3f32()
> #0 {
> ; PC64LE9-NEXT: xsmulsp 1, 1, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> ; PC64LE9-NEXT: xscvdpspn 1, 1
> -; PC64LE9-NEXT: xxsldwi 34, 1, 1, 1
> +; PC64LE9-NEXT: xxsldwi 34, 1, 1, 3
> ; PC64LE9-NEXT: xscvdpspn 1, 2
> -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
> ; PC64LE9-NEXT: blr
> entry:
> @@ -925,10 +925,10 @@ define <3 x float> @constrained_vector_fadd_v3f32()
> #0 {
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: xscvdpspn 2, 2
> ; PC64LE-NEXT: xscvdpspn 0, 0
> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1
> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1
> -; PC64LE-NEXT: vmrglw 2, 3, 2
> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3
> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3
> +; PC64LE-NEXT: vmrghw 2, 3, 2
> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 3, 2, 4
> ; PC64LE-NEXT: blr
> ;
> @@ -945,15 +945,15 @@ define <3 x float> @constrained_vector_fadd_v3f32()
> #0 {
> ; PC64LE9-NEXT: xsaddsp 1, 0, 1
> ; PC64LE9-NEXT: xsaddsp 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> ; PC64LE9-NEXT: addis 3, 2, .LCPI17_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI17_3 at toc@l
> ; PC64LE9-NEXT: lxvx 36, 0, 3
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
> ; PC64LE9-NEXT: blr
> entry:
> @@ -1137,10 +1137,10 @@ define <3 x float>
> @constrained_vector_fsub_v3f32() #0 {
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: xscvdpspn 2, 2
> ; PC64LE-NEXT: xscvdpspn 0, 0
> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1
> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1
> -; PC64LE-NEXT: vmrglw 2, 3, 2
> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3
> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3
> +; PC64LE-NEXT: vmrghw 2, 3, 2
> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 3, 2, 4
> ; PC64LE-NEXT: blr
> ;
> @@ -1157,15 +1157,15 @@ define <3 x float>
> @constrained_vector_fsub_v3f32() #0 {
> ; PC64LE9-NEXT: xssubsp 1, 0, 1
> ; PC64LE9-NEXT: xssubsp 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> ; PC64LE9-NEXT: addis 3, 2, .LCPI22_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI22_3 at toc@l
> ; PC64LE9-NEXT: lxvx 36, 0, 3
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
> ; PC64LE9-NEXT: blr
> entry:
> @@ -1333,12 +1333,12 @@ define <3 x float>
> @constrained_vector_sqrt_v3f32() #0 {
> ; PC64LE-NEXT: xssqrtsp 2, 2
> ; PC64LE-NEXT: xscvdpspn 0, 0
> ; PC64LE-NEXT: xscvdpspn 1, 1
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 2
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 3, 2
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 3, 2
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: blr
> ;
> @@ -1358,10 +1358,10 @@ define <3 x float>
> @constrained_vector_sqrt_v3f32() #0 {
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> ; PC64LE9-NEXT: xscvdpspn 1, 1
> ; PC64LE9-NEXT: xscvdpspn 2, 2
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE9-NEXT: xxsldwi 34, 2, 2, 1
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE9-NEXT: xxsldwi 34, 2, 2, 3
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: blr
> @@ -1588,11 +1588,11 @@ define <3 x float> @constrained_vector_pow_v3f32()
> #0 {
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI32_4 at toc@l
> ; PC64LE-NEXT: lvx 4, 0, 3
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 30
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 3, 2, 4
> ; PC64LE-NEXT: addi 1, 1, 64
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -1630,15 +1630,15 @@ define <3 x float> @constrained_vector_pow_v3f32()
> #0 {
> ; PC64LE9-NEXT: bl powf
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 29
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> ; PC64LE9-NEXT: addis 3, 2, .LCPI32_4 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI32_4 at toc@l
> ; PC64LE9-NEXT: lxvx 36, 0, 3
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
> ; PC64LE9-NEXT: addi 1, 1, 64
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -1992,11 +1992,11 @@ define <3 x float>
> @constrained_vector_powi_v3f32() #0 {
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI37_3 at toc@l
> ; PC64LE-NEXT: lvx 4, 0, 3
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 31
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 3, 2, 4
> ; PC64LE-NEXT: addi 1, 1, 48
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -2030,15 +2030,15 @@ define <3 x float>
> @constrained_vector_powi_v3f32() #0 {
> ; PC64LE9-NEXT: bl __powisf2
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 31
> ; PC64LE9-NEXT: addis 3, 2, .LCPI37_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI37_3 at toc@l
> ; PC64LE9-NEXT: lxvx 36, 0, 3
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
> ; PC64LE9-NEXT: addi 1, 1, 48
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -2360,12 +2360,12 @@ define <3 x float> @constrained_vector_sin_v3f32()
> #0 {
> ; PC64LE-NEXT: addis 3, 2, .LCPI42_3 at toc@ha
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI42_3 at toc@l
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 31
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: addi 1, 1, 48
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -2396,15 +2396,15 @@ define <3 x float> @constrained_vector_sin_v3f32()
> #0 {
> ; PC64LE9-NEXT: bl sinf
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 31
> ; PC64LE9-NEXT: addis 3, 2, .LCPI42_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI42_3 at toc@l
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: addi 1, 1, 48
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -2709,12 +2709,12 @@ define <3 x float> @constrained_vector_cos_v3f32()
> #0 {
> ; PC64LE-NEXT: addis 3, 2, .LCPI47_3 at toc@ha
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI47_3 at toc@l
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 31
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: addi 1, 1, 48
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -2745,15 +2745,15 @@ define <3 x float> @constrained_vector_cos_v3f32()
> #0 {
> ; PC64LE9-NEXT: bl cosf
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 31
> ; PC64LE9-NEXT: addis 3, 2, .LCPI47_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI47_3 at toc@l
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: addi 1, 1, 48
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -3058,12 +3058,12 @@ define <3 x float> @constrained_vector_exp_v3f32()
> #0 {
> ; PC64LE-NEXT: addis 3, 2, .LCPI52_3 at toc@ha
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI52_3 at toc@l
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 31
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: addi 1, 1, 48
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -3094,15 +3094,15 @@ define <3 x float> @constrained_vector_exp_v3f32()
> #0 {
> ; PC64LE9-NEXT: bl expf
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 31
> ; PC64LE9-NEXT: addis 3, 2, .LCPI52_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI52_3 at toc@l
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: addi 1, 1, 48
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -3407,12 +3407,12 @@ define <3 x float>
> @constrained_vector_exp2_v3f32() #0 {
> ; PC64LE-NEXT: addis 3, 2, .LCPI57_3 at toc@ha
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI57_3 at toc@l
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 31
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: addi 1, 1, 48
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -3443,15 +3443,15 @@ define <3 x float>
> @constrained_vector_exp2_v3f32() #0 {
> ; PC64LE9-NEXT: bl exp2f
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 31
> ; PC64LE9-NEXT: addis 3, 2, .LCPI57_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI57_3 at toc@l
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: addi 1, 1, 48
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -3756,12 +3756,12 @@ define <3 x float> @constrained_vector_log_v3f32()
> #0 {
> ; PC64LE-NEXT: addis 3, 2, .LCPI62_3 at toc@ha
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI62_3 at toc@l
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 31
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: addi 1, 1, 48
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -3792,15 +3792,15 @@ define <3 x float> @constrained_vector_log_v3f32()
> #0 {
> ; PC64LE9-NEXT: bl logf
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 31
> ; PC64LE9-NEXT: addis 3, 2, .LCPI62_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI62_3 at toc@l
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: addi 1, 1, 48
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -4105,12 +4105,12 @@ define <3 x float>
> @constrained_vector_log10_v3f32() #0 {
> ; PC64LE-NEXT: addis 3, 2, .LCPI67_3 at toc@ha
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI67_3 at toc@l
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 31
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: addi 1, 1, 48
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -4141,15 +4141,15 @@ define <3 x float>
> @constrained_vector_log10_v3f32() #0 {
> ; PC64LE9-NEXT: bl log10f
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 31
> ; PC64LE9-NEXT: addis 3, 2, .LCPI67_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI67_3 at toc@l
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: addi 1, 1, 48
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -4454,12 +4454,12 @@ define <3 x float>
> @constrained_vector_log2_v3f32() #0 {
> ; PC64LE-NEXT: addis 3, 2, .LCPI72_3 at toc@ha
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI72_3 at toc@l
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 31
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: addi 1, 1, 48
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -4490,15 +4490,15 @@ define <3 x float>
> @constrained_vector_log2_v3f32() #0 {
> ; PC64LE9-NEXT: bl log2f
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 31
> ; PC64LE9-NEXT: addis 3, 2, .LCPI72_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI72_3 at toc@l
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: addi 1, 1, 48
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -4748,12 +4748,12 @@ define <3 x float>
> @constrained_vector_rint_v3f32() #0 {
> ; PC64LE-NEXT: xsrdpic 2, 2
> ; PC64LE-NEXT: xscvdpspn 0, 0
> ; PC64LE-NEXT: xscvdpspn 1, 1
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 2
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 3, 2
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 3, 2
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: blr
> ;
> @@ -4773,10 +4773,10 @@ define <3 x float>
> @constrained_vector_rint_v3f32() #0 {
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> ; PC64LE9-NEXT: xscvdpspn 1, 1
> ; PC64LE9-NEXT: xscvdpspn 2, 2
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE9-NEXT: xxsldwi 34, 2, 2, 1
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE9-NEXT: xxsldwi 34, 2, 2, 3
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: blr
> @@ -4947,12 +4947,12 @@ define <3 x float>
> @constrained_vector_nearbyint_v3f32() #0 {
> ; PC64LE-NEXT: addis 3, 2, .LCPI82_3 at toc@ha
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI82_3 at toc@l
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 31
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: addi 1, 1, 48
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -4983,15 +4983,15 @@ define <3 x float>
> @constrained_vector_nearbyint_v3f32() #0 {
> ; PC64LE9-NEXT: bl nearbyintf
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 31
> ; PC64LE9-NEXT: addis 3, 2, .LCPI82_3 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI82_3 at toc@l
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: addi 1, 1, 48
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -5184,11 +5184,11 @@ define <3 x float>
> @constrained_vector_maxnum_v3f32() #0 {
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI87_5 at toc@l
> ; PC64LE-NEXT: lvx 4, 0, 3
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 30
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 3, 2, 4
> ; PC64LE-NEXT: addi 1, 1, 64
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -5227,15 +5227,15 @@ define <3 x float>
> @constrained_vector_maxnum_v3f32() #0 {
> ; PC64LE9-NEXT: bl fmaxf
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 29
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> ; PC64LE9-NEXT: addis 3, 2, .LCPI87_5 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI87_5 at toc@l
> ; PC64LE9-NEXT: lxvx 36, 0, 3
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
> ; PC64LE9-NEXT: addi 1, 1, 64
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -5471,11 +5471,11 @@ define <3 x float>
> @constrained_vector_minnum_v3f32() #0 {
> ; PC64LE-NEXT: xscvdpspn 1, 1
> ; PC64LE-NEXT: addi 3, 3, .LCPI92_5 at toc@l
> ; PC64LE-NEXT: lvx 4, 0, 3
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 30
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 2, 3
> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 2, 3
> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 3, 2, 4
> ; PC64LE-NEXT: addi 1, 1, 64
> ; PC64LE-NEXT: ld 0, 16(1)
> @@ -5514,15 +5514,15 @@ define <3 x float>
> @constrained_vector_minnum_v3f32() #0 {
> ; PC64LE9-NEXT: bl fminf
> ; PC64LE9-NEXT: nop
> ; PC64LE9-NEXT: xscvdpspn 0, 1
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 29
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: xscvdpspn 0, 30
> ; PC64LE9-NEXT: addis 3, 2, .LCPI92_5 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI92_5 at toc@l
> ; PC64LE9-NEXT: lxvx 36, 0, 3
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
> ; PC64LE9-NEXT: addi 1, 1, 64
> ; PC64LE9-NEXT: ld 0, 16(1)
> @@ -5686,9 +5686,9 @@ define <2 x float>
> @constrained_vector_fptrunc_v2f64() #0 {
> ; PC64LE-NEXT: xsrsp 1, 1
> ; PC64LE-NEXT: xscvdpspn 0, 0
> ; PC64LE-NEXT: xscvdpspn 1, 1
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 3, 2
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 3, 2
> ; PC64LE-NEXT: blr
> ;
> ; PC64LE9-LABEL: constrained_vector_fptrunc_v2f64:
> @@ -5698,12 +5698,12 @@ define <2 x float>
> @constrained_vector_fptrunc_v2f64() #0 {
> ; PC64LE9-NEXT: addis 3, 2, .LCPI96_1 at toc@ha
> ; PC64LE9-NEXT: xsrsp 0, 0
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: lfd 0, .LCPI96_1 at toc@l(3)
> ; PC64LE9-NEXT: xsrsp 0, 0
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: blr
> entry:
> %result = call <2 x float>
> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(
> @@ -5729,12 +5729,12 @@ define <3 x float>
> @constrained_vector_fptrunc_v3f64() #0 {
> ; PC64LE-NEXT: xsrsp 2, 2
> ; PC64LE-NEXT: xscvdpspn 0, 0
> ; PC64LE-NEXT: xscvdpspn 1, 1
> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE-NEXT: xscvdpspn 0, 2
> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
> -; PC64LE-NEXT: vmrglw 2, 3, 2
> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
> +; PC64LE-NEXT: vmrghw 2, 3, 2
> ; PC64LE-NEXT: lvx 3, 0, 3
> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE-NEXT: vperm 2, 4, 2, 3
> ; PC64LE-NEXT: blr
> ;
> @@ -5745,20 +5745,20 @@ define <3 x float>
> @constrained_vector_fptrunc_v3f64() #0 {
> ; PC64LE9-NEXT: addis 3, 2, .LCPI97_1 at toc@ha
> ; PC64LE9-NEXT: xsrsp 0, 0
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
> ; PC64LE9-NEXT: lfd 0, .LCPI97_1 at toc@l(3)
> ; PC64LE9-NEXT: addis 3, 2, .LCPI97_2 at toc@ha
> ; PC64LE9-NEXT: addi 3, 3, .LCPI97_2 at toc@l
> ; PC64LE9-NEXT: xsrsp 0, 0
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
> -; PC64LE9-NEXT: vmrglw 2, 3, 2
> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
> +; PC64LE9-NEXT: vmrghw 2, 3, 2
> ; PC64LE9-NEXT: lxvx 35, 0, 3
> ; PC64LE9-NEXT: addis 3, 2, .LCPI97_3 at toc@ha
> ; PC64LE9-NEXT: lfd 0, .LCPI97_3 at toc@l(3)
> ; PC64LE9-NEXT: xsrsp 0, 0
> ; PC64LE9-NEXT: xscvdpspn 0, 0
> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
> ; PC64LE9-NEXT: blr
> entry:
>
> diff --git a/llvm/test/CodeGen/PowerPC/vsx.ll
> b/llvm/test/CodeGen/PowerPC/vsx.ll
> index 8b4e3640ef6b..4a78218262ca 100644
> --- a/llvm/test/CodeGen/PowerPC/vsx.ll
> +++ b/llvm/test/CodeGen/PowerPC/vsx.ll
> @@ -1404,9 +1404,9 @@ define <2 x float> @test44(<2 x i64> %a) {
> ; CHECK-LE-NEXT: xscvuxdsp f0, f0
> ; CHECK-LE-NEXT: xscvdpspn vs1, f1
> ; CHECK-LE-NEXT: xscvdpspn vs0, f0
> -; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 1
> -; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-LE-NEXT: vmrglw v2, v3, v2
> +; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 3
> +; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-LE-NEXT: vmrghw v2, v3, v2
> ; CHECK-LE-NEXT: blr
> %v = uitofp <2 x i64> %a to <2 x float>
> ret <2 x float> %v
> @@ -1486,9 +1486,9 @@ define <2 x float> @test45(<2 x i64> %a) {
> ; CHECK-LE-NEXT: xscvsxdsp f0, f0
> ; CHECK-LE-NEXT: xscvdpspn vs1, f1
> ; CHECK-LE-NEXT: xscvdpspn vs0, f0
> -; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 1
> -; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 1
> -; CHECK-LE-NEXT: vmrglw v2, v3, v2
> +; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 3
> +; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 3
> +; CHECK-LE-NEXT: vmrghw v2, v3, v2
> ; CHECK-LE-NEXT: blr
> %v = sitofp <2 x i64> %a to <2 x float>
> ret <2 x float> %v
> @@ -2437,12 +2437,11 @@ define <2 x i32> @test80(i32 %v) {
> ;
> ; CHECK-LE-LABEL: test80:
> ; CHECK-LE: # %bb.0:
> -; CHECK-LE-NEXT: mtfprd f0, r3
> +; CHECK-LE-NEXT: mtfprwz f0, r3
> ; CHECK-LE-NEXT: addis r4, r2, .LCPI65_0 at toc@ha
> ; CHECK-LE-NEXT: addi r3, r4, .LCPI65_0 at toc@l
> -; CHECK-LE-NEXT: xxswapd vs0, vs0
> +; CHECK-LE-NEXT: xxspltw v2, vs0, 1
> ; CHECK-LE-NEXT: lvx v3, 0, r3
> -; CHECK-LE-NEXT: xxspltw v2, vs0, 3
> ; CHECK-LE-NEXT: vadduwm v2, v2, v3
> ; CHECK-LE-NEXT: blr
> %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
>
> diff --git a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
> b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
> index 5c05f8dc3d81..a198604f79a4 100644
> --- a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
> +++ b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
> @@ -17,17 +17,15 @@ define <2 x double> @testi0(<2 x double>* %p1, double*
> %p2) {
> ; CHECK-NEXT: lxvd2x vs0, 0, r3
> ; CHECK-NEXT: lfdx f1, 0, r4
> ; CHECK-NEXT: xxswapd vs0, vs0
> -; CHECK-NEXT: xxspltd vs1, vs1, 0
> -; CHECK-NEXT: xxpermdi v2, vs0, vs1, 1
> +; CHECK-NEXT: xxmrghd v2, vs0, vs1
> ; CHECK-NEXT: blr
> ;
> ; CHECK-P9-VECTOR-LABEL: testi0:
> ; CHECK-P9-VECTOR: # %bb.0:
> ; CHECK-P9-VECTOR-NEXT: lxvd2x vs0, 0, r3
> ; CHECK-P9-VECTOR-NEXT: lfdx f1, 0, r4
> -; CHECK-P9-VECTOR-NEXT: xxspltd vs1, vs1, 0
> ; CHECK-P9-VECTOR-NEXT: xxswapd vs0, vs0
> -; CHECK-P9-VECTOR-NEXT: xxpermdi v2, vs0, vs1, 1
> +; CHECK-P9-VECTOR-NEXT: xxmrghd v2, vs0, vs1
> ; CHECK-P9-VECTOR-NEXT: blr
> ;
> ; CHECK-P9-LABEL: testi0:
> @@ -51,17 +49,15 @@ define <2 x double> @testi1(<2 x double>* %p1, double*
> %p2) {
> ; CHECK-NEXT: lxvd2x vs0, 0, r3
> ; CHECK-NEXT: lfdx f1, 0, r4
> ; CHECK-NEXT: xxswapd vs0, vs0
> -; CHECK-NEXT: xxspltd vs1, vs1, 0
> -; CHECK-NEXT: xxmrgld v2, vs1, vs0
> +; CHECK-NEXT: xxpermdi v2, vs1, vs0, 1
> ; CHECK-NEXT: blr
> ;
> ; CHECK-P9-VECTOR-LABEL: testi1:
> ; CHECK-P9-VECTOR: # %bb.0:
> ; CHECK-P9-VECTOR-NEXT: lxvd2x vs0, 0, r3
> ; CHECK-P9-VECTOR-NEXT: lfdx f1, 0, r4
> -; CHECK-P9-VECTOR-NEXT: xxspltd vs1, vs1, 0
> ; CHECK-P9-VECTOR-NEXT: xxswapd vs0, vs0
> -; CHECK-P9-VECTOR-NEXT: xxmrgld v2, vs1, vs0
> +; CHECK-P9-VECTOR-NEXT: xxpermdi v2, vs1, vs0, 1
> ; CHECK-P9-VECTOR-NEXT: blr
> ;
> ; CHECK-P9-LABEL: testi1:
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200706/8630b191/attachment-0001.html>
More information about the llvm-commits
mailing list