[llvm] 1fed131 - [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE
Eric Christopher via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 8 07:55:49 PDT 2020
To follow up here: 1b1539712e1ee30c02ed20493682fc05d52391c0 fixed the
crashes I was seeing. Thanks Nemanja! :)
On Mon, Jul 6, 2020 at 4:58 PM Eric Christopher <echristo at gmail.com> wrote:
> Hi Nemanja!
>
> Running into a compiler crash with this building skia (https://skia.org/)
> for power after this patch. I'll see what I can do to get a testcase (if it
> doesn't reproduce for you), but would you mind terribly reverting in the
> meantime?
>
> Thanks!
>
> -eric
>
> On Thu, Jun 18, 2020 at 7:55 PM Nemanja Ivanovic via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>>
>> Author: Nemanja Ivanovic
>> Date: 2020-06-18T21:54:22-05:00
>> New Revision: 1fed131660b2c5d3ea7007e273a7a5da80699445
>>
>> URL:
>> https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445
>> DIFF:
>> https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445.diff
>>
>> LOG: [PowerPC] Canonicalize shuffles to match more single-instruction
>> masks on LE
>>
>> We currently miss a number of opportunities to emit single-instruction
>> VMRG[LH][BHW] instructions for shuffles on little endian subtargets.
>> Although this in itself is not a huge performance opportunity, since
>> loading the permute vector for a VPERM can always be pulled out of loops,
>> producing such merge instructions is useful to downstream optimizations.
>> Since VPERM is essentially opaque to all subsequent optimizations, we want
>> to avoid it as much as possible. Other permute instructions have semantics
>> that can be reasoned about much more easily in later optimizations.
>>
>> This patch does the following:
>> - Canonicalize shuffles so that the first element comes from the first
>>   vector (since that's what most of the mask matching functions want)
>> - Switch the elements that come from splat vectors so that they match the
>>   corresponding elements from the other vector (to allow for merges)
>> - Add debugging messages for when a shuffle is matched to a VPERM so that
>>   anyone interested in improving this further can get the info for their
>>   code
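
The first two items in the list above can be illustrated outside of LLVM. The following is only a sketch: it models a shuffle mask as a plain std::vector<int> (with -1 for an undef lane), and the names Mask, commuteMask, isAlternating and canonicalizeForMerge are hypothetical stand-ins, not the SelectionDAG API. It also simply assumes that, after commuting, the second input is a splat, which the patch checks explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a shuffle mask: indices [0, NumElts) select from the
// first input, [NumElts, 2 * NumElts) from the second, and -1 is undef.
using Mask = std::vector<int>;

// Swap the roles of the two inputs by remapping every defined index.
Mask commuteMask(const Mask &M, int NumElts) {
  Mask Out;
  Out.reserve(M.size());
  for (int Idx : M) {
    if (Idx < 0)
      Out.push_back(Idx); // undef stays undef
    else
      Out.push_back(Idx < NumElts ? Idx + NumElts : Idx - NumElts);
  }
  return Out;
}

// True when consecutive lanes alternate between the two inputs.
bool isAlternating(const Mask &M, int NumElts) {
  bool PrevFromFirst = M[0] < NumElts;
  for (std::size_t I = 1; I < M.size(); ++I) {
    bool FromFirst = M[I] < NumElts;
    if (FromFirst == PrevFromFirst)
      return false;
    PrevFromFirst = FromFirst;
  }
  return true;
}

// Assumption: once lane 0 comes from the first input, the second input is a
// splat, so any of its elements may be substituted for any other.
Mask canonicalizeForMerge(Mask M, int NumElts) {
  assert(!M.empty() && "expected a non-empty mask");
  // Step 1: make the first element come from the first input.
  if (M[0] >= NumElts)
    M = commuteMask(M, NumElts);
  if (!isAlternating(M, NumElts))
    return M;
  // Step 2: pull the same index from the splat as from its neighbour, which
  // yields a merge-style mask.
  for (std::size_t I = 1; I < M.size(); I += 2)
    M[I] = M[I - 1] + NumElts;
  return M;
}
```

With eight elements per input, the mask <8,1,9,3,10,5,11,7> is first commuted to <0,9,1,11,2,13,3,15> and then rewritten to <0,8,1,9,2,10,3,11>, which has the interleaved shape of a merge.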
>>
>> Differential revision: https://reviews.llvm.org/D77448
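
The PPCISelLowering.cpp changes in this commit also remap mask entries when an input is left in "permuted" scalar-to-vector form. A rough standalone sketch of that index arithmetic, assuming masks are plain integer vectors and using the hypothetical name remapForPermutedInputs (not a function in the patch):

```cpp
#include <cassert>
#include <vector>

// Hypothetical standalone version of the mask fixup. NumEltsIn is the element
// count of the scalar_to_vector type before any bitcast; the mask length is
// taken as the element count of the shuffle type.
std::vector<int> remapForPermutedInputs(std::vector<int> Mask,
                                        bool LHSPermuted, bool RHSPermuted,
                                        int NumEltsIn) {
  assert(NumEltsIn > 0 && "need at least one input element");
  const int NumEltsOut = static_cast<int>(Mask.size());
  const int HalfVec = NumEltsOut / 2;

  // Bounds of -1 mean "this input is not permuted", so nothing matches.
  int LHSMaxIdx = -1, RHSMinIdx = -1, RHSMaxIdx = -1;
  if (LHSPermuted)
    LHSMaxIdx = NumEltsOut / NumEltsIn;
  if (RHSPermuted) {
    RHSMinIdx = NumEltsOut;
    RHSMaxIdx = NumEltsOut / NumEltsIn + RHSMinIdx;
  }

  // Entries that referred to element zero of a permuted input now have to
  // look half a vector further along.
  for (int &Idx : Mask)
    if ((Idx >= 0 && Idx < LHSMaxIdx) || (Idx >= RHSMinIdx && Idx < RHSMaxIdx))
      Idx += HalfVec;
  return Mask;
}
```

For a four-element shuffle whose first input is a permuted scalar-to-vector of the same type, <0,4,5,6> becomes <2,4,5,6>: the scalar now sits in element 2 rather than element 0.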
>>
>> Added:
>>
>>
>> Modified:
>> llvm/lib/Target/PowerPC/PPCISelLowering.cpp
>> llvm/lib/Target/PowerPC/PPCISelLowering.h
>> llvm/lib/Target/PowerPC/PPCInstrVSX.td
>> llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
>> llvm/test/CodeGen/PowerPC/build-vector-tests.ll
>> llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
>> llvm/test/CodeGen/PowerPC/fp-strict-round.ll
>> llvm/test/CodeGen/PowerPC/load-and-splat.ll
>> llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
>> llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
>> llvm/test/CodeGen/PowerPC/pr25080.ll
>> llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
>> llvm/test/CodeGen/PowerPC/pr38087.ll
>> llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
>> llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
>> llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
>> llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
>> llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
>> llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
>> llvm/test/CodeGen/PowerPC/swaps-le-5.ll
>> llvm/test/CodeGen/PowerPC/swaps-le-6.ll
>> llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
>> llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
>> llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
>> llvm/test/CodeGen/PowerPC/vsx.ll
>> llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>>
>> Removed:
>>
>>
>>
>>
>> ################################################################################
>> diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
>> b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
>> index d7698a5ec962..28bd80610c84 100644
>> --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
>> +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
>> @@ -125,6 +125,7 @@ cl::desc("use absolute jump tables on ppc"),
>> cl::Hidden);
>>
>> STATISTIC(NumTailCalls, "Number of tail calls");
>> STATISTIC(NumSiblingCalls, "Number of sibling calls");
>> +STATISTIC(ShufflesHandledWithVPERM, "Number of shuffles lowered to a VPERM");
>>
>> static bool isNByteElemShuffleMask(ShuffleVectorSDNode *, unsigned, int);
>>
>> @@ -1505,6 +1506,8 @@ const char
>> *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
>> case PPCISD::MTVSRZ: return "PPCISD::MTVSRZ";
>> case PPCISD::SINT_VEC_TO_FP: return "PPCISD::SINT_VEC_TO_FP";
>> case PPCISD::UINT_VEC_TO_FP: return "PPCISD::UINT_VEC_TO_FP";
>> + case PPCISD::SCALAR_TO_VECTOR_PERMUTED:
>> + return "PPCISD::SCALAR_TO_VECTOR_PERMUTED";
>> case PPCISD::ANDI_rec_1_EQ_BIT:
>> return "PPCISD::ANDI_rec_1_EQ_BIT";
>> case PPCISD::ANDI_rec_1_GT_BIT:
>> @@ -2716,7 +2719,8 @@ static bool usePartialVectorLoads(SDNode *N, const
>> PPCSubtarget& ST) {
>> for (SDNode::use_iterator UI = LD->use_begin(), UE = LD->use_end();
>> UI != UE; ++UI)
>> if (UI.getUse().get().getResNo() == 0 &&
>> - UI->getOpcode() != ISD::SCALAR_TO_VECTOR)
>> + UI->getOpcode() != ISD::SCALAR_TO_VECTOR &&
>> + UI->getOpcode() != PPCISD::SCALAR_TO_VECTOR_PERMUTED)
>> return false;
>>
>> return true;
>> @@ -9041,7 +9045,8 @@ static const SDValue *getNormalLoadInput(const
>> SDValue &Op) {
>> const SDValue *InputLoad = &Op;
>> if (InputLoad->getOpcode() == ISD::BITCAST)
>> InputLoad = &InputLoad->getOperand(0);
>> - if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR)
>> + if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR ||
>> + InputLoad->getOpcode() == PPCISD::SCALAR_TO_VECTOR_PERMUTED)
>> InputLoad = &InputLoad->getOperand(0);
>> if (InputLoad->getOpcode() != ISD::LOAD)
>> return nullptr;
>> @@ -9690,6 +9695,15 @@ SDValue
>> PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
>> SDValue V1 = Op.getOperand(0);
>> SDValue V2 = Op.getOperand(1);
>> ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);
>> +
>> + // Any nodes that were combined in the target-independent combiner prior
>> + // to vector legalization will not be sent to the target combine. Try to
>> + // combine it here.
>> + if (SDValue NewShuffle = combineVectorShuffle(SVOp, DAG)) {
>> + DAG.ReplaceAllUsesOfValueWith(Op, NewShuffle);
>> + Op = NewShuffle;
>> + SVOp = cast<ShuffleVectorSDNode>(Op);
>> + }
>> EVT VT = Op.getValueType();
>> bool isLittleEndian = Subtarget.isLittleEndian();
>>
>> @@ -9715,6 +9729,11 @@ SDValue
>> PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
>> Offset = isLittleEndian ? (3 - SplatIdx) * 4 : SplatIdx * 4;
>> else
>> Offset = isLittleEndian ? (1 - SplatIdx) * 8 : SplatIdx * 8;
>> +
>> + // If we are loading a partial vector, it does not make sense to adjust
>> + // the base pointer. This happens with (splat (s_to_v_permuted (ld))).
>> + if (LD->getMemoryVT().getSizeInBits() == (IsFourByte ? 32 : 64))
>> + Offset = 0;
>> SDValue BasePtr = LD->getBasePtr();
>> if (Offset != 0)
>> BasePtr = DAG.getNode(ISD::ADD, dl,
>> getPointerTy(DAG.getDataLayout()),
>> @@ -9988,7 +10007,13 @@ SDValue
>> PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
>> MVT::i32));
>> }
>>
>> + ShufflesHandledWithVPERM++;
>> SDValue VPermMask = DAG.getBuildVector(MVT::v16i8, dl, ResultMask);
>> + LLVM_DEBUG(dbgs() << "Emitting a VPERM for the following shuffle:\n");
>> + LLVM_DEBUG(SVOp->dump());
>> + LLVM_DEBUG(dbgs() << "With the following permute control vector:\n");
>> + LLVM_DEBUG(VPermMask.dump());
>> +
>> if (isLittleEndian)
>> return DAG.getNode(PPCISD::VPERM, dl, V1.getValueType(),
>> V2, V1, VPermMask);
>> @@ -14114,6 +14139,199 @@ SDValue
>> PPCTargetLowering::combineStoreFPToInt(SDNode *N,
>> return Val;
>> }
>>
>> +static bool isAlternatingShuffMask(const ArrayRef<int> &Mask, int NumElts) {
>> + // Check that the source of the element keeps flipping
>> + // (i.e. Mask[i] < NumElts -> Mask[i+i] >= NumElts).
>> + bool PrevElemFromFirstVec = Mask[0] < NumElts;
>> + for (int i = 1, e = Mask.size(); i < e; i++) {
>> + if (PrevElemFromFirstVec && Mask[i] < NumElts)
>> + return false;
>> + if (!PrevElemFromFirstVec && Mask[i] >= NumElts)
>> + return false;
>> + PrevElemFromFirstVec = !PrevElemFromFirstVec;
>> + }
>> + return true;
>> +}
>> +
>> +static bool isSplatBV(SDValue Op) {
>> + if (Op.getOpcode() != ISD::BUILD_VECTOR)
>> + return false;
>> + SDValue FirstOp;
>> +
>> + // Find first non-undef input.
>> + for (int i = 0, e = Op.getNumOperands(); i < e; i++) {
>> + FirstOp = Op.getOperand(i);
>> + if (!FirstOp.isUndef())
>> + break;
>> + }
>> +
>> + // All inputs are undef or the same as the first non-undef input.
>> + for (int i = 1, e = Op.getNumOperands(); i < e; i++)
>> + if (Op.getOperand(i) != FirstOp && !Op.getOperand(i).isUndef())
>> + return false;
>> + return true;
>> +}
>> +
>> +static SDValue isScalarToVec(SDValue Op) {
>> + if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR)
>> + return Op;
>> + if (Op.getOpcode() != ISD::BITCAST)
>> + return SDValue();
>> + Op = Op.getOperand(0);
>> + if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR)
>> + return Op;
>> + return SDValue();
>> +}
>> +
>> +static void fixupShuffleMaskForPermutedSToV(SmallVectorImpl<int> &ShuffV,
>> + int LHSMaxIdx, int RHSMinIdx,
>> + int RHSMaxIdx, int HalfVec) {
>> + for (int i = 0, e = ShuffV.size(); i < e; i++) {
>> + int Idx = ShuffV[i];
>> + if ((Idx >= 0 && Idx < LHSMaxIdx) || (Idx >= RHSMinIdx && Idx < RHSMaxIdx))
>> + ShuffV[i] += HalfVec;
>> + }
>> + return;
>> +}
>> +
>> +// Replace a SCALAR_TO_VECTOR with a SCALAR_TO_VECTOR_PERMUTED except if
>> +// the original is:
>> +// (<n x Ty> (scalar_to_vector (Ty (extract_elt <n x Ty> %a, C))))
>> +// In such a case, just change the shuffle mask to extract the element
>> +// from the permuted index.
>> +static SDValue getSToVPermuted(SDValue OrigSToV, SelectionDAG &DAG) {
>> + SDLoc dl(OrigSToV);
>> + EVT VT = OrigSToV.getValueType();
>> + assert(OrigSToV.getOpcode() == ISD::SCALAR_TO_VECTOR &&
>> + "Expecting a SCALAR_TO_VECTOR here");
>> + SDValue Input = OrigSToV.getOperand(0);
>> +
>> + if (Input.getOpcode() == ISD::EXTRACT_VECTOR_ELT) {
>> + ConstantSDNode *Idx = dyn_cast<ConstantSDNode>(Input.getOperand(1));
>> + SDValue OrigVector = Input.getOperand(0);
>> +
>> + // Can't handle non-const element indices or different vector types
>> + // for the input to the extract and the output of the scalar_to_vector.
>> + if (Idx && VT == OrigVector.getValueType()) {
>> + SmallVector<int, 16> NewMask(VT.getVectorNumElements(), -1);
>> + NewMask[VT.getVectorNumElements() / 2] = Idx->getZExtValue();
>> + return DAG.getVectorShuffle(VT, dl, OrigVector, OrigVector,
>> NewMask);
>> + }
>> + }
>> + return DAG.getNode(PPCISD::SCALAR_TO_VECTOR_PERMUTED, dl, VT,
>> + OrigSToV.getOperand(0));
>> +}
>> +
>> +// On little endian subtargets, combine shuffles such as:
>> +// vector_shuffle<16,1,17,3,18,5,19,7,20,9,21,11,22,13,23,15>, <zero>, %b
>> +// into:
>> +// vector_shuffle<16,0,17,1,18,2,19,3,20,4,21,5,22,6,23,7>, <zero>, %b
>> +// because the latter can be matched to a single instruction merge.
>> +// Furthermore, SCALAR_TO_VECTOR on little endian always involves a permute
>> +// to put the value into element zero. Adjust the shuffle mask so that the
>> +// vector can remain in permuted form (to prevent a swap prior to a shuffle).
>> +SDValue PPCTargetLowering::combineVectorShuffle(ShuffleVectorSDNode *SVN,
>> + SelectionDAG &DAG) const
>> {
>> + SDValue LHS = SVN->getOperand(0);
>> + SDValue RHS = SVN->getOperand(1);
>> + auto Mask = SVN->getMask();
>> + int NumElts = LHS.getValueType().getVectorNumElements();
>> + SDValue Res(SVN, 0);
>> + SDLoc dl(SVN);
>> +
>> + // None of these combines are useful on big endian systems since the
>> ISA
>> + // already has a big endian bias.
>> + if (!Subtarget.isLittleEndian())
>> + return Res;
>> +
>> + // If this is not a shuffle of a shuffle and the first element comes
>> from
>> + // the second vector, canonicalize to the commuted form. This will
>> make it
>> + // more likely to match one of the single instruction patterns.
>> + if (Mask[0] >= NumElts && LHS.getOpcode() != ISD::VECTOR_SHUFFLE &&
>> + RHS.getOpcode() != ISD::VECTOR_SHUFFLE) {
>> + std::swap(LHS, RHS);
>> + Res = DAG.getCommutedVectorShuffle(*SVN);
>> + Mask = cast<ShuffleVectorSDNode>(Res)->getMask();
>> + }
>> +
>> + // Adjust the shuffle mask if either input vector comes from a
>> + // SCALAR_TO_VECTOR and keep the respective input vector in permuted
>> + // form (to prevent the need for a swap).
>> + SmallVector<int, 16> ShuffV(Mask.begin(), Mask.end());
>> + SDValue SToVLHS = isScalarToVec(LHS);
>> + SDValue SToVRHS = isScalarToVec(RHS);
>> + if (SToVLHS || SToVRHS) {
>> + int NumEltsIn = SToVLHS ?
>> SToVLHS.getValueType().getVectorNumElements()
>> + :
>> SToVRHS.getValueType().getVectorNumElements();
>> + int NumEltsOut = ShuffV.size();
>> +
>> + // Initially assume that neither input is permuted. These will be
>> adjusted
>> + // accordingly if either input is.
>> + int LHSMaxIdx = -1;
>> + int RHSMinIdx = -1;
>> + int RHSMaxIdx = -1;
>> + int HalfVec = LHS.getValueType().getVectorNumElements() / 2;
>> +
>> + // Get the permuted scalar to vector nodes for the source(s) that
>> come from
>> + // ISD::SCALAR_TO_VECTOR.
>> + if (SToVLHS) {
>> + // Set up the values for the shuffle vector fixup.
>> + LHSMaxIdx = NumEltsOut / NumEltsIn;
>> + SToVLHS = getSToVPermuted(SToVLHS, DAG);
>> + if (SToVLHS.getValueType() != LHS.getValueType())
>> + SToVLHS = DAG.getBitcast(LHS.getValueType(), SToVLHS);
>> + LHS = SToVLHS;
>> + }
>> + if (SToVRHS) {
>> + RHSMinIdx = NumEltsOut;
>> + RHSMaxIdx = NumEltsOut / NumEltsIn + RHSMinIdx;
>> + SToVRHS = getSToVPermuted(SToVRHS, DAG);
>> + if (SToVRHS.getValueType() != RHS.getValueType())
>> + SToVRHS = DAG.getBitcast(RHS.getValueType(), SToVRHS);
>> + RHS = SToVRHS;
>> + }
>> +
>> + // Fix up the shuffle mask to reflect where the desired element
>> actually is.
>> + // The minimum and maximum indices that correspond to element zero
>> for both
>> + // the LHS and RHS are computed and will control which shuffle mask
>> entries
>> + // are to be changed. For example, if the RHS is permuted, any
>> shuffle mask
>> + // entries in the range [RHSMinIdx,RHSMaxIdx) will be incremented by
>> + // HalfVec to refer to the corresponding element in the permuted
>> vector.
>> + fixupShuffleMaskForPermutedSToV(ShuffV, LHSMaxIdx, RHSMinIdx,
>> RHSMaxIdx,
>> + HalfVec);
>> + Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS,
>> ShuffV);
>> +
>> + // We may have simplified away the shuffle. We won't be able to do
>> anything
>> + // further with it here.
>> + if (!isa<ShuffleVectorSDNode>(Res))
>> + return Res;
>> + Mask = cast<ShuffleVectorSDNode>(Res)->getMask();
>> + }
>> +
>> + // The common case after we commuted the shuffle is that the RHS is a
>> splat
>> + // and we have elements coming in from the splat at indices that are
>> not
>> + // conducive to using a merge.
>> + // Example:
>> + // vector_shuffle<0,17,1,19,2,21,3,23,4,25,5,27,6,29,7,31> t1, <zero>
>> + if (!isSplatBV(RHS))
>> + return Res;
>> +
>> + // We are looking for a mask such that all even elements are from
>> + // one vector and all odd elements from the other.
>> + if (!isAlternatingShuffMask(Mask, NumElts))
>> + return Res;
>> +
>> + // Adjust the mask so we are pulling in the same index from the splat
>> + // as the index from the interesting vector in consecutive elements.
>> + // Example:
>> + // vector_shuffle<0,16,1,17,2,18,3,19,4,20,5,21,6,22,7,23> t1, <zero>
>> + for (int i = 1, e = Mask.size(); i < e; i += 2)
>> + ShuffV[i] = (ShuffV[i - 1] + NumElts);
>> +
>> + Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, ShuffV);
>> + return Res;
>> +}
>> +
>> SDValue PPCTargetLowering::combineVReverseMemOP(ShuffleVectorSDNode *SVN,
>> LSBaseSDNode *LSBase,
>> DAGCombinerInfo &DCI)
>> const {
>> @@ -14223,7 +14441,7 @@ SDValue
>> PPCTargetLowering::PerformDAGCombine(SDNode *N,
>> LSBaseSDNode* LSBase = cast<LSBaseSDNode>(N->getOperand(0));
>> return combineVReverseMemOP(cast<ShuffleVectorSDNode>(N), LSBase,
>> DCI);
>> }
>> - break;
>> + return combineVectorShuffle(cast<ShuffleVectorSDNode>(N), DCI.DAG);
>> case ISD::STORE: {
>>
>> EVT Op1VT = N->getOperand(1).getValueType();
>>
>> diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h
>> b/llvm/lib/Target/PowerPC/PPCISelLowering.h
>> index 77252e919553..9f7c6ab53a17 100644
>> --- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
>> +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
>> @@ -221,6 +221,14 @@ namespace llvm {
>> /// As with SINT_VEC_TO_FP, used for converting illegal types.
>> UINT_VEC_TO_FP,
>>
>> + /// PowerPC instructions that have SCALAR_TO_VECTOR semantics tend to
>> + /// place the value into the least significant element of the most
>> + /// significant doubleword in the vector. This is not element zero for
>> + /// anything smaller than a doubleword on either endianness. This node has
>> + /// the same semantics as SCALAR_TO_VECTOR except that the value remains in
>> + /// the aforementioned location in the vector register.
>> + SCALAR_TO_VECTOR_PERMUTED,
>> +
>> // FIXME: Remove these once the ANDI glue bug is fixed:
>> /// i1 = ANDI_rec_1_[EQ|GT]_BIT(i32 or i64 x) - Represents the
>> result of the
>> /// eq or gt bit of CR0 after executing andi. x, 1. This is used to
>> @@ -1215,6 +1223,8 @@ namespace llvm {
>> SDValue combineSetCC(SDNode *N, DAGCombinerInfo &DCI) const;
>> SDValue combineABS(SDNode *N, DAGCombinerInfo &DCI) const;
>> SDValue combineVSelect(SDNode *N, DAGCombinerInfo &DCI) const;
>> + SDValue combineVectorShuffle(ShuffleVectorSDNode *SVN,
>> + SelectionDAG &DAG) const;
>> SDValue combineVReverseMemOP(ShuffleVectorSDNode *SVN, LSBaseSDNode
>> *LSBase,
>> DAGCombinerInfo &DCI) const;
>>
>>
>> diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
>> b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
>> index e7ec1808ec3b..c43b2716cb37 100644
>> --- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
>> +++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
>> @@ -138,6 +138,8 @@ def PPCldvsxlh : SDNode<"PPCISD::LD_VSX_LH",
>> SDT_PPCldvsxlh,
>> [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
>> def PPCldsplat : SDNode<"PPCISD::LD_SPLAT", SDT_PPCldsplat,
>> [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
>> +def PPCSToV : SDNode<"PPCISD::SCALAR_TO_VECTOR_PERMUTED",
>> + SDTypeProfile<1, 1, []>, []>;
>>
>> //-------------------------- Predicate definitions
>> ---------------------------//
>> def HasVSX : Predicate<"PPCSubTarget->hasVSX()">;
>> @@ -288,6 +290,11 @@ class X_XS6_RA5_RB5<bits<6> opcode, bits<10> xo,
>> string opc,
>> } // Predicates = HasP9Vector
>> } // AddedComplexity = 400, hasSideEffects = 0
>>
>> +multiclass ScalToVecWPermute<ValueType Ty, dag In, dag NonPermOut, dag
>> PermOut> {
>> + def : Pat<(Ty (scalar_to_vector In)), (Ty NonPermOut)>;
>> + def : Pat<(Ty (PPCSToV In)), (Ty PermOut)>;
>> +}
>> +
>> //-------------------------- Instruction definitions
>> -------------------------//
>> // VSX instructions require the VSX feature, they are to be selected over
>> // equivalent Altivec patterns (as they address a larger register set)
>> and
>> @@ -2710,12 +2717,14 @@ def : Pat<(v2i64 (build_vector DblToLong.A,
>> DblToLong.A)),
>> def : Pat<(v2i64 (build_vector DblToULong.A, DblToULong.A)),
>> (v2i64 (XXPERMDI (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC),
>> (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC),
>> 0))>;
>> -def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
>> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> - (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC),
>> 1))>;
>> -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
>> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> - (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC),
>> 1))>;
>> +defm : ScalToVecWPermute<
>> + v4i32, FltToIntLoad.A,
>> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC),
>> 1),
>> + (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC)>;
>> +defm : ScalToVecWPermute<
>> + v4i32, FltToUIntLoad.A,
>> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC),
>> 1),
>> + (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC)>;
>> def : Pat<(v4f32 (build_vector f32:$A, f32:$A, f32:$A, f32:$A)),
>> (v4f32 (XXSPLTW (v4f32 (XSCVDPSPN $A)), 0))>;
>> def : Pat<(v2f64 (PPCldsplat xoaddr:$A)),
>> @@ -2730,10 +2739,12 @@ def : Pat<(v2i64 (build_vector FltToLong.A,
>> FltToLong.A)),
>> def : Pat<(v2i64 (build_vector FltToULong.A, FltToULong.A)),
>> (v2i64 (XXPERMDIs
>> (COPY_TO_REGCLASS (XSCVDPUXDSs $A), VSFRC), 0))>;
>> -def : Pat<(v2i64 (scalar_to_vector DblToLongLoad.A)),
>> - (v2i64 (XVCVDPSXDS (LXVDSX xoaddr:$A)))>;
>> -def : Pat<(v2i64 (scalar_to_vector DblToULongLoad.A)),
>> - (v2i64 (XVCVDPUXDS (LXVDSX xoaddr:$A)))>;
>> +defm : ScalToVecWPermute<
>> + v2i64, DblToLongLoad.A,
>> + (XVCVDPSXDS (LXVDSX xoaddr:$A)), (XVCVDPSXDS (LXVDSX xoaddr:$A))>;
>> +defm : ScalToVecWPermute<
>> + v2i64, DblToULongLoad.A,
>> + (XVCVDPUXDS (LXVDSX xoaddr:$A)), (XVCVDPUXDS (LXVDSX xoaddr:$A))>;
>> } // HasVSX
>>
>> // Any big endian VSX subtarget.
>> @@ -2831,9 +2842,10 @@ def : Pat<WToDPExtractConv.BV13U,
>>
>> // Any little endian VSX subtarget.
>> let Predicates = [HasVSX, IsLittleEndian] in {
>> -def : Pat<(v2f64 (scalar_to_vector f64:$A)),
>> - (v2f64 (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64),
>> - (SUBREG_TO_REG (i64 1), $A, sub_64), 0))>;
>> +defm : ScalToVecWPermute<v2f64, (f64 f64:$A),
>> + (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64),
>> + (SUBREG_TO_REG (i64 1), $A, sub_64),
>> 0),
>> + (SUBREG_TO_REG (i64 1), $A, sub_64)>;
>>
>> def : Pat<(f64 (extractelt v2f64:$S, 0)),
>> (f64 (EXTRACT_SUBREG (XXPERMDI $S, $S, 2), sub_64))>;
>> @@ -2943,18 +2955,24 @@ def : Pat<(PPCstore_scal_int_from_vsr
>> (STXSDX (XSCVDPUXDS f64:$src), xoaddr:$dst)>;
>>
>> // Load-and-splat with fp-to-int conversion (using X-Form VSX/FP loads).
>> -def : Pat<(v4i32 (scalar_to_vector DblToIntLoad.A)),
>> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> - (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC),
>> 1))>;
>> -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoad.A)),
>> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> - (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC),
>> 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector FltToLongLoad.A)),
>> - (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
>> - (XFLOADf32 xoaddr:$A),
>> VSFRC)), 0))>;
>> -def : Pat<(v2i64 (scalar_to_vector FltToULongLoad.A)),
>> - (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
>> - (XFLOADf32 xoaddr:$A),
>> VSFRC)), 0))>;
>> +defm : ScalToVecWPermute<
>> + v4i32, DblToIntLoad.A,
>> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC),
>> 1),
>> + (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC)>;
>> +defm : ScalToVecWPermute<
>> + v4i32, DblToUIntLoad.A,
>> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC),
>> 1),
>> + (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC)>;
>> +defm : ScalToVecWPermute<
>> + v2i64, FltToLongLoad.A,
>> + (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A),
>> VSFRC)), 0),
>> + (SUBREG_TO_REG (i64 1), (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32
>> xoaddr:$A),
>> + VSFRC)),
>> sub_64)>;
>> +defm : ScalToVecWPermute<
>> + v2i64, FltToULongLoad.A,
>> + (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A),
>> VSFRC)), 0),
>> + (SUBREG_TO_REG (i64 1), (XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32
>> xoaddr:$A),
>> + VSFRC)),
>> sub_64)>;
>> } // HasVSX, NoP9Vector
>>
>> // Any VSX subtarget that only has loads and stores that load in big
>> endian
>> @@ -3156,8 +3174,12 @@ def : Pat<DWToSPExtractConv.El1US1,
>> (f64 (COPY_TO_REGCLASS $S1, VSRC)),
>> VSFRC)))>;
>>
>> // v4f32 scalar <-> vector conversions (LE)
>> -def : Pat<(v4f32 (scalar_to_vector f32:$A)),
>> - (v4f32 (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1))>;
>> + // The permuted version is no better than the version that puts the value
>> + // into the right element because XSCVDPSPN is different from all the other
>> + // instructions used for PPCSToV.
>> + defm : ScalToVecWPermute<v4f32, (f32 f32:$A),
>> + (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1),
>> + (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 3)>;
>> def : Pat<(f32 (vector_extract v4f32:$S, 0)),
>> (f32 (XSCVSPDPN (XXSLDWI $S, $S, 3)))>;
>> def : Pat<(f32 (vector_extract v4f32:$S, 1)),
>> @@ -3189,18 +3211,25 @@ def : Pat<(f64 (PPCfcfid (f64 (PPCmtvsra (i32
>> (extractelt v4i32:$A, 3)))))),
>> // LIWAX - This instruction is used for sign extending i32 -> i64.
>> // LIWZX - This instruction will be emitted for i32, f32, and when
>> // zero-extending i32 to i64 (zext i32 -> i64).
>> -def : Pat<(v2i64 (scalar_to_vector (i64 (sextloadi32 xoaddr:$src)))),
>> - (v2i64 (XXPERMDIs
>> - (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2))>;
>> -def : Pat<(v2i64 (scalar_to_vector (i64 (zextloadi32 xoaddr:$src)))),
>> - (v2i64 (XXPERMDIs
>> - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>;
>> -def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))),
>> - (v4i32 (XXPERMDIs
>> - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>;
>> -def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))),
>> - (v4f32 (XXPERMDIs
>> - (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>;
>> +defm : ScalToVecWPermute<
>> + v2i64, (i64 (sextloadi32 xoaddr:$src)),
>> + (XXPERMDIs (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2),
>> + (SUBREG_TO_REG (i64 1), (LIWAX xoaddr:$src), sub_64)>;
>> +
>> +defm : ScalToVecWPermute<
>> + v2i64, (i64 (zextloadi32 xoaddr:$src)),
>> + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2),
>> + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>;
>> +
>> +defm : ScalToVecWPermute<
>> + v4i32, (i32 (load xoaddr:$src)),
>> + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2),
>> + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>;
>> +
>> +defm : ScalToVecWPermute<
>> + v4f32, (f32 (load xoaddr:$src)),
>> + (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2),
>> + (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>;
>>
>> def : Pat<DWToSPExtractConv.BVU,
>> (v4f32 (VPKUDUM (XXSLDWI (XVCVUXDSP $S2), (XVCVUXDSP $S2), 3),
>> @@ -3336,14 +3365,17 @@ def : Pat<(i64 (vector_extract v2i64:$S,
>> i64:$Idx)),
>> // Little endian VSX subtarget with direct moves.
>> let Predicates = [HasVSX, HasDirectMove, IsLittleEndian] in {
>> // v16i8 scalar <-> vector conversions (LE)
>> - def : Pat<(v16i8 (scalar_to_vector i32:$A)),
>> - (v16i8 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>;
>> - def : Pat<(v8i16 (scalar_to_vector i32:$A)),
>> - (v8i16 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>;
>> - def : Pat<(v4i32 (scalar_to_vector i32:$A)),
>> - (v4i32 MovesToVSR.LE_WORD_0)>;
>> - def : Pat<(v2i64 (scalar_to_vector i64:$A)),
>> - (v2i64 MovesToVSR.LE_DWORD_0)>;
>> + defm : ScalToVecWPermute<v16i8, (i32 i32:$A),
>> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC),
>> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1,
>> VSRC)>;
>> + defm : ScalToVecWPermute<v8i16, (i32 i32:$A),
>> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC),
>> + (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1,
>> VSRC)>;
>> + defm : ScalToVecWPermute<v4i32, (i32 i32:$A), MovesToVSR.LE_WORD_0,
>> + (SUBREG_TO_REG (i64 1), (MTVSRWZ $A),
>> sub_64)>;
>> + defm : ScalToVecWPermute<v2i64, (i64 i64:$A), MovesToVSR.LE_DWORD_0,
>> + MovesToVSR.LE_DWORD_1>;
>> +
>> // v2i64 scalar <-> vector conversions (LE)
>> def : Pat<(i64 (vector_extract v2i64:$S, 0)),
>> (i64 VectorExtractions.LE_DWORD_0)>;
>> @@ -3641,30 +3673,41 @@ def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS,
>> xoaddr:$dst),
>> (STXVX $rS, xoaddr:$dst)>;
>>
>> // Build vectors from i8 loads
>> -def : Pat<(v16i8 (scalar_to_vector ScalarLoads.Li8)),
>> - (v16i8 (VSPLTBs 7, (LXSIBZX xoaddr:$src)))>;
>> -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.ZELi8)),
>> - (v8i16 (VSPLTHs 3, (LXSIBZX xoaddr:$src)))>;
>> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi8)),
>> - (v4i32 (XXSPLTWs (LXSIBZX xoaddr:$src), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi8i64)),
>> - (v2i64 (XXPERMDIs (LXSIBZX xoaddr:$src), 0))>;
>> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi8)),
>> - (v4i32 (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi8i64)),
>> - (v2i64 (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0))>;
>> +defm : ScalToVecWPermute<v16i8, ScalarLoads.Li8,
>> + (VSPLTBs 7, (LXSIBZX xoaddr:$src)),
>> + (VSPLTBs 7, (LXSIBZX xoaddr:$src))>;
>> +defm : ScalToVecWPermute<v8i16, ScalarLoads.ZELi8,
>> + (VSPLTHs 3, (LXSIBZX xoaddr:$src)),
>> + (VSPLTHs 3, (LXSIBZX xoaddr:$src))>;
>> +defm : ScalToVecWPermute<v4i32, ScalarLoads.ZELi8,
>> + (XXSPLTWs (LXSIBZX xoaddr:$src), 1),
>> + (XXSPLTWs (LXSIBZX xoaddr:$src), 1)>;
>> +defm : ScalToVecWPermute<v2i64, ScalarLoads.ZELi8i64,
>> + (XXPERMDIs (LXSIBZX xoaddr:$src), 0),
>> + (XXPERMDIs (LXSIBZX xoaddr:$src), 0)>;
>> +defm : ScalToVecWPermute<v4i32, ScalarLoads.SELi8,
>> + (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1),
>> + (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)),
>> 1)>;
>> +defm : ScalToVecWPermute<v2i64, ScalarLoads.SELi8i64,
>> + (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)),
>> 0),
>> + (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)),
>> 0)>;
>>
>> // Build vectors from i16 loads
>> -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.Li16)),
>> - (v8i16 (VSPLTHs 3, (LXSIHZX xoaddr:$src)))>;
>> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi16)),
>> - (v4i32 (XXSPLTWs (LXSIHZX xoaddr:$src), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
>> - (v2i64 (XXPERMDIs (LXSIHZX xoaddr:$src), 0))>;
>> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi16)),
>> - (v4i32 (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi16i64)),
>> - (v2i64 (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0))>;
>> +defm : ScalToVecWPermute<v8i16, ScalarLoads.Li16,
>> + (VSPLTHs 3, (LXSIHZX xoaddr:$src)),
>> + (VSPLTHs 3, (LXSIHZX xoaddr:$src))>;
>> +defm : ScalToVecWPermute<v4i32, ScalarLoads.ZELi16,
>> + (XXSPLTWs (LXSIHZX xoaddr:$src), 1),
>> + (XXSPLTWs (LXSIHZX xoaddr:$src), 1)>;
>> +defm : ScalToVecWPermute<v2i64, ScalarLoads.ZELi16i64,
>> + (XXPERMDIs (LXSIHZX xoaddr:$src), 0),
>> + (XXPERMDIs (LXSIHZX xoaddr:$src), 0)>;
>> +defm : ScalToVecWPermute<v4i32, ScalarLoads.SELi16,
>> + (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1),
>> + (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)),
>> 1)>;
>> +defm : ScalToVecWPermute<v2i64, ScalarLoads.SELi16i64,
>> + (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)),
>> 0),
>> + (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)),
>> 0)>;
>>
>> // Load/convert and convert/store patterns for f16.
>> def : Pat<(f64 (extloadf16 xoaddr:$src)),
>> @@ -3806,8 +3849,7 @@ def : Pat<(f32 (PPCxsminc f32:$XA, f32:$XB)),
>> VSSRC))>;
>>
>> // Endianness-neutral patterns for const splats with ISA 3.0
>> instructions.
>> -def : Pat<(v4i32 (scalar_to_vector i32:$A)),
>> - (v4i32 (MTVSRWS $A))>;
>> +defm : ScalToVecWPermute<v4i32, (i32 i32:$A), (MTVSRWS $A), (MTVSRWS
>> $A)>;
>> def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),
>> (v4i32 (MTVSRWS $A))>;
>> def : Pat<(v16i8 (build_vector immNonAllOneAnyExt8:$A,
>> immNonAllOneAnyExt8:$A,
>> @@ -3819,24 +3861,32 @@ def : Pat<(v16i8 (build_vector
>> immNonAllOneAnyExt8:$A, immNonAllOneAnyExt8:$A,
>> immNonAllOneAnyExt8:$A,
>> immNonAllOneAnyExt8:$A,
>> immNonAllOneAnyExt8:$A,
>> immNonAllOneAnyExt8:$A)),
>> (v16i8 (COPY_TO_REGCLASS (XXSPLTIB imm:$A), VSRC))>;
>> -def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
>> - (v4i32 (XVCVSPSXWS (LXVWSX xoaddr:$A)))>;
>> -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
>> - (v4i32 (XVCVSPUXWS (LXVWSX xoaddr:$A)))>;
>> -def : Pat<(v4i32 (scalar_to_vector DblToIntLoadP9.A)),
>> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> - (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC), 1))>;
>> -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoadP9.A)),
>> - (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> - (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector FltToLongLoadP9.A)),
>> - (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
>> - (DFLOADf32 iaddrX4:$A),
>> - VSFRC)), 0))>;
>> -def : Pat<(v2i64 (scalar_to_vector FltToULongLoadP9.A)),
>> - (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
>> - (DFLOADf32 iaddrX4:$A),
>> - VSFRC)), 0))>;
>> +defm : ScalToVecWPermute<v4i32, FltToIntLoad.A,
>> + (XVCVSPSXWS (LXVWSX xoaddr:$A)),
>> + (XVCVSPSXWS (LXVWSX xoaddr:$A))>;
>> +defm : ScalToVecWPermute<v4i32, FltToUIntLoad.A,
>> + (XVCVSPUXWS (LXVWSX xoaddr:$A)),
>> + (XVCVSPUXWS (LXVWSX xoaddr:$A))>;
>> +defm : ScalToVecWPermute<
>> + v4i32, DblToIntLoadP9.A,
>> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC), 1),
>> + (SUBREG_TO_REG (i64 1), (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), sub_64)>;
>> +defm : ScalToVecWPermute<
>> + v4i32, DblToUIntLoadP9.A,
>> + (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC), 1),
>> + (SUBREG_TO_REG (i64 1), (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), sub_64)>;
>> +defm : ScalToVecWPermute<
>> + v2i64, FltToLongLoadP9.A,
>> + (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), 0),
>> + (SUBREG_TO_REG
>> + (i64 1),
>> + (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), sub_64)>;
>> +defm : ScalToVecWPermute<
>> + v2i64, FltToULongLoadP9.A,
>> + (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), 0),
>> + (SUBREG_TO_REG
>> + (i64 1),
>> + (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), sub_64)>;
>> def : Pat<(v4f32 (PPCldsplat xoaddr:$A)),
>> (v4f32 (LXVWSX xoaddr:$A))>;
>> def : Pat<(v4i32 (PPCldsplat xoaddr:$A)),
>> @@ -4116,19 +4166,23 @@ def : Pat<(truncstorei16 (i32 (vector_extract
>> v8i16:$S, 6)), xoaddr:$dst),
>> def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 7)),
>> xoaddr:$dst),
>> (STXSIHXv (COPY_TO_REGCLASS (v16i8 (VSLDOI $S, $S, 10)),
>> VSRC), xoaddr:$dst)>;
>>
>> -def : Pat<(v2i64 (scalar_to_vector (i64 (load iaddrX4:$src)))),
>> - (v2i64 (XXPERMDIs
>> - (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>;
>> -def : Pat<(v2i64 (scalar_to_vector (i64 (load xaddrX4:$src)))),
>> - (v2i64 (XXPERMDIs
>> - (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>;
>> +defm : ScalToVecWPermute<
>> + v2i64, (i64 (load iaddrX4:$src)),
>> + (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2),
>> + (SUBREG_TO_REG (i64 1), (DFLOADf64 iaddrX4:$src), sub_64)>;
>> +defm : ScalToVecWPermute<
>> + v2i64, (i64 (load xaddrX4:$src)),
>> + (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2),
>> + (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>;
>> +defm : ScalToVecWPermute<
>> + v2f64, (f64 (load iaddrX4:$src)),
>> + (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2),
>> + (SUBREG_TO_REG (i64 1), (DFLOADf64 iaddrX4:$src), sub_64)>;
>> +defm : ScalToVecWPermute<
>> + v2f64, (f64 (load xaddrX4:$src)),
>> + (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2),
>> + (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>;
>>
>> -def : Pat<(v2f64 (scalar_to_vector (f64 (load iaddrX4:$src)))),
>> - (v2f64 (XXPERMDIs
>> - (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>;
>> -def : Pat<(v2f64 (scalar_to_vector (f64 (load xaddrX4:$src)))),
>> - (v2f64 (XXPERMDIs
>> - (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>;
>> def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddrX4:$src),
>> (XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
>> sub_64), xaddrX4:$src)>;
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
>> b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
>> index 8c9ffa815467..4d06571d0ec7 100644
>> --- a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
>> +++ b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
>> @@ -13,8 +13,7 @@ define void @testExpandPostRAPseudo(i32* nocapture
>> readonly %ptr) {
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8: lfiwzx f0, 0, r3
>> ; CHECK-P8: ld r4, .LC0@toc@l(r4)
>> -; CHECK-P8: xxswapd vs0, f0
>> -; CHECK-P8: xxspltw v2, vs0, 3
>> +; CHECK-P8: xxspltw v2, vs0, 1
>> ; CHECK-P8: stvx v2, 0, r4
>> ; CHECK-P8: lis r4, 1024
>> ; CHECK-P8: lfiwax f0, 0, r3
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
>> b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
>> index ee0cc41ea6bd..1cb7d7b62055 100644
>> --- a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
>> +++ b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
>> @@ -1282,8 +1282,7 @@ define <4 x i32> @spltMemVali(i32* nocapture
>> readonly %ptr) {
>> ; P8LE-LABEL: spltMemVali:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfiwzx f0, 0, r3
>> -; P8LE-NEXT: xxswapd vs0, f0
>> -; P8LE-NEXT: xxspltw v2, vs0, 3
>> +; P8LE-NEXT: xxspltw v2, vs0, 1
>> ; P8LE-NEXT: blr
>> entry:
>> %0 = load i32, i32* %ptr, align 4
>> @@ -2801,8 +2800,7 @@ define <4 x i32> @spltMemValui(i32* nocapture
>> readonly %ptr) {
>> ; P8LE-LABEL: spltMemValui:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfiwzx f0, 0, r3
>> -; P8LE-NEXT: xxswapd vs0, f0
>> -; P8LE-NEXT: xxspltw v2, vs0, 3
>> +; P8LE-NEXT: xxspltw v2, vs0, 1
>> ; P8LE-NEXT: blr
>> entry:
>> %0 = load i32, i32* %ptr, align 4
>> @@ -4573,7 +4571,7 @@ define <2 x i64> @spltMemValConvftoll(float*
>> nocapture readonly %ptr) {
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfs f0, 0(r3)
>> ; P9LE-NEXT: xscvdpsxds f0, f0
>> -; P9LE-NEXT: xxspltd v2, f0, 0
>> +; P9LE-NEXT: xxspltd v2, vs0, 0
>> ; P9LE-NEXT: blr
>> ;
>> ; P8BE-LABEL: spltMemValConvftoll:
>> @@ -4587,7 +4585,7 @@ define <2 x i64> @spltMemValConvftoll(float*
>> nocapture readonly %ptr) {
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfsx f0, 0, r3
>> ; P8LE-NEXT: xscvdpsxds f0, f0
>> -; P8LE-NEXT: xxspltd v2, f0, 0
>> +; P8LE-NEXT: xxspltd v2, vs0, 0
>> ; P8LE-NEXT: blr
>> entry:
>> %0 = load float, float* %ptr, align 4
>> @@ -5761,7 +5759,7 @@ define <2 x i64> @spltMemValConvftoull(float*
>> nocapture readonly %ptr) {
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfs f0, 0(r3)
>> ; P9LE-NEXT: xscvdpuxds f0, f0
>> -; P9LE-NEXT: xxspltd v2, f0, 0
>> +; P9LE-NEXT: xxspltd v2, vs0, 0
>> ; P9LE-NEXT: blr
>> ;
>> ; P8BE-LABEL: spltMemValConvftoull:
>> @@ -5775,7 +5773,7 @@ define <2 x i64> @spltMemValConvftoull(float*
>> nocapture readonly %ptr) {
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfsx f0, 0, r3
>> ; P8LE-NEXT: xscvdpuxds f0, f0
>> -; P8LE-NEXT: xxspltd v2, f0, 0
>> +; P8LE-NEXT: xxspltd v2, vs0, 0
>> ; P8LE-NEXT: blr
>> entry:
>> %0 = load float, float* %ptr, align 4
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
>> b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
>> index 2ffe98e1f694..7fac0511e3c5 100644
>> --- a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
>> +++ b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
>> @@ -23,18 +23,12 @@ entry:
>> define dso_local <16 x i8> @testmrghb2(<16 x i8> %a, <16 x i8> %b)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: testmrghb2:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: addis r3, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT: addi r3, r3, .LCPI1_0@toc@l
>> -; CHECK-P8-NEXT: lvx v4, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: testmrghb2:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: addis r3, r2, .LCPI1_0@toc@ha
>> -; CHECK-P9-NEXT: addi r3, r3, .LCPI1_0@toc@l
>> -; CHECK-P9-NEXT: lxvx v4, 0, r3
>> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P9-NEXT: blr
>> entry:
>> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
>> 24, i32 8, i32 25, i32 9, i32 26, i32 10, i32 27, i32 11, i32 28, i32 12,
>> i32 29, i32 13, i32 30, i32 14, i32 31, i32 15>
>> @@ -57,18 +51,12 @@ entry:
>> define dso_local <16 x i8> @testmrghh2(<16 x i8> %a, <16 x i8> %b)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: testmrghh2:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: addis r3, r2, .LCPI3_0@toc@ha
>> -; CHECK-P8-NEXT: addi r3, r3, .LCPI3_0@toc@l
>> -; CHECK-P8-NEXT: lvx v4, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT: vmrghh v2, v2, v3
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: testmrghh2:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: addis r3, r2, .LCPI3_0@toc@ha
>> -; CHECK-P9-NEXT: addi r3, r3, .LCPI3_0@toc@l
>> -; CHECK-P9-NEXT: lxvx v4, 0, r3
>> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
>> ; CHECK-P9-NEXT: blr
>> entry:
>> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
>> 24, i32 25, i32 8, i32 9, i32 26, i32 27, i32 10, i32 11, i32 28, i32 29,
>> i32 12, i32 13, i32 30, i32 31, i32 14, i32 15>
>> @@ -91,18 +79,12 @@ entry:
>> define dso_local <16 x i8> @testmrglb2(<16 x i8> %a, <16 x i8> %b)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: testmrglb2:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: addis r3, r2, .LCPI5_0@toc@ha
>> -; CHECK-P8-NEXT: addi r3, r3, .LCPI5_0@toc@l
>> -; CHECK-P8-NEXT: lvx v4, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT: vmrglb v2, v2, v3
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: testmrglb2:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: addis r3, r2, .LCPI5_0@toc@ha
>> -; CHECK-P9-NEXT: addi r3, r3, .LCPI5_0@toc@l
>> -; CHECK-P9-NEXT: lxvx v4, 0, r3
>> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT: vmrglb v2, v2, v3
>> ; CHECK-P9-NEXT: blr
>> entry:
>> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
>> 16, i32 0, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3, i32 20, i32 4, i32
>> 21, i32 5, i32 22, i32 6, i32 23, i32 7>
>> @@ -125,18 +107,12 @@ entry:
>> define dso_local <16 x i8> @testmrglh2(<16 x i8> %a, <16 x i8> %b)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: testmrglh2:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: addis r3, r2, .LCPI7_0@toc@ha
>> -; CHECK-P8-NEXT: addi r3, r3, .LCPI7_0@toc@l
>> -; CHECK-P8-NEXT: lvx v4, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT: vmrglh v2, v2, v3
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: testmrglh2:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: addis r3, r2, .LCPI7_0@toc@ha
>> -; CHECK-P9-NEXT: addi r3, r3, .LCPI7_0@toc@l
>> -; CHECK-P9-NEXT: lxvx v4, 0, r3
>> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> ; CHECK-P9-NEXT: blr
>> entry:
>> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
>> 16, i32 17, i32 0, i32 1, i32 18, i32 19, i32 2, i32 3, i32 20, i32 21, i32
>> 4, i32 5, i32 22, i32 23, i32 6, i32 7>
>> @@ -159,18 +135,12 @@ entry:
>> define dso_local <16 x i8> @testmrghw2(<16 x i8> %a, <16 x i8> %b)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: testmrghw2:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: addis r3, r2, .LCPI9_0@toc@ha
>> -; CHECK-P8-NEXT: addi r3, r3, .LCPI9_0@toc@l
>> -; CHECK-P8-NEXT: lvx v4, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: testmrghw2:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: addis r3, r2, .LCPI9_0@toc@ha
>> -; CHECK-P9-NEXT: addi r3, r3, .LCPI9_0@toc@l
>> -; CHECK-P9-NEXT: lxvx v4, 0, r3
>> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P9-NEXT: blr
>> entry:
>> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
>> 24, i32 25, i32 26, i32 27, i32 8, i32 9, i32 10, i32 11, i32 28, i32 29,
>> i32 30, i32 31, i32 12, i32 13, i32 14, i32 15>
>> @@ -193,18 +163,12 @@ entry:
>> define dso_local <16 x i8> @testmrglw2(<16 x i8> %a, <16 x i8> %b)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: testmrglw2:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: addis r3, r2, .LCPI11_0@toc@ha
>> -; CHECK-P8-NEXT: addi r3, r3, .LCPI11_0@toc@l
>> -; CHECK-P8-NEXT: lvx v4, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT: vmrglw v2, v2, v3
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: testmrglw2:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: addis r3, r2, .LCPI11_0@toc@ha
>> -; CHECK-P9-NEXT: addi r3, r3, .LCPI11_0@toc@l
>> -; CHECK-P9-NEXT: lxvx v4, 0, r3
>> -; CHECK-P9-NEXT: vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT: vmrglw v2, v2, v3
>> ; CHECK-P9-NEXT: blr
>> entry:
>> %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32
>> 16, i32 17, i32 18, i32 19, i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32
>> 22, i32 23, i32 4, i32 5, i32 6, i32 7>
>> @@ -215,24 +179,16 @@ define dso_local <8 x i16> @testmrglb3(<8 x i8>*
>> nocapture readonly %a) local_un
>> ; CHECK-P8-LABEL: testmrglb3:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: ld r3, 0(r3)
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI12_0@toc@ha
>> -; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: addi r3, r4, .LCPI12_0@toc@l
>> -; CHECK-P8-NEXT: lvx v3, 0, r3
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: vperm v2, v2, v4, v3
>> +; CHECK-P8-NEXT: xxlxor v2, v2, v2
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: testmrglb3:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: lfd f0, 0(r3)
>> -; CHECK-P9-NEXT: addis r3, r2, .LCPI12_0@toc@ha
>> -; CHECK-P9-NEXT: addi r3, r3, .LCPI12_0@toc@l
>> -; CHECK-P9-NEXT: lxvx v3, 0, r3
>> -; CHECK-P9-NEXT: xxswapd v2, f0
>> -; CHECK-P9-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P9-NEXT: vperm v2, v2, v4, v3
>> +; CHECK-P9-NEXT: lxsd v2, 0(r3)
>> +; CHECK-P9-NEXT: xxlxor v3, v3, v3
>> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
>> ; CHECK-P9-NEXT: blr
>> entry:
>> %0 = load <8 x i8>, <8 x i8>* %a, align 8
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
>> b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
>> index a23db59635a4..3a43b3584caf 100644
>> --- a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
>> +++ b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
>> @@ -331,12 +331,12 @@ define <2 x float> @fptrunc_v2f32_v2f64(<2 x
>> double> %vf1) {
>> ; P9: # %bb.0:
>> ; P9-NEXT: xsrsp f0, v2
>> ; P9-NEXT: xscvdpspn vs0, f0
>> -; P9-NEXT: xxsldwi v3, vs0, vs0, 1
>> +; P9-NEXT: xxsldwi v3, vs0, vs0, 3
>> ; P9-NEXT: xxswapd vs0, v2
>> ; P9-NEXT: xsrsp f0, f0
>> ; P9-NEXT: xscvdpspn vs0, f0
>> -; P9-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; P9-NEXT: vmrglw v2, v3, v2
>> +; P9-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; P9-NEXT: vmrghw v2, v3, v2
>> ; P9-NEXT: blr
>> %res = call <2 x float>
>> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(
>> <2 x double> %vf1,
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/load-and-splat.ll
>> b/llvm/test/CodeGen/PowerPC/load-and-splat.ll
>> index f411712ba3fa..26da1fdaefef 100644
>> --- a/llvm/test/CodeGen/PowerPC/load-and-splat.ll
>> +++ b/llvm/test/CodeGen/PowerPC/load-and-splat.ll
>> @@ -40,8 +40,7 @@ define dso_local void @test2(<4 x float>* nocapture %c,
>> float* nocapture readonl
>> ; P8: # %bb.0: # %entry
>> ; P8-NEXT: addi r4, r4, 12
>> ; P8-NEXT: lfiwzx f0, 0, r4
>> -; P8-NEXT: xxswapd vs0, f0
>> -; P8-NEXT: xxspltw v2, vs0, 3
>> +; P8-NEXT: xxspltw v2, vs0, 1
>> ; P8-NEXT: stvx v2, 0, r3
>> ; P8-NEXT: blr
>> entry:
>> @@ -65,8 +64,7 @@ define dso_local void @test3(<4 x i32>* nocapture %c,
>> i32* nocapture readonly %a
>> ; P8: # %bb.0: # %entry
>> ; P8-NEXT: addi r4, r4, 12
>> ; P8-NEXT: lfiwzx f0, 0, r4
>> -; P8-NEXT: xxswapd vs0, f0
>> -; P8-NEXT: xxspltw v2, vs0, 3
>> +; P8-NEXT: xxspltw v2, vs0, 1
>> ; P8-NEXT: stvx v2, 0, r3
>> ; P8-NEXT: blr
>> entry:
>> @@ -110,8 +108,7 @@ define <16 x i8> @unadjusted_lxvwsx(i32* %s, i32* %t)
>> {
>> ; P8-LABEL: unadjusted_lxvwsx:
>> ; P8: # %bb.0: # %entry
>> ; P8-NEXT: lfiwzx f0, 0, r3
>> -; P8-NEXT: xxswapd vs0, f0
>> -; P8-NEXT: xxspltw v2, vs0, 3
>> +; P8-NEXT: xxspltw v2, vs0, 1
>> ; P8-NEXT: blr
>> entry:
>> %0 = bitcast i32* %s to <4 x i8>*
>> @@ -131,8 +128,7 @@ define <16 x i8> @adjusted_lxvwsx(i64* %s, i64* %t) {
>> ; P8: # %bb.0: # %entry
>> ; P8-NEXT: ld r3, 0(r3)
>> ; P8-NEXT: mtfprd f0, r3
>> -; P8-NEXT: xxswapd v2, vs0
>> -; P8-NEXT: xxspltw v2, v2, 2
>> +; P8-NEXT: xxspltw v2, vs0, 0
>> ; P8-NEXT: blr
>> entry:
>> %0 = bitcast i64* %s to <8 x i8>*
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
>> b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
>> index 409978549c36..a03ab5f9519e 100644
>> --- a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
>> +++ b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
>> @@ -9,8 +9,7 @@ define <16 x i8> @test(i32* %s, i32* %t) {
>> ; CHECK-LE-LABEL: test:
>> ; CHECK-LE: # %bb.0: # %entry
>> ; CHECK-LE-NEXT: lfiwzx f0, 0, r3
>> -; CHECK-LE-NEXT: xxswapd vs0, f0
>> -; CHECK-LE-NEXT: xxspltw v2, vs0, 3
>> +; CHECK-LE-NEXT: xxspltw v2, vs0, 1
>> ; CHECK-LE-NEXT: blr
>>
>> ; CHECK-LABEL: test:
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
>> b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
>> index e1f0e827b9f6..dffa0fb98fc0 100644
>> --- a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
>> +++ b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
>> @@ -21,8 +21,8 @@ entry:
>> ; CHECK: sldi r3, r3, 56
>> ; CHECK: mtvsrd v2, r3
>> ; CHECK-LE-LABEL: buildc
>> -; CHECK-LE: mtfprd f0, r3
>> -; CHECK-LE: xxswapd v2, vs0
>> +; CHECK-LE: mtvsrd v2, r3
>> +; CHECK-LE: vspltb v2, v2, 7
>> }
>>
>> ; Function Attrs: norecurse nounwind readnone
>> @@ -35,8 +35,8 @@ entry:
>> ; CHECK: sldi r3, r3, 48
>> ; CHECK: mtvsrd v2, r3
>> ; CHECK-LE-LABEL: builds
>> -; CHECK-LE: mtfprd f0, r3
>> -; CHECK-LE: xxswapd v2, vs0
>> +; CHECK-LE: mtvsrd v2, r3
>> +; CHECK-LE: vsplth v2, v2, 3
>> }
>>
>> ; Function Attrs: norecurse nounwind readnone
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/pr25080.ll
>> b/llvm/test/CodeGen/PowerPC/pr25080.ll
>> index 7a2fb76fd453..f87cb5b940ca 100644
>> --- a/llvm/test/CodeGen/PowerPC/pr25080.ll
>> +++ b/llvm/test/CodeGen/PowerPC/pr25080.ll
>> @@ -17,41 +17,33 @@ define <8 x i16> @pr25080(<8 x i32> %a) {
>> ; LE-NEXT: mfvsrwz 3, 34
>> ; LE-NEXT: xxsldwi 1, 34, 34, 1
>> ; LE-NEXT: mfvsrwz 4, 35
>> -; LE-NEXT: xxsldwi 4, 34, 34, 3
>> -; LE-NEXT: mtfprd 2, 3
>> +; LE-NEXT: xxsldwi 2, 34, 34, 3
>> +; LE-NEXT: mtvsrd 36, 3
>> ; LE-NEXT: mffprwz 3, 0
>> ; LE-NEXT: xxswapd 0, 35
>> -; LE-NEXT: mtfprd 3, 4
>> -; LE-NEXT: xxsldwi 5, 35, 35, 1
>> +; LE-NEXT: mtvsrd 37, 4
>> ; LE-NEXT: mffprwz 4, 1
>> -; LE-NEXT: xxsldwi 7, 35, 35, 3
>> -; LE-NEXT: mtfprd 1, 3
>> -; LE-NEXT: xxswapd 33, 3
>> -; LE-NEXT: mffprwz 3, 4
>> -; LE-NEXT: mtfprd 4, 4
>> -; LE-NEXT: xxswapd 34, 1
>> +; LE-NEXT: xxsldwi 1, 35, 35, 1
>> +; LE-NEXT: mtvsrd 34, 3
>> +; LE-NEXT: mffprwz 3, 2
>> +; LE-NEXT: mtvsrd 32, 4
>> ; LE-NEXT: mffprwz 4, 0
>> -; LE-NEXT: mtfprd 0, 3
>> -; LE-NEXT: xxswapd 35, 4
>> -; LE-NEXT: mffprwz 3, 5
>> -; LE-NEXT: mtfprd 6, 4
>> -; LE-NEXT: xxswapd 36, 0
>> -; LE-NEXT: mtfprd 1, 3
>> -; LE-NEXT: mffprwz 3, 7
>> -; LE-NEXT: xxswapd 37, 6
>> -; LE-NEXT: vmrglh 2, 3, 2
>> -; LE-NEXT: xxswapd 35, 2
>> -; LE-NEXT: mtfprd 2, 3
>> -; LE-NEXT: xxswapd 32, 1
>> +; LE-NEXT: xxsldwi 0, 35, 35, 3
>> +; LE-NEXT: mtvsrd 33, 3
>> +; LE-NEXT: mffprwz 3, 1
>> +; LE-NEXT: mtvsrd 38, 4
>> +; LE-NEXT: mtvsrd 35, 3
>> +; LE-NEXT: mffprwz 3, 0
>> +; LE-NEXT: vmrghh 2, 0, 2
>> +; LE-NEXT: mtvsrd 32, 3
>> ; LE-NEXT: addis 3, 2, .LCPI0_1@toc@ha
>> +; LE-NEXT: vmrghh 4, 1, 4
>> ; LE-NEXT: addi 3, 3, .LCPI0_1@toc@l
>> -; LE-NEXT: xxswapd 38, 2
>> -; LE-NEXT: vmrglh 3, 4, 3
>> -; LE-NEXT: vmrglh 4, 0, 5
>> -; LE-NEXT: vmrglh 5, 6, 1
>> -; LE-NEXT: vmrglw 2, 3, 2
>> -; LE-NEXT: vmrglw 3, 5, 4
>> +; LE-NEXT: vmrghh 3, 3, 6
>> +; LE-NEXT: vmrghh 5, 0, 5
>> +; LE-NEXT: vmrglw 2, 4, 2
>> ; LE-NEXT: vspltish 4, 15
>> +; LE-NEXT: vmrglw 3, 5, 3
>> ; LE-NEXT: xxmrgld 34, 35, 34
>> ; LE-NEXT: lvx 3, 0, 3
>> ; LE-NEXT: xxlor 34, 34, 35
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
>> b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
>> index 4c10c3813fb5..d3bfb910fc9f 100644
>> --- a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
>> +++ b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
>> @@ -58,12 +58,11 @@ L.LB38_2452:
>>
>> ; CHECK-LABEL: @aercalc_
>> ; CHECK: lfs
>> -; CHECK: xxspltd
>> +; CHECK: xxswapd
>> ; CHECK: stxvd2x
>> ; CHECK-NOT: xxswapd
>>
>> ; CHECK-P9-LABEL: @aercalc_
>> ; CHECK-P9: lfs
>> -; CHECK-P9: xxspltd
>> ; CHECK-P9: stxv
>> ; CHECK-P9-NOT: xxswapd
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/pr38087.ll
>> b/llvm/test/CodeGen/PowerPC/pr38087.ll
>> index e05a3d2b97aa..49b3d39bc18c 100644
>> --- a/llvm/test/CodeGen/PowerPC/pr38087.ll
>> +++ b/llvm/test/CodeGen/PowerPC/pr38087.ll
>> @@ -11,9 +11,8 @@ declare { i32, i1 } @llvm.usub.with.overflow.i32(i32,
>> i32) #0
>> define void @draw_llvm_vs_variant0(<4 x float> %x) {
>> ; CHECK-LABEL: draw_llvm_vs_variant0:
>> ; CHECK: # %bb.0: # %entry
>> -; CHECK-NEXT: lfd f0, 0(r3)
>> -; CHECK-NEXT: xxswapd v3, f0
>> -; CHECK-NEXT: vmrglh v3, v3, v3
>> +; CHECK-NEXT: lxsd v3, 0(r3)
>> +; CHECK-NEXT: vmrghh v3, v3, v3
>> ; CHECK-NEXT: vextsh2w v3, v3
>> ; CHECK-NEXT: xvcvsxwsp vs0, v3
>> ; CHECK-NEXT: xxspltw vs0, vs0, 2
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
>> b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
>> index 4c9137d86124..6584cb74bdb5 100644
>> --- a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
>> +++ b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
>> @@ -11,34 +11,31 @@
>> define signext i32 @test_pre_inc_disable_1(i8* nocapture readonly %pix1,
>> i32 signext %i_stride_pix1, i8* nocapture readonly %pix2) {
>> ; CHECK-LABEL: test_pre_inc_disable_1:
>> ; CHECK: # %bb.0: # %entry
>> -; CHECK-NEXT: lfd f0, 0(r5)
>> +; CHECK-NEXT: lxsd v5, 0(r5)
>> ; CHECK-NEXT: addis r5, r2, .LCPI0_0@toc@ha
>> ; CHECK-NEXT: addi r5, r5, .LCPI0_0@toc@l
>> ; CHECK-NEXT: lxvx v2, 0, r5
>> ; CHECK-NEXT: addis r5, r2, .LCPI0_1@toc@ha
>> ; CHECK-NEXT: addi r5, r5, .LCPI0_1@toc@l
>> ; CHECK-NEXT: lxvx v4, 0, r5
>> -; CHECK-NEXT: xxswapd v5, f0
>> -; CHECK-NEXT: xxlxor v3, v3, v3
>> ; CHECK-NEXT: li r5, 4
>> +; CHECK-NEXT: xxlxor v3, v3, v3
>> ; CHECK-NEXT: vperm v0, v3, v5, v2
>> ; CHECK-NEXT: mtctr r5
>> ; CHECK-NEXT: li r5, 0
>> -; CHECK-NEXT: vperm v1, v5, v3, v4
>> +; CHECK-NEXT: vperm v1, v3, v5, v4
>> ; CHECK-NEXT: li r6, 0
>> ; CHECK-NEXT: xvnegsp v5, v0
>> ; CHECK-NEXT: xvnegsp v0, v1
>> ; CHECK-NEXT: .p2align 4
>> ; CHECK-NEXT: .LBB0_1: # %for.cond1.preheader
>> ; CHECK-NEXT: #
>> -; CHECK-NEXT: lfd f0, 0(r3)
>> -; CHECK-NEXT: xxswapd v1, f0
>> -; CHECK-NEXT: lfdx f0, r3, r4
>> -; CHECK-NEXT: vperm v6, v1, v3, v4
>> +; CHECK-NEXT: lxsd v1, 0(r3)
>> +; CHECK-NEXT: vperm v6, v3, v1, v4
>> ; CHECK-NEXT: vperm v1, v3, v1, v2
>> ; CHECK-NEXT: xvnegsp v1, v1
>> -; CHECK-NEXT: add r7, r3, r4
>> ; CHECK-NEXT: xvnegsp v6, v6
>> +; CHECK-NEXT: add r7, r3, r4
>> ; CHECK-NEXT: vabsduw v1, v1, v5
>> ; CHECK-NEXT: vabsduw v6, v6, v0
>> ; CHECK-NEXT: vadduwm v1, v6, v1
>> @@ -46,15 +43,14 @@ define signext i32 @test_pre_inc_disable_1(i8*
>> nocapture readonly %pix1, i32 sig
>> ; CHECK-NEXT: vadduwm v1, v1, v6
>> ; CHECK-NEXT: xxspltw v6, v1, 2
>> ; CHECK-NEXT: vadduwm v1, v1, v6
>> -; CHECK-NEXT: xxswapd v6, f0
>> +; CHECK-NEXT: lxsdx v6, r3, r4
>> ; CHECK-NEXT: vextuwrx r3, r5, v1
>> -; CHECK-NEXT: vperm v7, v6, v3, v4
>> +; CHECK-NEXT: vperm v7, v3, v6, v4
>> ; CHECK-NEXT: vperm v6, v3, v6, v2
>> -; CHECK-NEXT: add r6, r3, r6
>> -; CHECK-NEXT: add r3, r7, r4
>> ; CHECK-NEXT: xvnegsp v6, v6
>> ; CHECK-NEXT: xvnegsp v1, v7
>> ; CHECK-NEXT: vabsduw v6, v6, v5
>> +; CHECK-NEXT: add r6, r3, r6
>> ; CHECK-NEXT: vabsduw v1, v1, v0
>> ; CHECK-NEXT: vadduwm v1, v1, v6
>> ; CHECK-NEXT: xxswapd v6, v1
>> @@ -62,6 +58,7 @@ define signext i32 @test_pre_inc_disable_1(i8*
>> nocapture readonly %pix1, i32 sig
>> ; CHECK-NEXT: xxspltw v6, v1, 2
>> ; CHECK-NEXT: vadduwm v1, v1, v6
>> ; CHECK-NEXT: vextuwrx r8, r5, v1
>> +; CHECK-NEXT: add r3, r7, r4
>> ; CHECK-NEXT: add r6, r8, r6
>> ; CHECK-NEXT: bdnz .LBB0_1
>> ; CHECK-NEXT: # %bb.2: # %for.cond.cleanup
>> @@ -181,29 +178,27 @@ for.cond.cleanup: ;
>> preds = %for.cond1.preheader
>> define signext i32 @test_pre_inc_disable_2(i8* nocapture readonly %pix1,
>> i8* nocapture readonly %pix2) {
>> ; CHECK-LABEL: test_pre_inc_disable_2:
>> ; CHECK: # %bb.0: # %entry
>> -; CHECK-NEXT: lfd f0, 0(r3)
>> +; CHECK-NEXT: lxsd v2, 0(r3)
>> ; CHECK-NEXT: addis r3, r2, .LCPI1_0@toc@ha
>> ; CHECK-NEXT: addi r3, r3, .LCPI1_0@toc@l
>> ; CHECK-NEXT: lxvx v4, 0, r3
>> ; CHECK-NEXT: addis r3, r2, .LCPI1_1@toc@ha
>> -; CHECK-NEXT: xxswapd v2, f0
>> -; CHECK-NEXT: lfd f0, 0(r4)
>> ; CHECK-NEXT: addi r3, r3, .LCPI1_1@toc@l
>> -; CHECK-NEXT: xxlxor v3, v3, v3
>> ; CHECK-NEXT: lxvx v0, 0, r3
>> -; CHECK-NEXT: xxswapd v1, f0
>> -; CHECK-NEXT: vperm v5, v2, v3, v4
>> +; CHECK-NEXT: lxsd v1, 0(r4)
>> +; CHECK-NEXT: xxlxor v3, v3, v3
>> +; CHECK-NEXT: vperm v5, v3, v2, v4
>> ; CHECK-NEXT: vperm v2, v3, v2, v0
>> ; CHECK-NEXT: vperm v0, v3, v1, v0
>> -; CHECK-NEXT: vperm v3, v1, v3, v4
>> +; CHECK-NEXT: vperm v3, v3, v1, v4
>> ; CHECK-NEXT: vabsduw v2, v2, v0
>> ; CHECK-NEXT: vabsduw v3, v5, v3
>> ; CHECK-NEXT: vadduwm v2, v3, v2
>> ; CHECK-NEXT: xxswapd v3, v2
>> -; CHECK-NEXT: li r3, 0
>> ; CHECK-NEXT: vadduwm v2, v2, v3
>> ; CHECK-NEXT: xxspltw v3, v2, 2
>> ; CHECK-NEXT: vadduwm v2, v2, v3
>> +; CHECK-NEXT: li r3, 0
>> ; CHECK-NEXT: vextuwrx r3, r3, v2
>> ; CHECK-NEXT: extsw r3, r3
>> ; CHECK-NEXT: blr
>> @@ -286,16 +281,14 @@ define void @test32(i8* nocapture readonly %pix2,
>> i32 signext %i_pix2) {
>> ; CHECK-LABEL: test32:
>> ; CHECK: # %bb.0: # %entry
>> ; CHECK-NEXT: add r5, r3, r4
>> -; CHECK-NEXT: lfiwzx f0, r3, r4
>> +; CHECK-NEXT: lxsiwzx v2, r3, r4
>> ; CHECK-NEXT: addis r3, r2, .LCPI2_0@toc@ha
>> ; CHECK-NEXT: addi r3, r3, .LCPI2_0@toc@l
>> ; CHECK-NEXT: lxvx v4, 0, r3
>> ; CHECK-NEXT: li r3, 4
>> -; CHECK-NEXT: xxswapd v2, f0
>> -; CHECK-NEXT: lfiwzx f0, r5, r3
>> +; CHECK-NEXT: lxsiwzx v5, r5, r3
>> ; CHECK-NEXT: xxlxor v3, v3, v3
>> ; CHECK-NEXT: vperm v2, v2, v3, v4
>> -; CHECK-NEXT: xxswapd v5, f0
>> ; CHECK-NEXT: vperm v3, v5, v3, v4
>> ; CHECK-NEXT: vspltisw v4, 8
>> ; CHECK-NEXT: vnegw v3, v3
>> @@ -361,16 +354,15 @@ define void @test16(i16* nocapture readonly %sums,
>> i32 signext %delta, i32 signe
>> ; CHECK-NEXT: lxsihzx v2, r6, r7
>> ; CHECK-NEXT: lxsihzx v4, r3, r4
>> ; CHECK-NEXT: li r6, 0
>> -; CHECK-NEXT: mtfprd f0, r6
>> +; CHECK-NEXT: mtvsrd v3, r6
>> ; CHECK-NEXT: vsplth v4, v4, 3
>> -; CHECK-NEXT: xxswapd v3, vs0
>> ; CHECK-NEXT: vsplth v2, v2, 3
>> ; CHECK-NEXT: addis r3, r2, .LCPI3_0@toc@ha
>> ; CHECK-NEXT: addi r3, r3, .LCPI3_0@toc@l
>> -; CHECK-NEXT: vmrglh v2, v3, v2
>> -; CHECK-NEXT: vmrglh v3, v3, v4
>> -; CHECK-NEXT: xxlxor v4, v4, v4
>> -; CHECK-NEXT: vmrglw v3, v3, v4
>> +; CHECK-NEXT: vmrghh v4, v3, v4
>> +; CHECK-NEXT: vmrghh v2, v3, v2
>> +; CHECK-NEXT: vsplth v3, v3, 3
>> +; CHECK-NEXT: vmrglw v3, v4, v3
>> ; CHECK-NEXT: lxvx v4, 0, r3
>> ; CHECK-NEXT: li r3, 0
>> ; CHECK-NEXT: vperm v2, v2, v3, v4
>> @@ -446,18 +438,17 @@ define void @test8(i8* nocapture readonly %sums,
>> i32 signext %delta, i32 signext
>> ; CHECK-NEXT: add r6, r3, r4
>> ; CHECK-NEXT: lxsibzx v2, r3, r4
>> ; CHECK-NEXT: li r3, 0
>> -; CHECK-NEXT: mtfprd f0, r3
>> +; CHECK-NEXT: mtvsrd v3, r3
>> ; CHECK-NEXT: li r3, 8
>> ; CHECK-NEXT: lxsibzx v5, r6, r3
>> -; CHECK-NEXT: xxswapd v3, vs0
>> -; CHECK-NEXT: vspltb v4, v3, 15
>> -; CHECK-NEXT: vspltb v2, v2, 7
>> -; CHECK-NEXT: vmrglb v2, v3, v2
>> ; CHECK-NEXT: addis r3, r2, .LCPI4_0@toc@ha
>> ; CHECK-NEXT: addi r3, r3, .LCPI4_0@toc@l
>> +; CHECK-NEXT: vspltb v2, v2, 7
>> +; CHECK-NEXT: vmrghb v2, v3, v2
>> +; CHECK-NEXT: vspltb v4, v3, 7
>> ; CHECK-NEXT: vspltb v5, v5, 7
>> ; CHECK-NEXT: vmrglh v2, v2, v4
>> -; CHECK-NEXT: vmrglb v3, v3, v5
>> +; CHECK-NEXT: vmrghb v3, v3, v5
>> ; CHECK-NEXT: vmrglw v2, v2, v4
>> ; CHECK-NEXT: vmrglh v3, v3, v4
>> ; CHECK-NEXT: vmrglw v3, v4, v3
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
>> b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
>> index 099611a7b5e3..50b864980d98 100644
>> --- a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
>> +++ b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
>> @@ -53,8 +53,7 @@ define <4 x float> @foof(float* nocapture readonly %a)
>> #0 {
>> ; CHECK-LABEL: foof:
>> ; CHECK: # %bb.0: # %entry
>> ; CHECK-NEXT: lfiwzx f0, 0, r3
>> -; CHECK-NEXT: xxswapd vs0, f0
>> -; CHECK-NEXT: xxspltw v2, vs0, 3
>> +; CHECK-NEXT: xxspltw v2, vs0, 1
>> ; CHECK-NEXT: blr
>> entry:
>> %0 = load float, float* %a, align 4
>> @@ -68,8 +67,7 @@ define <4 x float> @foofx(float* nocapture readonly %a,
>> i64 %idx) #0 {
>> ; CHECK: # %bb.0: # %entry
>> ; CHECK-NEXT: sldi r4, r4, 2
>> ; CHECK-NEXT: lfiwzx f0, r3, r4
>> -; CHECK-NEXT: xxswapd vs0, f0
>> -; CHECK-NEXT: xxspltw v2, vs0, 3
>> +; CHECK-NEXT: xxspltw v2, vs0, 1
>> ; CHECK-NEXT: blr
>> entry:
>> %p = getelementptr float, float* %a, i64 %idx
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
>> b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
>> index b43e2c8b97af..c12f7f9a9f05 100644
>> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
>> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
>> @@ -13,8 +13,7 @@ define <2 x i64> @s2v_test1(i64* nocapture readonly
>> %int64, <2 x i64> %vec) {
>> ; P9LE-LABEL: s2v_test1:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfd f0, 0(r3)
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test1:
>> @@ -33,8 +32,7 @@ define <2 x i64> @s2v_test2(i64* nocapture readonly
>> %int64, <2 x i64> %vec) {
>> ; P9LE-LABEL: s2v_test2:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfd f0, 8(r3)
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test2:
>> @@ -55,8 +53,7 @@ define <2 x i64> @s2v_test3(i64* nocapture readonly
>> %int64, <2 x i64> %vec, i32
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: sldi r4, r7, 3
>> ; P9LE-NEXT: lfdx f0, r3, r4
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test3
>> @@ -78,8 +75,7 @@ define <2 x i64> @s2v_test4(i64* nocapture readonly
>> %int64, <2 x i64> %vec) {
>> ; P9LE-LABEL: s2v_test4:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfd f0, 8(r3)
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test4:
>> @@ -99,8 +95,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i64*
>> nocapture readonly %ptr1) {
>> ; P9LE-LABEL: s2v_test5:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfd f0, 0(r5)
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test5:
>> @@ -119,8 +114,7 @@ define <2 x double> @s2v_test_f1(double* nocapture
>> readonly %f64, <2 x double> %
>> ; P9LE-LABEL: s2v_test_f1:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfd f0, 0(r3)
>> -; P9LE-NEXT: xxswapd vs0, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test_f1:
>> @@ -132,8 +126,7 @@ define <2 x double> @s2v_test_f1(double* nocapture
>> readonly %f64, <2 x double> %
>> ; P8LE-LABEL: s2v_test_f1:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfdx f0, 0, r3
>> -; P8LE-NEXT: xxspltd vs0, vs0, 0
>> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f1:
>> @@ -152,8 +145,7 @@ define <2 x double> @s2v_test_f2(double* nocapture
>> readonly %f64, <2 x double> %
>> ; P9LE-LABEL: s2v_test_f2:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfd f0, 8(r3)
>> -; P9LE-NEXT: xxswapd vs0, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test_f2:
>> @@ -165,8 +157,7 @@ define <2 x double> @s2v_test_f2(double* nocapture
>> readonly %f64, <2 x double> %
>> ; P8LE-LABEL: s2v_test_f2:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfd f0, 8(r3)
>> -; P8LE-NEXT: xxspltd vs0, vs0, 0
>> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f2:
>> @@ -187,8 +178,7 @@ define <2 x double> @s2v_test_f3(double* nocapture
>> readonly %f64, <2 x double> %
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: sldi r4, r7, 3
>> ; P9LE-NEXT: lfdx f0, r3, r4
>> -; P9LE-NEXT: xxswapd vs0, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test_f3:
>> @@ -202,8 +192,7 @@ define <2 x double> @s2v_test_f3(double* nocapture
>> readonly %f64, <2 x double> %
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: sldi r4, r7, 3
>> ; P8LE-NEXT: lfdx f0, r3, r4
>> -; P8LE-NEXT: xxspltd vs0, vs0, 0
>> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f3:
>> @@ -225,8 +214,7 @@ define <2 x double> @s2v_test_f4(double* nocapture
>> readonly %f64, <2 x double> %
>> ; P9LE-LABEL: s2v_test_f4:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfd f0, 8(r3)
>> -; P9LE-NEXT: xxswapd vs0, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test_f4:
>> @@ -238,8 +226,7 @@ define <2 x double> @s2v_test_f4(double* nocapture
>> readonly %f64, <2 x double> %
>> ; P8LE-LABEL: s2v_test_f4:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfd f0, 8(r3)
>> -; P8LE-NEXT: xxspltd vs0, vs0, 0
>> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f4:
>> @@ -259,8 +246,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec,
>> double* nocapture readonly %
>> ; P9LE-LABEL: s2v_test_f5:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfd f0, 0(r5)
>> -; P9LE-NEXT: xxswapd vs0, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test_f5:
>> @@ -272,8 +258,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec,
>> double* nocapture readonly %
>> ; P8LE-LABEL: s2v_test_f5:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfdx f0, 0, r5
>> -; P8LE-NEXT: xxspltd vs0, vs0, 0
>> -; P8LE-NEXT: xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f5:
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
>> b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
>> index 83691b52575d..f4572c359942 100644
>> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
>> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
>> @@ -12,8 +12,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly
>> %int32, <2 x i64> %vec) {
>> ; P9LE-LABEL: s2v_test1:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfiwax f0, 0, r3
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test1:
>> @@ -25,8 +24,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly
>> %int32, <2 x i64> %vec) {
>> ; P8LE-LABEL: s2v_test1:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfiwax f0, 0, r3
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test1:
>> @@ -47,8 +45,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly
>> %int32, <2 x i64> %vec) {
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: addi r3, r3, 4
>> ; P9LE-NEXT: lfiwax f0, 0, r3
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test2:
>> @@ -62,8 +59,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly
>> %int32, <2 x i64> %vec) {
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: addi r3, r3, 4
>> ; P8LE-NEXT: lfiwax f0, 0, r3
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test2:
>> @@ -86,8 +82,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly
>> %int32, <2 x i64> %vec, i32
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: sldi r4, r7, 2
>> ; P9LE-NEXT: lfiwax f0, r3, r4
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test3:
>> @@ -101,8 +96,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly
>> %int32, <2 x i64> %vec, i32
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: sldi r4, r7, 2
>> ; P8LE-NEXT: lfiwax f0, r3, r4
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test3:
>> @@ -126,8 +120,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly
>> %int32, <2 x i64> %vec) {
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: addi r3, r3, 4
>> ; P9LE-NEXT: lfiwax f0, 0, r3
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test4:
>> @@ -141,8 +134,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly
>> %int32, <2 x i64> %vec) {
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: addi r3, r3, 4
>> ; P8LE-NEXT: lfiwax f0, 0, r3
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test4:
>> @@ -164,8 +156,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i32*
>> nocapture readonly %ptr1) {
>> ; P9LE-LABEL: s2v_test5:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfiwax f0, 0, r5
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT: xxmrghd v2, v2, vs0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test5:
>> @@ -177,8 +168,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i32*
>> nocapture readonly %ptr1) {
>> ; P8LE-LABEL: s2v_test5:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfiwax f0, 0, r5
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT: xxmrghd v2, v2, vs0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test5:
>> @@ -198,8 +188,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly
>> %ptr) {
>> ; P9LE-LABEL: s2v_test6:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfiwax f0, 0, r3
>> -; P9LE-NEXT: xxswapd v2, f0
>> -; P9LE-NEXT: xxspltd v2, v2, 1
>> +; P9LE-NEXT: xxspltd v2, vs0, 0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test6:
>> @@ -211,8 +200,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly
>> %ptr) {
>> ; P8LE-LABEL: s2v_test6:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfiwax f0, 0, r3
>> -; P8LE-NEXT: xxswapd v2, f0
>> -; P8LE-NEXT: xxspltd v2, v2, 1
>> +; P8LE-NEXT: xxspltd v2, vs0, 0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test6:
>> @@ -233,8 +221,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly
>> %ptr) {
>> ; P9LE-LABEL: s2v_test7:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: lfiwax f0, 0, r3
>> -; P9LE-NEXT: xxswapd v2, f0
>> -; P9LE-NEXT: xxspltd v2, v2, 1
>> +; P9LE-NEXT: xxspltd v2, vs0, 0
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test7:
>> @@ -246,8 +233,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly
>> %ptr) {
>> ; P8LE-LABEL: s2v_test7:
>> ; P8LE: # %bb.0: # %entry
>> ; P8LE-NEXT: lfiwax f0, 0, r3
>> -; P8LE-NEXT: xxswapd v2, f0
>> -; P8LE-NEXT: xxspltd v2, v2, 1
>> +; P8LE-NEXT: xxspltd v2, vs0, 0
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test7:
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
>> b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
>> index 2261d75c6619..3dc34533420c 100644
>> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
>> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
>> @@ -11,12 +11,11 @@
>> define <4 x i32> @s2v_test1(i32* nocapture readonly %int32, <4 x i32>
>> %vec) {
>> ; P8LE-LABEL: s2v_test1:
>> ; P8LE: # %bb.0: # %entry
>> -; P8LE-NEXT: lfiwzx f0, 0, r3
>> ; P8LE-NEXT: addis r4, r2, .LCPI0_0@toc@ha
>> -; P8LE-NEXT: addi r3, r4, .LCPI0_0@toc@l
>> -; P8LE-NEXT: lvx v3, 0, r3
>> -; P8LE-NEXT: xxswapd v4, f0
>> -; P8LE-NEXT: vperm v2, v4, v2, v3
>> +; P8LE-NEXT: lxsiwzx v4, 0, r3
>> +; P8LE-NEXT: addi r4, r4, .LCPI0_0@toc@l
>> +; P8LE-NEXT: lvx v3, 0, r4
>> +; P8LE-NEXT: vperm v2, v2, v4, v3
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test1:
>> @@ -36,13 +35,12 @@ entry:
>> define <4 x i32> @s2v_test2(i32* nocapture readonly %int32, <4 x i32>
>> %vec) {
>> ; P8LE-LABEL: s2v_test2:
>> ; P8LE: # %bb.0: # %entry
>> -; P8LE-NEXT: addi r3, r3, 4
>> ; P8LE-NEXT: addis r4, r2, .LCPI1_0@toc@ha
>> -; P8LE-NEXT: lfiwzx f0, 0, r3
>> -; P8LE-NEXT: addi r3, r4, .LCPI1_0@toc@l
>> -; P8LE-NEXT: lvx v3, 0, r3
>> -; P8LE-NEXT: xxswapd v4, f0
>> -; P8LE-NEXT: vperm v2, v4, v2, v3
>> +; P8LE-NEXT: addi r3, r3, 4
>> +; P8LE-NEXT: addi r4, r4, .LCPI1_0@toc@l
>> +; P8LE-NEXT: lxsiwzx v4, 0, r3
>> +; P8LE-NEXT: lvx v3, 0, r4
>> +; P8LE-NEXT: vperm v2, v2, v4, v3
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test2:
>> @@ -64,13 +62,12 @@ entry:
>> define <4 x i32> @s2v_test3(i32* nocapture readonly %int32, <4 x i32>
>> %vec, i32 signext %Idx) {
>> ; P8LE-LABEL: s2v_test3:
>> ; P8LE: # %bb.0: # %entry
>> -; P8LE-NEXT: sldi r5, r7, 2
>> ; P8LE-NEXT: addis r4, r2, .LCPI2_0@toc@ha
>> -; P8LE-NEXT: lfiwzx f0, r3, r5
>> -; P8LE-NEXT: addi r3, r4, .LCPI2_0@toc@l
>> -; P8LE-NEXT: lvx v4, 0, r3
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: vperm v2, v3, v2, v4
>> +; P8LE-NEXT: sldi r5, r7, 2
>> +; P8LE-NEXT: addi r4, r4, .LCPI2_0@toc@l
>> +; P8LE-NEXT: lxsiwzx v3, r3, r5
>> +; P8LE-NEXT: lvx v4, 0, r4
>> +; P8LE-NEXT: vperm v2, v2, v3, v4
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test3:
>> @@ -93,13 +90,12 @@ entry:
>> define <4 x i32> @s2v_test4(i32* nocapture readonly %int32, <4 x i32>
>> %vec) {
>> ; P8LE-LABEL: s2v_test4:
>> ; P8LE: # %bb.0: # %entry
>> -; P8LE-NEXT: addi r3, r3, 4
>> ; P8LE-NEXT: addis r4, r2, .LCPI3_0@toc@ha
>> -; P8LE-NEXT: lfiwzx f0, 0, r3
>> -; P8LE-NEXT: addi r3, r4, .LCPI3_0@toc@l
>> -; P8LE-NEXT: lvx v3, 0, r3
>> -; P8LE-NEXT: xxswapd v4, f0
>> -; P8LE-NEXT: vperm v2, v4, v2, v3
>> +; P8LE-NEXT: addi r3, r3, 4
>> +; P8LE-NEXT: addi r4, r4, .LCPI3_0@toc@l
>> +; P8LE-NEXT: lxsiwzx v4, 0, r3
>> +; P8LE-NEXT: lvx v3, 0, r4
>> +; P8LE-NEXT: vperm v2, v2, v4, v3
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test4:
>> @@ -121,12 +117,11 @@ entry:
>> define <4 x i32> @s2v_test5(<4 x i32> %vec, i32* nocapture readonly
>> %ptr1) {
>> ; P8LE-LABEL: s2v_test5:
>> ; P8LE: # %bb.0: # %entry
>> -; P8LE-NEXT: lfiwzx f0, 0, r5
>> ; P8LE-NEXT: addis r3, r2, .LCPI4_0@toc@ha
>> +; P8LE-NEXT: lxsiwzx v4, 0, r5
>> ; P8LE-NEXT: addi r3, r3, .LCPI4_0@toc@l
>> ; P8LE-NEXT: lvx v3, 0, r3
>> -; P8LE-NEXT: xxswapd v4, f0
>> -; P8LE-NEXT: vperm v2, v4, v2, v3
>> +; P8LE-NEXT: vperm v2, v2, v4, v3
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test5:
>> @@ -146,12 +141,11 @@ entry:
>> define <4 x float> @s2v_test_f1(float* nocapture readonly %f64, <4 x
>> float> %vec) {
>> ; P8LE-LABEL: s2v_test_f1:
>> ; P8LE: # %bb.0: # %entry
>> -; P8LE-NEXT: lfiwzx f0, 0, r3
>> ; P8LE-NEXT: addis r4, r2, .LCPI5_0@toc@ha
>> -; P8LE-NEXT: addi r3, r4, .LCPI5_0@toc@l
>> -; P8LE-NEXT: lvx v3, 0, r3
>> -; P8LE-NEXT: xxswapd v4, f0
>> -; P8LE-NEXT: vperm v2, v4, v2, v3
>> +; P8LE-NEXT: lxsiwzx v4, 0, r3
>> +; P8LE-NEXT: addi r4, r4, .LCPI5_0@toc@l
>> +; P8LE-NEXT: lvx v3, 0, r4
>> +; P8LE-NEXT: vperm v2, v2, v4, v3
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f1:
>> @@ -172,10 +166,9 @@ define <2 x float> @s2v_test_f2(float* nocapture
>> readonly %f64, <2 x float> %vec
>> ; P9LE-LABEL: s2v_test_f2:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: addi r3, r3, 4
>> -; P9LE-DAG: xxspltw v2, v2, 2
>> -; P9LE-DAG: lfiwzx f0, 0, r3
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: vmrglw v2, v2, v3
>> +; P9LE-NEXT: lxsiwzx v3, 0, r3
>> +; P9LE-NEXT: vmrglw v2, v2, v2
>> +; P9LE-NEXT: vmrghw v2, v2, v3
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test_f2:
>> @@ -189,11 +182,10 @@ define <2 x float> @s2v_test_f2(float* nocapture
>> readonly %f64, <2 x float> %vec
>>
>> ; P8LE-LABEL: s2v_test_f2:
>> ; P8LE: # %bb.0: # %entry
>> +; P8LE-NEXT: vmrglw v2, v2, v2
>> ; P8LE-NEXT: addi r3, r3, 4
>> -; P8LE-NEXT: xxspltw v2, v2, 2
>> -; P8LE-NEXT: lfiwzx f0, 0, r3
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: vmrglw v2, v2, v3
>> +; P8LE-NEXT: lxsiwzx v3, 0, r3
>> +; P8LE-NEXT: vmrghw v2, v2, v3
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f2:
>> @@ -216,10 +208,9 @@ define <2 x float> @s2v_test_f3(float* nocapture
>> readonly %f64, <2 x float> %vec
>> ; P9LE-LABEL: s2v_test_f3:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: sldi r4, r7, 2
>> -; P9LE-NEXT: lfiwzx f0, r3, r4
>> -; P9LE-DAG: xxspltw v2, v2, 2
>> -; P9LE-DAG: xxswapd v3, f0
>> -; P9LE-NEXT: vmrglw v2, v2, v3
>> +; P9LE-NEXT: lxsiwzx v3, r3, r4
>> +; P9LE-NEXT: vmrglw v2, v2, v2
>> +; P9LE-NEXT: vmrghw v2, v2, v3
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test_f3:
>> @@ -233,11 +224,10 @@ define <2 x float> @s2v_test_f3(float* nocapture
>> readonly %f64, <2 x float> %vec
>>
>> ; P8LE-LABEL: s2v_test_f3:
>> ; P8LE: # %bb.0: # %entry
>> +; P8LE-NEXT: vmrglw v2, v2, v2
>> ; P8LE-NEXT: sldi r4, r7, 2
>> -; P8LE-NEXT: xxspltw v2, v2, 2
>> -; P8LE-NEXT: lfiwzx f0, r3, r4
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: vmrglw v2, v2, v3
>> +; P8LE-NEXT: lxsiwzx v3, r3, r4
>> +; P8LE-NEXT: vmrghw v2, v2, v3
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f3:
>> @@ -261,10 +251,9 @@ define <2 x float> @s2v_test_f4(float* nocapture
>> readonly %f64, <2 x float> %vec
>> ; P9LE-LABEL: s2v_test_f4:
>> ; P9LE: # %bb.0: # %entry
>> ; P9LE-NEXT: addi r3, r3, 4
>> -; P9LE-NEXT: lfiwzx f0, 0, r3
>> -; P9LE-DAG: xxspltw v2, v2, 2
>> -; P9LE-DAG: xxswapd v3, f0
>> -; P9LE-NEXT: vmrglw v2, v2, v3
>> +; P9LE-NEXT: lxsiwzx v3, 0, r3
>> +; P9LE-NEXT: vmrglw v2, v2, v2
>> +; P9LE-NEXT: vmrghw v2, v2, v3
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test_f4:
>> @@ -278,11 +267,10 @@ define <2 x float> @s2v_test_f4(float* nocapture
>> readonly %f64, <2 x float> %vec
>>
>> ; P8LE-LABEL: s2v_test_f4:
>> ; P8LE: # %bb.0: # %entry
>> +; P8LE-NEXT: vmrglw v2, v2, v2
>> ; P8LE-NEXT: addi r3, r3, 4
>> -; P8LE-NEXT: xxspltw v2, v2, 2
>> -; P8LE-NEXT: lfiwzx f0, 0, r3
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: vmrglw v2, v2, v3
>> +; P8LE-NEXT: lxsiwzx v3, 0, r3
>> +; P8LE-NEXT: vmrghw v2, v2, v3
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f4:
>> @@ -304,10 +292,9 @@ entry:
>> define <2 x float> @s2v_test_f5(<2 x float> %vec, float* nocapture
>> readonly %ptr1) {
>> ; P9LE-LABEL: s2v_test_f5:
>> ; P9LE: # %bb.0: # %entry
>> -; P9LE-NEXT: lfiwzx f0, 0, r5
>> -; P9LE-NEXT: xxspltw v2, v2, 2
>> -; P9LE-NEXT: xxswapd v3, f0
>> -; P9LE-NEXT: vmrglw v2, v2, v3
>> +; P9LE-NEXT: lxsiwzx v3, 0, r5
>> +; P9LE-NEXT: vmrglw v2, v2, v2
>> +; P9LE-NEXT: vmrghw v2, v2, v3
>> ; P9LE-NEXT: blr
>>
>> ; P9BE-LABEL: s2v_test_f5:
>> @@ -320,10 +307,9 @@ define <2 x float> @s2v_test_f5(<2 x float> %vec,
>> float* nocapture readonly %ptr
>>
>> ; P8LE-LABEL: s2v_test_f5:
>> ; P8LE: # %bb.0: # %entry
>> -; P8LE-NEXT: lfiwzx f0, 0, r5
>> -; P8LE-NEXT: xxspltw v2, v2, 2
>> -; P8LE-NEXT: xxswapd v3, f0
>> -; P8LE-NEXT: vmrglw v2, v2, v3
>> +; P8LE-NEXT: vmrglw v2, v2, v2
>> +; P8LE-NEXT: lxsiwzx v3, 0, r5
>> +; P8LE-NEXT: vmrghw v2, v2, v3
>> ; P8LE-NEXT: blr
>>
>> ; P8BE-LABEL: s2v_test_f5:
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
>> b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
>> index 935630745f47..097ba07a5b1e 100644
>> --- a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
>> +++ b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
>> @@ -13,60 +13,56 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 0
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, -21386
>> -; P9LE-NEXT: ori r5, r5, 37253
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r5, r4, r5
>> -; P9LE-NEXT: add r4, r5, r4
>> +; P9LE-NEXT: lis r4, -21386
>> +; P9LE-NEXT: ori r4, r4, 37253
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> +; P9LE-NEXT: add r4, r4, r3
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 6
>> ; P9LE-NEXT: add r4, r4, r5
>> -; P9LE-NEXT: lis r5, 31710
>> ; P9LE-NEXT: mulli r4, r4, 95
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r4, 31710
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: ori r5, r5, 63421
>> -; P9LE-NEXT: mulhw r5, r4, r5
>> -; P9LE-NEXT: sub r4, r5, r4
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: ori r4, r4, 63421
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> +; P9LE-NEXT: sub r4, r4, r3
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 6
>> ; P9LE-NEXT: add r4, r4, r5
>> -; P9LE-NEXT: lis r5, 21399
>> ; P9LE-NEXT: mulli r4, r4, -124
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r4, 21399
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: ori r5, r5, 33437
>> -; P9LE-NEXT: mulhw r4, r4, r5
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: ori r4, r4, 33437
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 5
>> ; P9LE-NEXT: add r4, r4, r5
>> -; P9LE-NEXT: lis r5, -16728
>> ; P9LE-NEXT: mulli r4, r4, 98
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: ori r5, r5, 63249
>> -; P9LE-NEXT: mulhw r4, r4, r5
>> +; P9LE-NEXT: lis r4, -16728
>> +; P9LE-NEXT: ori r4, r4, 63249
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 8
>> ; P9LE-NEXT: add r4, r4, r5
>> ; P9LE-NEXT: mulli r4, r4, -1003
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: vmrglh v2, v2, v4
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: vmrghh v2, v2, v4
>> ; P9LE-NEXT: vmrglw v2, v2, v3
>> ; P9LE-NEXT: blr
>> ;
>> @@ -135,58 +131,54 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) {
>> ; P8LE: # %bb.0:
>> ; P8LE-NEXT: xxswapd vs0, v2
>> ; P8LE-NEXT: lis r3, 21399
>> -; P8LE-NEXT: lis r9, -21386
>> -; P8LE-NEXT: lis r11, 31710
>> ; P8LE-NEXT: lis r8, -16728
>> +; P8LE-NEXT: lis r9, -21386
>> +; P8LE-NEXT: lis r10, 31710
>> ; P8LE-NEXT: ori r3, r3, 33437
>> -; P8LE-NEXT: ori r9, r9, 37253
>> ; P8LE-NEXT: ori r8, r8, 63249
>> +; P8LE-NEXT: ori r9, r9, 37253
>> +; P8LE-NEXT: ori r10, r10, 63421
>> ; P8LE-NEXT: mffprd r4, f0
>> ; P8LE-NEXT: rldicl r5, r4, 32, 48
>> -; P8LE-NEXT: clrldi r7, r4, 48
>> ; P8LE-NEXT: rldicl r6, r4, 16, 48
>> +; P8LE-NEXT: clrldi r7, r4, 48
>> +; P8LE-NEXT: extsh r5, r5
>> +; P8LE-NEXT: extsh r6, r6
>> ; P8LE-NEXT: rldicl r4, r4, 48, 48
>> -; P8LE-NEXT: extsh r10, r5
>> -; P8LE-NEXT: extsh r0, r7
>> -; P8LE-NEXT: mulhw r3, r10, r3
>> -; P8LE-NEXT: ori r10, r11, 63421
>> -; P8LE-NEXT: extsh r11, r4
>> -; P8LE-NEXT: extsh r12, r6
>> -; P8LE-NEXT: mulhw r9, r0, r9
>> -; P8LE-NEXT: mulhw r10, r11, r10
>> -; P8LE-NEXT: mulhw r8, r12, r8
>> -; P8LE-NEXT: srwi r12, r3, 31
>> +; P8LE-NEXT: extsh r7, r7
>> +; P8LE-NEXT: mulhw r3, r5, r3
>> +; P8LE-NEXT: extsh r4, r4
>> +; P8LE-NEXT: mulhw r8, r6, r8
>> +; P8LE-NEXT: mulhw r9, r7, r9
>> +; P8LE-NEXT: mulhw r10, r4, r10
>> +; P8LE-NEXT: srwi r11, r3, 31
>> ; P8LE-NEXT: srawi r3, r3, 5
>> -; P8LE-NEXT: add r9, r9, r0
>> -; P8LE-NEXT: sub r10, r10, r11
>> -; P8LE-NEXT: add r3, r3, r12
>> +; P8LE-NEXT: add r3, r3, r11
>> +; P8LE-NEXT: srwi r11, r8, 31
>> +; P8LE-NEXT: add r9, r9, r7
>> +; P8LE-NEXT: srawi r8, r8, 8
>> +; P8LE-NEXT: sub r10, r10, r4
>> +; P8LE-NEXT: add r8, r8, r11
>> ; P8LE-NEXT: srwi r11, r9, 31
>> ; P8LE-NEXT: srawi r9, r9, 6
>> -; P8LE-NEXT: srwi r12, r8, 31
>> -; P8LE-NEXT: srawi r8, r8, 8
>> +; P8LE-NEXT: mulli r3, r3, 98
>> ; P8LE-NEXT: add r9, r9, r11
>> ; P8LE-NEXT: srwi r11, r10, 31
>> ; P8LE-NEXT: srawi r10, r10, 6
>> -; P8LE-NEXT: add r8, r8, r12
>> -; P8LE-NEXT: mulli r3, r3, 98
>> -; P8LE-NEXT: add r10, r10, r11
>> ; P8LE-NEXT: mulli r8, r8, -1003
>> +; P8LE-NEXT: add r10, r10, r11
>> ; P8LE-NEXT: mulli r9, r9, 95
>> ; P8LE-NEXT: mulli r10, r10, -124
>> ; P8LE-NEXT: sub r3, r5, r3
>> +; P8LE-NEXT: mtvsrd v2, r3
>> ; P8LE-NEXT: sub r5, r6, r8
>> -; P8LE-NEXT: mtfprd f0, r3
>> ; P8LE-NEXT: sub r3, r7, r9
>> +; P8LE-NEXT: mtvsrd v3, r5
>> ; P8LE-NEXT: sub r4, r4, r10
>> -; P8LE-NEXT: mtfprd f1, r5
>> -; P8LE-NEXT: mtfprd f2, r3
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: mtfprd f3, r4
>> -; P8LE-NEXT: xxswapd v3, vs1
>> -; P8LE-NEXT: xxswapd v4, vs2
>> -; P8LE-NEXT: xxswapd v5, vs3
>> -; P8LE-NEXT: vmrglh v2, v3, v2
>> -; P8LE-NEXT: vmrglh v3, v5, v4
>> +; P8LE-NEXT: mtvsrd v4, r3
>> +; P8LE-NEXT: mtvsrd v5, r4
>> +; P8LE-NEXT: vmrghh v2, v3, v2
>> +; P8LE-NEXT: vmrghh v3, v5, v4
>> ; P8LE-NEXT: vmrglw v2, v2, v3
>> ; P8LE-NEXT: blr
>> ;
>> @@ -256,56 +248,52 @@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 0
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, -21386
>> -; P9LE-NEXT: ori r5, r5, 37253
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r6, r4, r5
>> -; P9LE-NEXT: add r4, r6, r4
>> -; P9LE-NEXT: srwi r6, r4, 31
>> -; P9LE-NEXT: srawi r4, r4, 6
>> -; P9LE-NEXT: add r4, r4, r6
>> -; P9LE-NEXT: mulli r4, r4, 95
>> -; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r4, -21386
>> +; P9LE-NEXT: ori r4, r4, 37253
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r5, r3, r4
>> +; P9LE-NEXT: add r5, r5, r3
>> +; P9LE-NEXT: srwi r6, r5, 31
>> +; P9LE-NEXT: srawi r5, r5, 6
>> +; P9LE-NEXT: add r5, r5, r6
>> +; P9LE-NEXT: mulli r5, r5, 95
>> +; P9LE-NEXT: sub r3, r3, r5
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r6, r4, r5
>> -; P9LE-NEXT: add r4, r6, r4
>> -; P9LE-NEXT: srwi r6, r4, 31
>> -; P9LE-NEXT: srawi r4, r4, 6
>> -; P9LE-NEXT: add r4, r4, r6
>> -; P9LE-NEXT: mulli r4, r4, 95
>> -; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r5, r3, r4
>> +; P9LE-NEXT: add r5, r5, r3
>> +; P9LE-NEXT: srwi r6, r5, 31
>> +; P9LE-NEXT: srawi r5, r5, 6
>> +; P9LE-NEXT: add r5, r5, r6
>> +; P9LE-NEXT: mulli r5, r5, 95
>> +; P9LE-NEXT: sub r3, r3, r5
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r6, r4, r5
>> -; P9LE-NEXT: add r4, r6, r4
>> -; P9LE-NEXT: srwi r6, r4, 31
>> -; P9LE-NEXT: srawi r4, r4, 6
>> -; P9LE-NEXT: add r4, r4, r6
>> -; P9LE-NEXT: mulli r4, r4, 95
>> -; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r5, r3, r4
>> +; P9LE-NEXT: add r5, r5, r3
>> +; P9LE-NEXT: srwi r6, r5, 31
>> +; P9LE-NEXT: srawi r5, r5, 6
>> +; P9LE-NEXT: add r5, r5, r6
>> +; P9LE-NEXT: mulli r5, r5, 95
>> +; P9LE-NEXT: sub r3, r3, r5
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r5, r4, r5
>> -; P9LE-NEXT: add r4, r5, r4
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> +; P9LE-NEXT: add r4, r4, r3
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 6
>> ; P9LE-NEXT: add r4, r4, r5
>> ; P9LE-NEXT: mulli r4, r4, 95
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: vmrglh v2, v2, v4
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: vmrghh v2, v2, v4
>> ; P9LE-NEXT: vmrglw v2, v2, v3
>> ; P9LE-NEXT: blr
>> ;
>> @@ -370,56 +358,50 @@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) {
>> ; P8LE: # %bb.0:
>> ; P8LE-NEXT: xxswapd vs0, v2
>> ; P8LE-NEXT: lis r3, -21386
>> -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill
>> ; P8LE-NEXT: ori r3, r3, 37253
>> ; P8LE-NEXT: mffprd r4, f0
>> ; P8LE-NEXT: clrldi r5, r4, 48
>> ; P8LE-NEXT: rldicl r6, r4, 48, 48
>> -; P8LE-NEXT: extsh r8, r5
>> +; P8LE-NEXT: extsh r5, r5
>> ; P8LE-NEXT: rldicl r7, r4, 32, 48
>> -; P8LE-NEXT: extsh r9, r6
>> -; P8LE-NEXT: mulhw r10, r8, r3
>> +; P8LE-NEXT: extsh r6, r6
>> +; P8LE-NEXT: mulhw r8, r5, r3
>> ; P8LE-NEXT: rldicl r4, r4, 16, 48
>> -; P8LE-NEXT: extsh r11, r7
>> -; P8LE-NEXT: mulhw r12, r9, r3
>> -; P8LE-NEXT: extsh r0, r4
>> -; P8LE-NEXT: mulhw r30, r11, r3
>> -; P8LE-NEXT: mulhw r3, r0, r3
>> -; P8LE-NEXT: add r8, r10, r8
>> -; P8LE-NEXT: add r9, r12, r9
>> -; P8LE-NEXT: srwi r10, r8, 31
>> +; P8LE-NEXT: extsh r7, r7
>> +; P8LE-NEXT: mulhw r9, r6, r3
>> +; P8LE-NEXT: extsh r4, r4
>> +; P8LE-NEXT: mulhw r10, r7, r3
>> +; P8LE-NEXT: mulhw r3, r4, r3
>> +; P8LE-NEXT: add r8, r8, r5
>> +; P8LE-NEXT: add r9, r9, r6
>> +; P8LE-NEXT: srwi r11, r8, 31
>> ; P8LE-NEXT: srawi r8, r8, 6
>> -; P8LE-NEXT: add r11, r30, r11
>> -; P8LE-NEXT: add r3, r3, r0
>> -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
>> -; P8LE-NEXT: add r8, r8, r10
>> -; P8LE-NEXT: srwi r10, r9, 31
>> +; P8LE-NEXT: add r10, r10, r7
>> +; P8LE-NEXT: add r3, r3, r4
>> +; P8LE-NEXT: add r8, r8, r11
>> +; P8LE-NEXT: srwi r11, r9, 31
>> ; P8LE-NEXT: srawi r9, r9, 6
>> ; P8LE-NEXT: mulli r8, r8, 95
>> -; P8LE-NEXT: add r9, r9, r10
>> -; P8LE-NEXT: srwi r10, r11, 31
>> -; P8LE-NEXT: srawi r11, r11, 6
>> +; P8LE-NEXT: add r9, r9, r11
>> +; P8LE-NEXT: srwi r11, r10, 31
>> +; P8LE-NEXT: srawi r10, r10, 6
>> ; P8LE-NEXT: mulli r9, r9, 95
>> -; P8LE-NEXT: add r10, r11, r10
>> +; P8LE-NEXT: add r10, r10, r11
>> ; P8LE-NEXT: srwi r11, r3, 31
>> ; P8LE-NEXT: srawi r3, r3, 6
>> ; P8LE-NEXT: mulli r10, r10, 95
>> ; P8LE-NEXT: sub r5, r5, r8
>> ; P8LE-NEXT: add r3, r3, r11
>> -; P8LE-NEXT: mtfprd f0, r5
>> +; P8LE-NEXT: mtvsrd v2, r5
>> ; P8LE-NEXT: mulli r3, r3, 95
>> ; P8LE-NEXT: sub r6, r6, r9
>> -; P8LE-NEXT: mtfprd f1, r6
>> -; P8LE-NEXT: xxswapd v2, vs0
>> +; P8LE-NEXT: mtvsrd v3, r6
>> ; P8LE-NEXT: sub r5, r7, r10
>> -; P8LE-NEXT: mtfprd f2, r5
>> -; P8LE-NEXT: xxswapd v3, vs1
>> +; P8LE-NEXT: mtvsrd v4, r5
>> ; P8LE-NEXT: sub r3, r4, r3
>> -; P8LE-NEXT: mtfprd f3, r3
>> -; P8LE-NEXT: xxswapd v4, vs2
>> -; P8LE-NEXT: vmrglh v2, v3, v2
>> -; P8LE-NEXT: xxswapd v5, vs3
>> -; P8LE-NEXT: vmrglh v3, v5, v4
>> +; P8LE-NEXT: vmrghh v2, v3, v2
>> +; P8LE-NEXT: mtvsrd v5, r3
>> +; P8LE-NEXT: vmrghh v3, v5, v4
>> ; P8LE-NEXT: vmrglw v2, v3, v2
>> ; P8LE-NEXT: blr
>> ;
>> @@ -487,67 +469,59 @@ define <4 x i16> @combine_srem_sdiv(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 0
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, -21386
>> -; P9LE-NEXT: ori r5, r5, 37253
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r6, r4, r5
>> -; P9LE-NEXT: add r4, r6, r4
>> -; P9LE-NEXT: srwi r6, r4, 31
>> -; P9LE-NEXT: srawi r4, r4, 6
>> -; P9LE-NEXT: add r4, r4, r6
>> -; P9LE-NEXT: mulli r6, r4, 95
>> +; P9LE-NEXT: lis r4, -21386
>> +; P9LE-NEXT: ori r4, r4, 37253
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r5, r3, r4
>> +; P9LE-NEXT: add r5, r5, r3
>> +; P9LE-NEXT: srwi r6, r5, 31
>> +; P9LE-NEXT: srawi r5, r5, 6
>> +; P9LE-NEXT: add r5, r5, r6
>> +; P9LE-NEXT: mulli r6, r5, 95
>> ; P9LE-NEXT: sub r3, r3, r6
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> ; P9LE-NEXT: extsh r6, r3
>> -; P9LE-NEXT: mulhw r7, r6, r5
>> +; P9LE-NEXT: mulhw r7, r6, r4
>> ; P9LE-NEXT: add r6, r7, r6
>> ; P9LE-NEXT: srwi r7, r6, 31
>> ; P9LE-NEXT: srawi r6, r6, 6
>> ; P9LE-NEXT: add r6, r6, r7
>> ; P9LE-NEXT: mulli r7, r6, 95
>> ; P9LE-NEXT: sub r3, r3, r7
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> ; P9LE-NEXT: extsh r7, r3
>> -; P9LE-NEXT: mulhw r8, r7, r5
>> +; P9LE-NEXT: mulhw r8, r7, r4
>> ; P9LE-NEXT: add r7, r8, r7
>> ; P9LE-NEXT: srwi r8, r7, 31
>> ; P9LE-NEXT: srawi r7, r7, 6
>> ; P9LE-NEXT: add r7, r7, r8
>> ; P9LE-NEXT: mulli r8, r7, 95
>> ; P9LE-NEXT: sub r3, r3, r8
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> ; P9LE-NEXT: extsh r8, r3
>> -; P9LE-NEXT: mulhw r5, r8, r5
>> -; P9LE-NEXT: add r5, r5, r8
>> -; P9LE-NEXT: srwi r8, r5, 31
>> -; P9LE-NEXT: srawi r5, r5, 6
>> -; P9LE-NEXT: add r5, r5, r8
>> -; P9LE-NEXT: mulli r8, r5, 95
>> +; P9LE-NEXT: mulhw r4, r8, r4
>> +; P9LE-NEXT: add r4, r4, r8
>> +; P9LE-NEXT: srwi r8, r4, 31
>> +; P9LE-NEXT: srawi r4, r4, 6
>> +; P9LE-NEXT: add r4, r4, r8
>> +; P9LE-NEXT: mulli r8, r4, 95
>> ; P9LE-NEXT: sub r3, r3, r8
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: mtfprd f0, r4
>> -; P9LE-NEXT: vmrglh v2, v2, v4
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: vmrghh v2, v2, v4
>> +; P9LE-NEXT: mtvsrd v4, r6
>> ; P9LE-NEXT: vmrglw v2, v2, v3
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r6
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r7
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r5
>> -; P9LE-NEXT: xxswapd v5, vs0
>> -; P9LE-NEXT: vmrglh v4, v5, v4
>> +; P9LE-NEXT: mtvsrd v3, r5
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: mtvsrd v4, r7
>> +; P9LE-NEXT: mtvsrd v5, r4
>> +; P9LE-NEXT: vmrghh v4, v5, v4
>> ; P9LE-NEXT: vmrglw v3, v4, v3
>> ; P9LE-NEXT: vadduhm v2, v2, v3
>> ; P9LE-NEXT: blr
>> @@ -624,69 +598,59 @@ define <4 x i16> @combine_srem_sdiv(<4 x i16> %x) {
>> ; P8LE-LABEL: combine_srem_sdiv:
>> ; P8LE: # %bb.0:
>> ; P8LE-NEXT: xxswapd vs0, v2
>> -; P8LE-NEXT: lis r4, -21386
>> -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill
>> -; P8LE-NEXT: ori r4, r4, 37253
>> -; P8LE-NEXT: mffprd r5, f0
>> -; P8LE-NEXT: clrldi r3, r5, 48
>> -; P8LE-NEXT: rldicl r6, r5, 48, 48
>> -; P8LE-NEXT: rldicl r7, r5, 32, 48
>> -; P8LE-NEXT: extsh r8, r3
>> -; P8LE-NEXT: extsh r9, r6
>> -; P8LE-NEXT: extsh r10, r7
>> -; P8LE-NEXT: mulhw r11, r8, r4
>> -; P8LE-NEXT: rldicl r5, r5, 16, 48
>> -; P8LE-NEXT: mulhw r12, r9, r4
>> -; P8LE-NEXT: mulhw r0, r10, r4
>> -; P8LE-NEXT: extsh r30, r5
>> -; P8LE-NEXT: mulhw r4, r30, r4
>> +; P8LE-NEXT: lis r3, -21386
>> +; P8LE-NEXT: ori r3, r3, 37253
>> +; P8LE-NEXT: mffprd r4, f0
>> +; P8LE-NEXT: clrldi r5, r4, 48
>> +; P8LE-NEXT: rldicl r6, r4, 48, 48
>> +; P8LE-NEXT: rldicl r7, r4, 32, 48
>> +; P8LE-NEXT: extsh r5, r5
>> +; P8LE-NEXT: extsh r8, r6
>> +; P8LE-NEXT: extsh r9, r7
>> +; P8LE-NEXT: mulhw r10, r5, r3
>> +; P8LE-NEXT: mulhw r11, r8, r3
>> +; P8LE-NEXT: rldicl r4, r4, 16, 48
>> +; P8LE-NEXT: mulhw r12, r9, r3
>> +; P8LE-NEXT: extsh r0, r4
>> +; P8LE-NEXT: mulhw r3, r0, r3
>> +; P8LE-NEXT: add r10, r10, r5
>> ; P8LE-NEXT: add r8, r11, r8
>> +; P8LE-NEXT: srwi r11, r10, 31
>> ; P8LE-NEXT: add r9, r12, r9
>> -; P8LE-NEXT: srwi r11, r8, 31
>> -; P8LE-NEXT: add r10, r0, r10
>> -; P8LE-NEXT: srawi r8, r8, 6
>> -; P8LE-NEXT: srawi r12, r9, 6
>> +; P8LE-NEXT: srawi r10, r10, 6
>> +; P8LE-NEXT: srawi r12, r8, 6
>> +; P8LE-NEXT: srwi r8, r8, 31
>> +; P8LE-NEXT: add r10, r10, r11
>> +; P8LE-NEXT: add r3, r3, r0
>> +; P8LE-NEXT: srawi r11, r9, 6
>> ; P8LE-NEXT: srwi r9, r9, 31
>> -; P8LE-NEXT: add r8, r8, r11
>> -; P8LE-NEXT: add r4, r4, r30
>> -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
>> -; P8LE-NEXT: srawi r11, r10, 6
>> -; P8LE-NEXT: srwi r10, r10, 31
>> -; P8LE-NEXT: add r9, r12, r9
>> -; P8LE-NEXT: mtfprd f0, r8
>> -; P8LE-NEXT: mulli r12, r8, 95
>> -; P8LE-NEXT: add r10, r11, r10
>> -; P8LE-NEXT: srwi r8, r4, 31
>> -; P8LE-NEXT: mtfprd f1, r9
>> -; P8LE-NEXT: srawi r4, r4, 6
>> -; P8LE-NEXT: mulli r11, r9, 95
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: mtfprd f2, r10
>> -; P8LE-NEXT: mulli r9, r10, 95
>> -; P8LE-NEXT: add r4, r4, r8
>> -; P8LE-NEXT: xxswapd v3, vs1
>> -; P8LE-NEXT: mtfprd f3, r4
>> -; P8LE-NEXT: mulli r4, r4, 95
>> -; P8LE-NEXT: xxswapd v1, vs2
>> -; P8LE-NEXT: sub r3, r3, r12
>> -; P8LE-NEXT: mtfprd f0, r3
>> -; P8LE-NEXT: sub r6, r6, r11
>> -; P8LE-NEXT: xxswapd v6, vs3
>> -; P8LE-NEXT: sub r3, r7, r9
>> -; P8LE-NEXT: mtfprd f1, r6
>> -; P8LE-NEXT: mtfprd f4, r3
>> -; P8LE-NEXT: sub r3, r5, r4
>> -; P8LE-NEXT: mtfprd f5, r3
>> -; P8LE-NEXT: xxswapd v4, vs1
>> -; P8LE-NEXT: vmrglh v2, v3, v2
>> -; P8LE-NEXT: xxswapd v3, vs0
>> -; P8LE-NEXT: xxswapd v5, vs4
>> -; P8LE-NEXT: xxswapd v0, vs5
>> -; P8LE-NEXT: vmrglh v3, v4, v3
>> -; P8LE-NEXT: vmrglh v4, v0, v5
>> -; P8LE-NEXT: vmrglh v5, v6, v1
>> -; P8LE-NEXT: vmrglw v3, v4, v3
>> -; P8LE-NEXT: vmrglw v2, v5, v2
>> +; P8LE-NEXT: add r8, r12, r8
>> +; P8LE-NEXT: mtvsrd v2, r10
>> +; P8LE-NEXT: mulli r12, r10, 95
>> +; P8LE-NEXT: add r9, r11, r9
>> +; P8LE-NEXT: srwi r11, r3, 31
>> +; P8LE-NEXT: mtvsrd v3, r8
>> +; P8LE-NEXT: srawi r3, r3, 6
>> +; P8LE-NEXT: mulli r10, r8, 95
>> +; P8LE-NEXT: mtvsrd v4, r9
>> +; P8LE-NEXT: add r3, r3, r11
>> +; P8LE-NEXT: mulli r8, r9, 95
>> +; P8LE-NEXT: vmrghh v2, v3, v2
>> +; P8LE-NEXT: mulli r9, r3, 95
>> +; P8LE-NEXT: sub r5, r5, r12
>> +; P8LE-NEXT: sub r6, r6, r10
>> +; P8LE-NEXT: mtvsrd v3, r5
>> +; P8LE-NEXT: mtvsrd v5, r6
>> +; P8LE-NEXT: sub r5, r7, r8
>> +; P8LE-NEXT: sub r4, r4, r9
>> +; P8LE-NEXT: mtvsrd v0, r5
>> +; P8LE-NEXT: mtvsrd v1, r4
>> +; P8LE-NEXT: vmrghh v3, v5, v3
>> +; P8LE-NEXT: mtvsrd v5, r3
>> +; P8LE-NEXT: vmrghh v0, v1, v0
>> +; P8LE-NEXT: vmrghh v4, v5, v4
>> +; P8LE-NEXT: vmrglw v3, v0, v3
>> +; P8LE-NEXT: vmrglw v2, v4, v2
>> ; P8LE-NEXT: vadduhm v2, v3, v2
>> ; P8LE-NEXT: blr
>> ;
>> @@ -767,47 +731,43 @@ define <4 x i16> @dont_fold_srem_power_of_two(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 0
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: srawi r4, r4, 6
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: srawi r4, r3, 6
>> ; P9LE-NEXT: addze r4, r4
>> ; P9LE-NEXT: slwi r4, r4, 6
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: srawi r4, r4, 5
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: srawi r4, r3, 5
>> ; P9LE-NEXT: addze r4, r4
>> ; P9LE-NEXT: slwi r4, r4, 5
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r4, -21386
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, -21386
>> -; P9LE-NEXT: ori r5, r5, 37253
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r5, r4, r5
>> -; P9LE-NEXT: add r4, r5, r4
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: ori r4, r4, 37253
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> +; P9LE-NEXT: add r4, r4, r3
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 6
>> ; P9LE-NEXT: add r4, r4, r5
>> ; P9LE-NEXT: mulli r4, r4, 95
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: srawi r4, r4, 3
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: srawi r4, r3, 3
>> ; P9LE-NEXT: addze r4, r4
>> ; P9LE-NEXT: slwi r4, r4, 3
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: vmrglh v2, v4, v2
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: vmrghh v2, v4, v2
>> ; P9LE-NEXT: vmrglw v2, v2, v3
>> ; P9LE-NEXT: blr
>> ;
>> @@ -866,42 +826,38 @@ define <4 x i16> @dont_fold_srem_power_of_two(<4 x i16> %x) {
>> ; P8LE-NEXT: ori r3, r3, 37253
>> ; P8LE-NEXT: mffprd r4, f0
>> ; P8LE-NEXT: rldicl r5, r4, 16, 48
>> -; P8LE-NEXT: clrldi r7, r4, 48
>> -; P8LE-NEXT: extsh r6, r5
>> -; P8LE-NEXT: extsh r8, r7
>> -; P8LE-NEXT: mulhw r3, r6, r3
>> -; P8LE-NEXT: rldicl r9, r4, 48, 48
>> -; P8LE-NEXT: srawi r8, r8, 6
>> -; P8LE-NEXT: extsh r10, r9
>> +; P8LE-NEXT: clrldi r6, r4, 48
>> +; P8LE-NEXT: extsh r5, r5
>> +; P8LE-NEXT: extsh r6, r6
>> +; P8LE-NEXT: mulhw r3, r5, r3
>> +; P8LE-NEXT: rldicl r7, r4, 48, 48
>> +; P8LE-NEXT: srawi r8, r6, 6
>> +; P8LE-NEXT: extsh r7, r7
>> ; P8LE-NEXT: addze r8, r8
>> ; P8LE-NEXT: rldicl r4, r4, 32, 48
>> -; P8LE-NEXT: srawi r10, r10, 5
>> +; P8LE-NEXT: srawi r9, r7, 5
>> +; P8LE-NEXT: extsh r4, r4
>> ; P8LE-NEXT: slwi r8, r8, 6
>> -; P8LE-NEXT: add r3, r3, r6
>> -; P8LE-NEXT: addze r6, r10
>> -; P8LE-NEXT: sub r7, r7, r8
>> +; P8LE-NEXT: add r3, r3, r5
>> +; P8LE-NEXT: addze r9, r9
>> +; P8LE-NEXT: sub r6, r6, r8
>> ; P8LE-NEXT: srwi r10, r3, 31
>> ; P8LE-NEXT: srawi r3, r3, 6
>> -; P8LE-NEXT: mtfprd f0, r7
>> -; P8LE-NEXT: slwi r6, r6, 5
>> +; P8LE-NEXT: slwi r8, r9, 5
>> +; P8LE-NEXT: mtvsrd v2, r6
>> ; P8LE-NEXT: add r3, r3, r10
>> -; P8LE-NEXT: extsh r10, r4
>> -; P8LE-NEXT: sub r6, r9, r6
>> +; P8LE-NEXT: srawi r9, r4, 3
>> +; P8LE-NEXT: sub r6, r7, r8
>> ; P8LE-NEXT: mulli r3, r3, 95
>> -; P8LE-NEXT: srawi r8, r10, 3
>> -; P8LE-NEXT: mtfprd f1, r6
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: addze r7, r8
>> -; P8LE-NEXT: xxswapd v3, vs1
>> +; P8LE-NEXT: addze r7, r9
>> +; P8LE-NEXT: mtvsrd v3, r6
>> +; P8LE-NEXT: vmrghh v2, v3, v2
>> ; P8LE-NEXT: sub r3, r5, r3
>> ; P8LE-NEXT: slwi r5, r7, 3
>> ; P8LE-NEXT: sub r4, r4, r5
>> -; P8LE-NEXT: mtfprd f2, r3
>> -; P8LE-NEXT: mtfprd f3, r4
>> -; P8LE-NEXT: xxswapd v4, vs2
>> -; P8LE-NEXT: vmrglh v2, v3, v2
>> -; P8LE-NEXT: xxswapd v5, vs3
>> -; P8LE-NEXT: vmrglh v3, v4, v5
>> +; P8LE-NEXT: mtvsrd v4, r3
>> +; P8LE-NEXT: mtvsrd v5, r4
>> +; P8LE-NEXT: vmrghh v3, v4, v5
>> ; P8LE-NEXT: vmrglw v2, v3, v2
>> ; P8LE-NEXT: blr
>> ;
>> @@ -959,48 +915,46 @@ define <4 x i16> @dont_fold_srem_one(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, -14230
>> -; P9LE-NEXT: ori r5, r5, 30865
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r5, r4, r5
>> -; P9LE-NEXT: add r4, r5, r4
>> +; P9LE-NEXT: lis r4, -14230
>> +; P9LE-NEXT: ori r4, r4, 30865
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> +; P9LE-NEXT: add r4, r4, r3
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 9
>> ; P9LE-NEXT: add r4, r4, r5
>> -; P9LE-NEXT: lis r5, -19946
>> ; P9LE-NEXT: mulli r4, r4, 654
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r4, -19946
>> +; P9LE-NEXT: mtvsrd v3, r3
>> +; P9LE-NEXT: li r3, 0
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 4
>> -; P9LE-NEXT: ori r5, r5, 17097
>> -; P9LE-NEXT: xxlxor v3, v3, v3
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r5, r4, r5
>> -; P9LE-NEXT: add r4, r5, r4
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: ori r4, r4, 17097
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> +; P9LE-NEXT: add r4, r4, r3
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 4
>> ; P9LE-NEXT: add r4, r4, r5
>> -; P9LE-NEXT: lis r5, 24749
>> ; P9LE-NEXT: mulli r4, r4, 23
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: vmrghh v3, v3, v4
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: ori r5, r5, 47143
>> -; P9LE-NEXT: mulhw r4, r4, r5
>> +; P9LE-NEXT: lis r4, 24749
>> +; P9LE-NEXT: ori r4, r4, 47143
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 11
>> ; P9LE-NEXT: add r4, r4, r5
>> ; P9LE-NEXT: mulli r4, r4, 5423
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: vmrglh v2, v2, v4
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: vmrghh v2, v2, v4
>> ; P9LE-NEXT: vmrglw v2, v2, v3
>> ; P9LE-NEXT: blr
>> ;
>> @@ -1058,49 +1012,47 @@ define <4 x i16> @dont_fold_srem_one(<4 x i16> %x) {
>> ; P8LE-LABEL: dont_fold_srem_one:
>> ; P8LE: # %bb.0:
>> ; P8LE-NEXT: xxswapd vs0, v2
>> -; P8LE-NEXT: lis r3, 24749
>> -; P8LE-NEXT: lis r7, -19946
>> -; P8LE-NEXT: lis r9, -14230
>> -; P8LE-NEXT: xxlxor v5, v5, v5
>> -; P8LE-NEXT: ori r3, r3, 47143
>> -; P8LE-NEXT: ori r7, r7, 17097
>> -; P8LE-NEXT: mffprd r4, f0
>> -; P8LE-NEXT: rldicl r5, r4, 16, 48
>> -; P8LE-NEXT: rldicl r6, r4, 32, 48
>> -; P8LE-NEXT: rldicl r4, r4, 48, 48
>> -; P8LE-NEXT: extsh r8, r5
>> -; P8LE-NEXT: extsh r10, r6
>> -; P8LE-NEXT: mulhw r3, r8, r3
>> -; P8LE-NEXT: ori r8, r9, 30865
>> -; P8LE-NEXT: extsh r9, r4
>> -; P8LE-NEXT: mulhw r7, r10, r7
>> -; P8LE-NEXT: mulhw r8, r9, r8
>> -; P8LE-NEXT: add r7, r7, r10
>> -; P8LE-NEXT: srwi r10, r3, 31
>> -; P8LE-NEXT: add r8, r8, r9
>> -; P8LE-NEXT: srawi r3, r3, 11
>> -; P8LE-NEXT: srwi r9, r7, 31
>> -; P8LE-NEXT: srawi r7, r7, 4
>> -; P8LE-NEXT: add r3, r3, r10
>> -; P8LE-NEXT: add r7, r7, r9
>> +; P8LE-NEXT: lis r5, 24749
>> +; P8LE-NEXT: lis r6, -19946
>> +; P8LE-NEXT: lis r8, -14230
>> +; P8LE-NEXT: ori r5, r5, 47143
>> +; P8LE-NEXT: ori r6, r6, 17097
>> +; P8LE-NEXT: ori r8, r8, 30865
>> +; P8LE-NEXT: mffprd r3, f0
>> +; P8LE-NEXT: rldicl r4, r3, 16, 48
>> +; P8LE-NEXT: rldicl r7, r3, 32, 48
>> +; P8LE-NEXT: rldicl r3, r3, 48, 48
>> +; P8LE-NEXT: extsh r4, r4
>> +; P8LE-NEXT: extsh r7, r7
>> +; P8LE-NEXT: extsh r3, r3
>> +; P8LE-NEXT: mulhw r5, r4, r5
>> +; P8LE-NEXT: mulhw r6, r7, r6
>> +; P8LE-NEXT: mulhw r8, r3, r8
>> +; P8LE-NEXT: srwi r9, r5, 31
>> +; P8LE-NEXT: srawi r5, r5, 11
>> +; P8LE-NEXT: add r6, r6, r7
>> +; P8LE-NEXT: add r8, r8, r3
>> +; P8LE-NEXT: add r5, r5, r9
>> +; P8LE-NEXT: srwi r9, r6, 31
>> +; P8LE-NEXT: srawi r6, r6, 4
>> +; P8LE-NEXT: add r6, r6, r9
>> ; P8LE-NEXT: srwi r9, r8, 31
>> ; P8LE-NEXT: srawi r8, r8, 9
>> -; P8LE-NEXT: mulli r3, r3, 5423
>> +; P8LE-NEXT: mulli r5, r5, 5423
>> ; P8LE-NEXT: add r8, r8, r9
>> -; P8LE-NEXT: mulli r7, r7, 23
>> +; P8LE-NEXT: mulli r6, r6, 23
>> +; P8LE-NEXT: li r9, 0
>> ; P8LE-NEXT: mulli r8, r8, 654
>> -; P8LE-NEXT: sub r3, r5, r3
>> -; P8LE-NEXT: mtfprd f0, r3
>> -; P8LE-NEXT: sub r3, r6, r7
>> -; P8LE-NEXT: sub r4, r4, r8
>> -; P8LE-NEXT: mtfprd f1, r3
>> -; P8LE-NEXT: mtfprd f2, r4
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: xxswapd v3, vs1
>> -; P8LE-NEXT: xxswapd v4, vs2
>> -; P8LE-NEXT: vmrglh v2, v2, v3
>> -; P8LE-NEXT: vmrglh v3, v4, v5
>> -; P8LE-NEXT: vmrglw v2, v2, v3
>> +; P8LE-NEXT: mtvsrd v2, r9
>> +; P8LE-NEXT: sub r4, r4, r5
>> +; P8LE-NEXT: sub r5, r7, r6
>> +; P8LE-NEXT: mtvsrd v3, r4
>> +; P8LE-NEXT: sub r3, r3, r8
>> +; P8LE-NEXT: mtvsrd v4, r5
>> +; P8LE-NEXT: mtvsrd v5, r3
>> +; P8LE-NEXT: vmrghh v3, v3, v4
>> +; P8LE-NEXT: vmrghh v2, v5, v2
>> +; P8LE-NEXT: vmrglw v2, v3, v2
>> ; P8LE-NEXT: blr
>> ;
>> ; P8BE-LABEL: dont_fold_srem_one:
>> @@ -1161,43 +1113,41 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, -19946
>> -; P9LE-NEXT: ori r5, r5, 17097
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: mulhw r5, r4, r5
>> -; P9LE-NEXT: add r4, r5, r4
>> +; P9LE-NEXT: lis r4, -19946
>> +; P9LE-NEXT: ori r4, r4, 17097
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> +; P9LE-NEXT: add r4, r4, r3
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 4
>> ; P9LE-NEXT: add r4, r4, r5
>> -; P9LE-NEXT: lis r5, 24749
>> ; P9LE-NEXT: mulli r4, r4, 23
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r4, 24749
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: ori r5, r5, 47143
>> -; P9LE-NEXT: mulhw r4, r4, r5
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: ori r4, r4, 47143
>> +; P9LE-NEXT: mulhw r4, r3, r4
>> ; P9LE-NEXT: srwi r5, r4, 31
>> ; P9LE-NEXT: srawi r4, r4, 11
>> ; P9LE-NEXT: add r4, r4, r5
>> ; P9LE-NEXT: mulli r4, r4, 5423
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: extsh r4, r3
>> -; P9LE-NEXT: srawi r4, r4, 15
>> +; P9LE-NEXT: extsh r3, r3
>> +; P9LE-NEXT: srawi r4, r3, 15
>> ; P9LE-NEXT: addze r4, r4
>> ; P9LE-NEXT: slwi r4, r4, 15
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxlxor v4, v4, v4
>> -; P9LE-NEXT: vmrglh v2, v2, v4
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: li r3, 0
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> +; P9LE-NEXT: vmrghh v2, v2, v4
>> ; P9LE-NEXT: vmrglw v2, v3, v2
>> ; P9LE-NEXT: blr
>> ;
>> @@ -1252,42 +1202,40 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x i16> %x) {
>> ; P8LE-NEXT: xxswapd vs0, v2
>> ; P8LE-NEXT: lis r4, 24749
>> ; P8LE-NEXT: lis r5, -19946
>> -; P8LE-NEXT: xxlxor v5, v5, v5
>> ; P8LE-NEXT: ori r4, r4, 47143
>> ; P8LE-NEXT: ori r5, r5, 17097
>> ; P8LE-NEXT: mffprd r3, f0
>> ; P8LE-NEXT: rldicl r6, r3, 16, 48
>> ; P8LE-NEXT: rldicl r7, r3, 32, 48
>> -; P8LE-NEXT: extsh r8, r6
>> -; P8LE-NEXT: extsh r9, r7
>> -; P8LE-NEXT: mulhw r4, r8, r4
>> -; P8LE-NEXT: mulhw r5, r9, r5
>> +; P8LE-NEXT: extsh r6, r6
>> +; P8LE-NEXT: extsh r7, r7
>> +; P8LE-NEXT: mulhw r4, r6, r4
>> +; P8LE-NEXT: mulhw r5, r7, r5
>> ; P8LE-NEXT: rldicl r3, r3, 48, 48
>> +; P8LE-NEXT: extsh r3, r3
>> ; P8LE-NEXT: srwi r8, r4, 31
>> ; P8LE-NEXT: srawi r4, r4, 11
>> -; P8LE-NEXT: add r5, r5, r9
>> +; P8LE-NEXT: add r5, r5, r7
>> ; P8LE-NEXT: add r4, r4, r8
>> ; P8LE-NEXT: srwi r8, r5, 31
>> ; P8LE-NEXT: srawi r5, r5, 4
>> ; P8LE-NEXT: mulli r4, r4, 5423
>> ; P8LE-NEXT: add r5, r5, r8
>> -; P8LE-NEXT: extsh r8, r3
>> +; P8LE-NEXT: srawi r9, r3, 15
>> +; P8LE-NEXT: li r8, 0
>> ; P8LE-NEXT: mulli r5, r5, 23
>> -; P8LE-NEXT: srawi r8, r8, 15
>> +; P8LE-NEXT: mtvsrd v2, r8
>> ; P8LE-NEXT: sub r4, r6, r4
>> -; P8LE-NEXT: addze r6, r8
>> -; P8LE-NEXT: mtfprd f0, r4
>> -; P8LE-NEXT: slwi r4, r6, 15
>> +; P8LE-NEXT: addze r6, r9
>> +; P8LE-NEXT: slwi r6, r6, 15
>> +; P8LE-NEXT: mtvsrd v3, r4
>> ; P8LE-NEXT: sub r5, r7, r5
>> -; P8LE-NEXT: sub r3, r3, r4
>> -; P8LE-NEXT: mtfprd f1, r5
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: mtfprd f2, r3
>> -; P8LE-NEXT: xxswapd v3, vs1
>> -; P8LE-NEXT: xxswapd v4, vs2
>> -; P8LE-NEXT: vmrglh v2, v2, v3
>> -; P8LE-NEXT: vmrglh v3, v4, v5
>> -; P8LE-NEXT: vmrglw v2, v2, v3
>> +; P8LE-NEXT: sub r3, r3, r6
>> +; P8LE-NEXT: mtvsrd v4, r5
>> +; P8LE-NEXT: mtvsrd v5, r3
>> +; P8LE-NEXT: vmrghh v3, v3, v4
>> +; P8LE-NEXT: vmrghh v2, v5, v2
>> +; P8LE-NEXT: vmrglw v2, v3, v2
>> ; P8LE-NEXT: blr
>> ;
>> ; P8BE-LABEL: dont_fold_urem_i16_smax:
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
>> index 323397202c00..95f0fc25f2dd 100644
>> --- a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
>> +++ b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
>> @@ -15,10 +15,10 @@ entry:
>> }
>>
>> ; CHECK-LABEL: @bar0
>> +; CHECK-DAG: xxswapd 1, 1
>> ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
>> -; CHECK-DAG: xxspltd [[REG2:[0-9]+]]
>> -; CHECK: xxpermdi [[REG3:[0-9]+]], [[REG2]], [[REG1]], 1
>> -; CHECK: stxvd2x [[REG3]]
>> +; CHECK: xxmrgld [[REG2:[0-9]+]], 1, [[REG1]]
>> +; CHECK: stxvd2x [[REG2]]
>> ; CHECK-NOT: xxswapd
>>
>> define void @bar1(double %y) {
>> @@ -30,10 +30,10 @@ entry:
>> }
>>
>> ; CHECK-LABEL: @bar1
>> +; CHECK-DAG: xxswapd 1, 1
>> ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
>> -; CHECK-DAG: xxspltd [[REG2:[0-9]+]]
>> -; CHECK: xxmrghd [[REG3:[0-9]+]], [[REG1]], [[REG2]]
>> -; CHECK: stxvd2x [[REG3]]
>> +; CHECK: xxpermdi [[REG2:[0-9]+]], [[REG1]], 1, 1
>> +; CHECK: stxvd2x [[REG2]]
>> ; CHECK-NOT: xxswapd
>>
>> define void @baz0() {
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
>> index 23738eaa95a7..4437e6799269 100644
>> --- a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
>> +++ b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
>> @@ -27,7 +27,7 @@ define void @bar0() {
>> ; CHECK: ld r3, .LC0@toc@l(r3)
>> ; CHECK: addis r3, r2, .LC2@toc@ha
>> ; CHECK: ld r3, .LC2@toc@l(r3)
>> -; CHECK: xxpermdi vs0, vs0, vs1, 1
>> +; CHECK: xxmrgld vs0, vs0, vs1
>> ; CHECK: stxvd2x vs0, 0, r3
>> ; CHECK: blr
>> ;
>> @@ -38,7 +38,7 @@ define void @bar0() {
>> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC1@toc@ha
>> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC2@toc@ha
>> ; CHECK-P9-NOVECTOR: ld r3, .LC2@toc@l(r3)
>> -; CHECK-P9-NOVECTOR: xxpermdi vs0, vs1, vs0, 1
>> +; CHECK-P9-NOVECTOR: xxmrgld vs0, vs1, vs0
>> ; CHECK-P9-NOVECTOR: stxvd2x vs0, 0, r3
>> ; CHECK-P9-NOVECTOR: blr
>> ;
>> @@ -72,7 +72,7 @@ define void @bar1() {
>> ; CHECK: ld r3, .LC0@toc@l(r3)
>> ; CHECK: addis r3, r2, .LC2@toc@ha
>> ; CHECK: ld r3, .LC2@toc@l(r3)
>> -; CHECK: xxmrghd vs0, vs1, vs0
>> +; CHECK: xxpermdi vs0, vs1, vs0, 1
>> ; CHECK: stxvd2x vs0, 0, r3
>> ; CHECK: blr
>> ;
>> @@ -83,7 +83,7 @@ define void @bar1() {
>> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC1@toc@ha
>> ; CHECK-P9-NOVECTOR: addis r3, r2, .LC2@toc@ha
>> ; CHECK-P9-NOVECTOR: ld r3, .LC2@toc@l(r3)
>> -; CHECK-P9-NOVECTOR: xxmrghd vs0, vs0, vs1
>> +; CHECK-P9-NOVECTOR: xxpermdi vs0, vs0, vs1, 1
>> ; CHECK-P9-NOVECTOR: stxvd2x vs0, 0, r3
>> ; CHECK-P9-NOVECTOR: blr
>> ;
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
>> index d853a420dcd8..4bb3730aa043 100644
>> --- a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
>> +++ b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
>> @@ -13,53 +13,50 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, 21399
>> -; P9LE-NEXT: ori r5, r5, 33437
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r4, r4, r5
>> -; P9LE-NEXT: lis r5, 16727
>> -; P9LE-NEXT: ori r5, r5, 2287
>> +; P9LE-NEXT: lis r4, 21399
>> +; P9LE-NEXT: ori r4, r4, 33437
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: mulhwu r4, r3, r4
>> ; P9LE-NEXT: srwi r4, r4, 5
>> ; P9LE-NEXT: mulli r4, r4, 98
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r4, 16727
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r4, r4, r5
>> -; P9LE-NEXT: lis r5, 8456
>> -; P9LE-NEXT: ori r5, r5, 16913
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: ori r4, r4, 2287
>> +; P9LE-NEXT: mulhwu r4, r3, r4
>> ; P9LE-NEXT: srwi r4, r4, 8
>> ; P9LE-NEXT: mulli r4, r4, 1003
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: rlwinm r4, r3, 30, 18, 31
>> -; P9LE-NEXT: mulhwu r4, r4, r5
>> -; P9LE-NEXT: lis r5, 22765
>> -; P9LE-NEXT: ori r5, r5, 8969
>> -; P9LE-NEXT: srwi r4, r4, 2
>> -; P9LE-NEXT: mulli r4, r4, 124
>> -; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r5, 8456
>> +; P9LE-NEXT: ori r5, r5, 16913
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: clrlwi r4, r3, 16
>> +; P9LE-NEXT: rlwinm r3, r3, 30, 18, 31
>> +; P9LE-NEXT: mulhwu r3, r3, r5
>> +; P9LE-NEXT: srwi r3, r3, 2
>> +; P9LE-NEXT: mulli r3, r3, 124
>> +; P9LE-NEXT: sub r3, r4, r3
>> +; P9LE-NEXT: lis r4, 22765
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 0
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r5, r4, r5
>> -; P9LE-NEXT: sub r4, r4, r5
>> -; P9LE-NEXT: srwi r4, r4, 1
>> -; P9LE-NEXT: add r4, r4, r5
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: ori r4, r4, 8969
>> +; P9LE-NEXT: mulhwu r4, r3, r4
>> +; P9LE-NEXT: sub r5, r3, r4
>> +; P9LE-NEXT: srwi r5, r5, 1
>> +; P9LE-NEXT: add r4, r5, r4
>> ; P9LE-NEXT: srwi r4, r4, 6
>> ; P9LE-NEXT: mulli r4, r4, 95
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: vmrglh v2, v4, v2
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: vmrghh v2, v4, v2
>> ; P9LE-NEXT: vmrglw v2, v3, v2
>> ; P9LE-NEXT: blr
>> ;
>> @@ -123,50 +120,47 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) {
>> ; P8LE-NEXT: xxswapd vs0, v2
>> ; P8LE-NEXT: lis r3, 22765
>> ; P8LE-NEXT: lis r7, 21399
>> -; P8LE-NEXT: lis r10, 16727
>> +; P8LE-NEXT: lis r9, 16727
>> +; P8LE-NEXT: lis r10, 8456
>> ; P8LE-NEXT: ori r3, r3, 8969
>> ; P8LE-NEXT: ori r7, r7, 33437
>> -; P8LE-NEXT: ori r10, r10, 2287
>> +; P8LE-NEXT: ori r9, r9, 2287
>> +; P8LE-NEXT: ori r10, r10, 16913
>> ; P8LE-NEXT: mffprd r4, f0
>> ; P8LE-NEXT: clrldi r6, r4, 48
>> ; P8LE-NEXT: rldicl r5, r4, 32, 48
>> -; P8LE-NEXT: clrlwi r9, r6, 16
>> +; P8LE-NEXT: clrlwi r6, r6, 16
>> ; P8LE-NEXT: rldicl r8, r4, 16, 48
>> -; P8LE-NEXT: clrlwi r11, r5, 16
>> -; P8LE-NEXT: mulhwu r3, r9, r3
>> -; P8LE-NEXT: clrlwi r12, r8, 16
>> -; P8LE-NEXT: mulhwu r7, r11, r7
>> -; P8LE-NEXT: lis r11, 8456
>> +; P8LE-NEXT: clrlwi r5, r5, 16
>> +; P8LE-NEXT: mulhwu r3, r6, r3
>> ; P8LE-NEXT: rldicl r4, r4, 48, 48
>> -; P8LE-NEXT: mulhwu r10, r12, r10
>> -; P8LE-NEXT: ori r11, r11, 16913
>> -; P8LE-NEXT: rlwinm r12, r4, 30, 18, 31
>> -; P8LE-NEXT: mulhwu r11, r12, r11
>> -; P8LE-NEXT: sub r9, r9, r3
>> -; P8LE-NEXT: srwi r9, r9, 1
>> +; P8LE-NEXT: clrlwi r8, r8, 16
>> +; P8LE-NEXT: rlwinm r11, r4, 30, 18, 31
>> +; P8LE-NEXT: mulhwu r7, r5, r7
>> +; P8LE-NEXT: clrlwi r4, r4, 16
>> +; P8LE-NEXT: mulhwu r9, r8, r9
>> +; P8LE-NEXT: mulhwu r10, r11, r10
>> +; P8LE-NEXT: sub r11, r6, r3
>> +; P8LE-NEXT: srwi r11, r11, 1
>> ; P8LE-NEXT: srwi r7, r7, 5
>> -; P8LE-NEXT: add r3, r9, r3
>> -; P8LE-NEXT: srwi r9, r10, 8
>> +; P8LE-NEXT: add r3, r11, r3
>> +; P8LE-NEXT: srwi r9, r9, 8
>> +; P8LE-NEXT: srwi r10, r10, 2
>> ; P8LE-NEXT: srwi r3, r3, 6
>> ; P8LE-NEXT: mulli r7, r7, 98
>> -; P8LE-NEXT: srwi r10, r11, 2
>> ; P8LE-NEXT: mulli r9, r9, 1003
>> ; P8LE-NEXT: mulli r3, r3, 95
>> ; P8LE-NEXT: mulli r10, r10, 124
>> ; P8LE-NEXT: sub r5, r5, r7
>> ; P8LE-NEXT: sub r7, r8, r9
>> -; P8LE-NEXT: mtfprd f0, r5
>> ; P8LE-NEXT: sub r3, r6, r3
>> +; P8LE-NEXT: mtvsrd v2, r5
>> ; P8LE-NEXT: sub r4, r4, r10
>> -; P8LE-NEXT: mtfprd f1, r7
>> -; P8LE-NEXT: mtfprd f2, r3
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: mtfprd f3, r4
>> -; P8LE-NEXT: xxswapd v3, vs1
>> -; P8LE-NEXT: xxswapd v4, vs2
>> -; P8LE-NEXT: xxswapd v5, vs3
>> -; P8LE-NEXT: vmrglh v2, v3, v2
>> -; P8LE-NEXT: vmrglh v3, v5, v4
>> +; P8LE-NEXT: mtvsrd v3, r7
>> +; P8LE-NEXT: mtvsrd v4, r3
>> +; P8LE-NEXT: mtvsrd v5, r4
>> +; P8LE-NEXT: vmrghh v2, v3, v2
>> +; P8LE-NEXT: vmrghh v3, v5, v4
>> ; P8LE-NEXT: vmrglw v2, v2, v3
>> ; P8LE-NEXT: blr
>> ;
>> @@ -230,56 +224,52 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 0
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, 22765
>> -; P9LE-NEXT: ori r5, r5, 8969
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r6, r4, r5
>> -; P9LE-NEXT: sub r4, r4, r6
>> -; P9LE-NEXT: srwi r4, r4, 1
>> -; P9LE-NEXT: add r4, r4, r6
>> -; P9LE-NEXT: srwi r4, r4, 6
>> -; P9LE-NEXT: mulli r4, r4, 95
>> -; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r4, 22765
>> +; P9LE-NEXT: ori r4, r4, 8969
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: mulhwu r5, r3, r4
>> +; P9LE-NEXT: sub r6, r3, r5
>> +; P9LE-NEXT: srwi r6, r6, 1
>> +; P9LE-NEXT: add r5, r6, r5
>> +; P9LE-NEXT: srwi r5, r5, 6
>> +; P9LE-NEXT: mulli r5, r5, 95
>> +; P9LE-NEXT: sub r3, r3, r5
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r6, r4, r5
>> -; P9LE-NEXT: sub r4, r4, r6
>> -; P9LE-NEXT: srwi r4, r4, 1
>> -; P9LE-NEXT: add r4, r4, r6
>> -; P9LE-NEXT: srwi r4, r4, 6
>> -; P9LE-NEXT: mulli r4, r4, 95
>> -; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: mulhwu r5, r3, r4
>> +; P9LE-NEXT: sub r6, r3, r5
>> +; P9LE-NEXT: srwi r6, r6, 1
>> +; P9LE-NEXT: add r5, r6, r5
>> +; P9LE-NEXT: srwi r5, r5, 6
>> +; P9LE-NEXT: mulli r5, r5, 95
>> +; P9LE-NEXT: sub r3, r3, r5
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r6, r4, r5
>> -; P9LE-NEXT: sub r4, r4, r6
>> -; P9LE-NEXT: srwi r4, r4, 1
>> -; P9LE-NEXT: add r4, r4, r6
>> -; P9LE-NEXT: srwi r4, r4, 6
>> -; P9LE-NEXT: mulli r4, r4, 95
>> -; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: mulhwu r5, r3, r4
>> +; P9LE-NEXT: sub r6, r3, r5
>> +; P9LE-NEXT: srwi r6, r6, 1
>> +; P9LE-NEXT: add r5, r6, r5
>> +; P9LE-NEXT: srwi r5, r5, 6
>> +; P9LE-NEXT: mulli r5, r5, 95
>> +; P9LE-NEXT: sub r3, r3, r5
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r5, r4, r5
>> -; P9LE-NEXT: sub r4, r4, r5
>> -; P9LE-NEXT: srwi r4, r4, 1
>> -; P9LE-NEXT: add r4, r4, r5
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: mulhwu r4, r3, r4
>> +; P9LE-NEXT: sub r5, r3, r4
>> +; P9LE-NEXT: srwi r5, r5, 1
>> +; P9LE-NEXT: add r4, r5, r4
>> ; P9LE-NEXT: srwi r4, r4, 6
>> ; P9LE-NEXT: mulli r4, r4, 95
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: vmrglh v2, v2, v4
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: vmrghh v2, v2, v4
>> ; P9LE-NEXT: vmrglw v2, v2, v3
>> ; P9LE-NEXT: blr
>> ;
>> @@ -344,36 +334,34 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) {
>> ; P8LE: # %bb.0:
>> ; P8LE-NEXT: xxswapd vs0, v2
>> ; P8LE-NEXT: lis r3, 22765
>> -; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill
>> ; P8LE-NEXT: ori r3, r3, 8969
>> ; P8LE-NEXT: mffprd r4, f0
>> ; P8LE-NEXT: clrldi r5, r4, 48
>> ; P8LE-NEXT: rldicl r6, r4, 48, 48
>> -; P8LE-NEXT: clrlwi r8, r5, 16
>> +; P8LE-NEXT: clrlwi r5, r5, 16
>> ; P8LE-NEXT: rldicl r7, r4, 32, 48
>> -; P8LE-NEXT: clrlwi r9, r6, 16
>> +; P8LE-NEXT: clrlwi r6, r6, 16
>> +; P8LE-NEXT: mulhwu r8, r5, r3
>> ; P8LE-NEXT: rldicl r4, r4, 16, 48
>> -; P8LE-NEXT: mulhwu r10, r8, r3
>> -; P8LE-NEXT: clrlwi r11, r7, 16
>> -; P8LE-NEXT: clrlwi r0, r4, 16
>> -; P8LE-NEXT: mulhwu r12, r9, r3
>> -; P8LE-NEXT: mulhwu r30, r11, r3
>> -; P8LE-NEXT: mulhwu r3, r0, r3
>> -; P8LE-NEXT: sub r8, r8, r10
>> -; P8LE-NEXT: srwi r8, r8, 1
>> -; P8LE-NEXT: sub r9, r9, r12
>> -; P8LE-NEXT: add r8, r8, r10
>> -; P8LE-NEXT: sub r10, r11, r30
>> -; P8LE-NEXT: sub r11, r0, r3
>> -; P8LE-NEXT: srwi r9, r9, 1
>> -; P8LE-NEXT: srwi r10, r10, 1
>> +; P8LE-NEXT: clrlwi r7, r7, 16
>> +; P8LE-NEXT: mulhwu r9, r6, r3
>> +; P8LE-NEXT: clrlwi r4, r4, 16
>> +; P8LE-NEXT: mulhwu r10, r7, r3
>> +; P8LE-NEXT: mulhwu r3, r4, r3
>> +; P8LE-NEXT: sub r11, r5, r8
>> +; P8LE-NEXT: sub r12, r6, r9
>> +; P8LE-NEXT: srwi r11, r11, 1
>> +; P8LE-NEXT: add r8, r11, r8
>> +; P8LE-NEXT: sub r11, r7, r10
>> +; P8LE-NEXT: srwi r12, r12, 1
>> +; P8LE-NEXT: add r9, r12, r9
>> +; P8LE-NEXT: sub r12, r4, r3
>> ; P8LE-NEXT: srwi r11, r11, 1
>> -; P8LE-NEXT: add r9, r9, r12
>> ; P8LE-NEXT: srwi r8, r8, 6
>> -; P8LE-NEXT: add r10, r10, r30
>> -; P8LE-NEXT: add r3, r11, r3
>> +; P8LE-NEXT: add r10, r11, r10
>> +; P8LE-NEXT: srwi r11, r12, 1
>> ; P8LE-NEXT: srwi r9, r9, 6
>> -; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
>> +; P8LE-NEXT: add r3, r11, r3
>> ; P8LE-NEXT: mulli r8, r8, 95
>> ; P8LE-NEXT: srwi r10, r10, 6
>> ; P8LE-NEXT: srwi r3, r3, 6
>> @@ -382,18 +370,14 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) {
>> ; P8LE-NEXT: mulli r3, r3, 95
>> ; P8LE-NEXT: sub r5, r5, r8
>> ; P8LE-NEXT: sub r6, r6, r9
>> -; P8LE-NEXT: mtfprd f0, r5
>> +; P8LE-NEXT: mtvsrd v2, r5
>> ; P8LE-NEXT: sub r5, r7, r10
>> ; P8LE-NEXT: sub r3, r4, r3
>> -; P8LE-NEXT: mtfprd f1, r6
>> -; P8LE-NEXT: mtfprd f2, r5
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: mtfprd f3, r3
>> -; P8LE-NEXT: xxswapd v3, vs1
>> -; P8LE-NEXT: xxswapd v4, vs2
>> -; P8LE-NEXT: xxswapd v5, vs3
>> -; P8LE-NEXT: vmrglh v2, v3, v2
>> -; P8LE-NEXT: vmrglh v3, v5, v4
>> +; P8LE-NEXT: mtvsrd v3, r6
>> +; P8LE-NEXT: mtvsrd v4, r5
>> +; P8LE-NEXT: mtvsrd v5, r3
>> +; P8LE-NEXT: vmrghh v2, v3, v2
>> +; P8LE-NEXT: vmrghh v3, v5, v4
>> ; P8LE-NEXT: vmrglw v2, v3, v2
>> ; P8LE-NEXT: blr
>> ;
>> @@ -461,67 +445,59 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 0
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, 22765
>> -; P9LE-NEXT: ori r5, r5, 8969
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r6, r4, r5
>> -; P9LE-NEXT: sub r4, r4, r6
>> -; P9LE-NEXT: srwi r4, r4, 1
>> -; P9LE-NEXT: add r4, r4, r6
>> -; P9LE-NEXT: srwi r4, r4, 6
>> -; P9LE-NEXT: mulli r6, r4, 95
>> +; P9LE-NEXT: lis r4, 22765
>> +; P9LE-NEXT: ori r4, r4, 8969
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: mulhwu r5, r3, r4
>> +; P9LE-NEXT: sub r6, r3, r5
>> +; P9LE-NEXT: srwi r6, r6, 1
>> +; P9LE-NEXT: add r5, r6, r5
>> +; P9LE-NEXT: srwi r5, r5, 6
>> +; P9LE-NEXT: mulli r6, r5, 95
>> ; P9LE-NEXT: sub r3, r3, r6
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> ; P9LE-NEXT: clrlwi r6, r3, 16
>> -; P9LE-NEXT: mulhwu r7, r6, r5
>> +; P9LE-NEXT: mulhwu r7, r6, r4
>> ; P9LE-NEXT: sub r6, r6, r7
>> ; P9LE-NEXT: srwi r6, r6, 1
>> ; P9LE-NEXT: add r6, r6, r7
>> ; P9LE-NEXT: srwi r6, r6, 6
>> ; P9LE-NEXT: mulli r7, r6, 95
>> ; P9LE-NEXT: sub r3, r3, r7
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> ; P9LE-NEXT: clrlwi r7, r3, 16
>> -; P9LE-NEXT: mulhwu r8, r7, r5
>> +; P9LE-NEXT: mulhwu r8, r7, r4
>> ; P9LE-NEXT: sub r7, r7, r8
>> ; P9LE-NEXT: srwi r7, r7, 1
>> ; P9LE-NEXT: add r7, r7, r8
>> ; P9LE-NEXT: srwi r7, r7, 6
>> ; P9LE-NEXT: mulli r8, r7, 95
>> ; P9LE-NEXT: sub r3, r3, r8
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> ; P9LE-NEXT: clrlwi r8, r3, 16
>> -; P9LE-NEXT: mulhwu r5, r8, r5
>> -; P9LE-NEXT: sub r8, r8, r5
>> +; P9LE-NEXT: mulhwu r4, r8, r4
>> +; P9LE-NEXT: sub r8, r8, r4
>> ; P9LE-NEXT: srwi r8, r8, 1
>> -; P9LE-NEXT: add r5, r8, r5
>> -; P9LE-NEXT: srwi r5, r5, 6
>> -; P9LE-NEXT: mulli r8, r5, 95
>> +; P9LE-NEXT: add r4, r8, r4
>> +; P9LE-NEXT: srwi r4, r4, 6
>> +; P9LE-NEXT: mulli r8, r4, 95
>> ; P9LE-NEXT: sub r3, r3, r8
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: mtfprd f0, r4
>> -; P9LE-NEXT: vmrglh v2, v2, v4
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: vmrghh v2, v2, v4
>> +; P9LE-NEXT: mtvsrd v4, r6
>> ; P9LE-NEXT: vmrglw v2, v2, v3
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r6
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r7
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r5
>> -; P9LE-NEXT: xxswapd v5, vs0
>> -; P9LE-NEXT: vmrglh v4, v5, v4
>> +; P9LE-NEXT: mtvsrd v3, r5
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: mtvsrd v4, r7
>> +; P9LE-NEXT: mtvsrd v5, r4
>> +; P9LE-NEXT: vmrghh v4, v5, v4
>> ; P9LE-NEXT: vmrglw v3, v4, v3
>> ; P9LE-NEXT: vadduhm v2, v2, v3
>> ; P9LE-NEXT: blr
>> @@ -598,69 +574,61 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) {
>> ; P8LE-LABEL: combine_urem_udiv:
>> ; P8LE: # %bb.0:
>> ; P8LE-NEXT: xxswapd vs0, v2
>> -; P8LE-NEXT: lis r4, 22765
>> +; P8LE-NEXT: lis r3, 22765
>> ; P8LE-NEXT: std r30, -16(r1) # 8-byte Folded Spill
>> -; P8LE-NEXT: ori r4, r4, 8969
>> -; P8LE-NEXT: mffprd r5, f0
>> -; P8LE-NEXT: clrldi r3, r5, 48
>> -; P8LE-NEXT: rldicl r6, r5, 48, 48
>> -; P8LE-NEXT: clrlwi r8, r3, 16
>> -; P8LE-NEXT: rldicl r7, r5, 32, 48
>> -; P8LE-NEXT: clrlwi r9, r6, 16
>> -; P8LE-NEXT: mulhwu r10, r8, r4
>> -; P8LE-NEXT: clrlwi r11, r7, 16
>> -; P8LE-NEXT: rldicl r5, r5, 16, 48
>> -; P8LE-NEXT: mulhwu r12, r9, r4
>> -; P8LE-NEXT: mulhwu r0, r11, r4
>> -; P8LE-NEXT: clrlwi r30, r5, 16
>> -; P8LE-NEXT: mulhwu r4, r30, r4
>> -; P8LE-NEXT: sub r8, r8, r10
>> +; P8LE-NEXT: ori r3, r3, 8969
>> +; P8LE-NEXT: mffprd r4, f0
>> +; P8LE-NEXT: clrldi r5, r4, 48
>> +; P8LE-NEXT: rldicl r6, r4, 48, 48
>> +; P8LE-NEXT: clrlwi r5, r5, 16
>> +; P8LE-NEXT: clrlwi r8, r6, 16
>> +; P8LE-NEXT: rldicl r7, r4, 32, 48
>> +; P8LE-NEXT: rldicl r4, r4, 16, 48
>> +; P8LE-NEXT: mulhwu r9, r5, r3
>> +; P8LE-NEXT: mulhwu r11, r8, r3
>> +; P8LE-NEXT: clrlwi r10, r7, 16
>> +; P8LE-NEXT: clrlwi r12, r4, 16
>> +; P8LE-NEXT: mulhwu r0, r10, r3
>> +; P8LE-NEXT: mulhwu r3, r12, r3
>> +; P8LE-NEXT: sub r30, r5, r9
>> +; P8LE-NEXT: sub r8, r8, r11
>> +; P8LE-NEXT: srwi r30, r30, 1
>> ; P8LE-NEXT: srwi r8, r8, 1
>> -; P8LE-NEXT: sub r9, r9, r12
>> -; P8LE-NEXT: add r8, r8, r10
>> -; P8LE-NEXT: sub r10, r11, r0
>> -; P8LE-NEXT: srwi r9, r9, 1
>> +; P8LE-NEXT: sub r10, r10, r0
>> +; P8LE-NEXT: add r9, r30, r9
>> +; P8LE-NEXT: add r8, r8, r11
>> +; P8LE-NEXT: sub r11, r12, r3
>> ; P8LE-NEXT: srwi r10, r10, 1
>> -; P8LE-NEXT: sub r11, r30, r4
>> -; P8LE-NEXT: add r9, r9, r12
>> -; P8LE-NEXT: srwi r8, r8, 6
>> ; P8LE-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
>> -; P8LE-NEXT: add r10, r10, r0
>> -; P8LE-NEXT: srwi r11, r11, 1
>> ; P8LE-NEXT: srwi r9, r9, 6
>> -; P8LE-NEXT: mtfprd f0, r8
>> -; P8LE-NEXT: mulli r12, r8, 95
>> +; P8LE-NEXT: srwi r11, r11, 1
>> +; P8LE-NEXT: srwi r8, r8, 6
>> +; P8LE-NEXT: add r10, r10, r0
>> +; P8LE-NEXT: mulli r12, r9, 95
>> +; P8LE-NEXT: add r3, r11, r3
>> +; P8LE-NEXT: mtvsrd v2, r9
>> ; P8LE-NEXT: srwi r10, r10, 6
>> -; P8LE-NEXT: add r4, r11, r4
>> -; P8LE-NEXT: mtfprd f1, r9
>> -; P8LE-NEXT: mulli r8, r9, 95
>> -; P8LE-NEXT: mulli r9, r10, 95
>> -; P8LE-NEXT: srwi r4, r4, 6
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: mtfprd f2, r10
>> -; P8LE-NEXT: mtfprd f3, r4
>> -; P8LE-NEXT: mulli r4, r4, 95
>> -; P8LE-NEXT: xxswapd v3, vs1
>> -; P8LE-NEXT: xxswapd v1, vs2
>> -; P8LE-NEXT: sub r3, r3, r12
>> -; P8LE-NEXT: xxswapd v6, vs3
>> -; P8LE-NEXT: mtfprd f0, r3
>> -; P8LE-NEXT: sub r3, r7, r9
>> -; P8LE-NEXT: sub r6, r6, r8
>> -; P8LE-NEXT: mtfprd f4, r3
>> -; P8LE-NEXT: sub r3, r5, r4
>> -; P8LE-NEXT: mtfprd f1, r6
>> -; P8LE-NEXT: mtfprd f5, r3
>> -; P8LE-NEXT: xxswapd v5, vs4
>> -; P8LE-NEXT: vmrglh v2, v3, v2
>> -; P8LE-NEXT: xxswapd v3, vs0
>> -; P8LE-NEXT: xxswapd v4, vs1
>> -; P8LE-NEXT: xxswapd v0, vs5
>> -; P8LE-NEXT: vmrglh v3, v4, v3
>> -; P8LE-NEXT: vmrglh v4, v0, v5
>> -; P8LE-NEXT: vmrglh v5, v6, v1
>> -; P8LE-NEXT: vmrglw v3, v4, v3
>> -; P8LE-NEXT: vmrglw v2, v5, v2
>> +; P8LE-NEXT: mulli r9, r8, 95
>> +; P8LE-NEXT: srwi r3, r3, 6
>> +; P8LE-NEXT: mtvsrd v3, r8
>> +; P8LE-NEXT: mulli r8, r10, 95
>> +; P8LE-NEXT: mtvsrd v4, r10
>> +; P8LE-NEXT: mulli r10, r3, 95
>> +; P8LE-NEXT: vmrghh v2, v3, v2
>> +; P8LE-NEXT: sub r5, r5, r12
>> +; P8LE-NEXT: sub r6, r6, r9
>> +; P8LE-NEXT: mtvsrd v3, r5
>> +; P8LE-NEXT: mtvsrd v5, r6
>> +; P8LE-NEXT: sub r5, r7, r8
>> +; P8LE-NEXT: sub r4, r4, r10
>> +; P8LE-NEXT: mtvsrd v0, r5
>> +; P8LE-NEXT: mtvsrd v1, r4
>> +; P8LE-NEXT: vmrghh v3, v5, v3
>> +; P8LE-NEXT: mtvsrd v5, r3
>> +; P8LE-NEXT: vmrghh v0, v1, v0
>> +; P8LE-NEXT: vmrghh v4, v5, v4
>> +; P8LE-NEXT: vmrglw v3, v0, v3
>> +; P8LE-NEXT: vmrglw v2, v4, v2
>> ; P8LE-NEXT: vadduhm v2, v3, v2
>> ; P8LE-NEXT: blr
>> ;
>> @@ -742,34 +710,30 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x i16> %x) {
>> ; P9LE-NEXT: li r3, 0
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> ; P9LE-NEXT: clrlwi r3, r3, 26
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> ; P9LE-NEXT: clrlwi r3, r3, 27
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, 22765
>> -; P9LE-NEXT: ori r5, r5, 8969
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r5, r4, r5
>> -; P9LE-NEXT: sub r4, r4, r5
>> -; P9LE-NEXT: srwi r4, r4, 1
>> -; P9LE-NEXT: add r4, r4, r5
>> +; P9LE-NEXT: lis r4, 22765
>> +; P9LE-NEXT: ori r4, r4, 8969
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: mulhwu r4, r3, r4
>> +; P9LE-NEXT: sub r5, r3, r4
>> +; P9LE-NEXT: srwi r5, r5, 1
>> +; P9LE-NEXT: add r4, r5, r4
>> ; P9LE-NEXT: srwi r4, r4, 6
>> ; P9LE-NEXT: mulli r4, r4, 95
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> ; P9LE-NEXT: clrlwi r3, r3, 29
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: vmrglh v2, v4, v2
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: vmrghh v2, v4, v2
>> ; P9LE-NEXT: vmrglw v2, v2, v3
>> ; P9LE-NEXT: blr
>> ;
>> @@ -817,9 +781,9 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x i16> %x) {
>> ; P8LE-NEXT: mffprd r4, f0
>> ; P8LE-NEXT: rldicl r5, r4, 16, 48
>> ; P8LE-NEXT: rldicl r7, r4, 48, 48
>> -; P8LE-NEXT: clrlwi r6, r5, 16
>> -; P8LE-NEXT: mulhwu r3, r6, r3
>> -; P8LE-NEXT: sub r6, r6, r3
>> +; P8LE-NEXT: clrlwi r5, r5, 16
>> +; P8LE-NEXT: mulhwu r3, r5, r3
>> +; P8LE-NEXT: sub r6, r5, r3
>> ; P8LE-NEXT: srwi r6, r6, 1
>> ; P8LE-NEXT: add r3, r6, r3
>> ; P8LE-NEXT: clrldi r6, r4, 48
>> @@ -827,19 +791,15 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x i16> %x) {
>> ; P8LE-NEXT: clrlwi r6, r6, 26
>> ; P8LE-NEXT: mulli r3, r3, 95
>> ; P8LE-NEXT: rldicl r4, r4, 32, 48
>> -; P8LE-NEXT: mtfprd f0, r6
>> +; P8LE-NEXT: mtvsrd v2, r6
>> ; P8LE-NEXT: clrlwi r6, r7, 27
>> ; P8LE-NEXT: clrlwi r4, r4, 29
>> -; P8LE-NEXT: mtfprd f1, r6
>> -; P8LE-NEXT: mtfprd f3, r4
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: xxswapd v3, vs1
>> +; P8LE-NEXT: mtvsrd v3, r6
>> +; P8LE-NEXT: mtvsrd v5, r4
>> +; P8LE-NEXT: vmrghh v2, v3, v2
>> ; P8LE-NEXT: sub r3, r5, r3
>> -; P8LE-NEXT: xxswapd v5, vs3
>> -; P8LE-NEXT: mtfprd f2, r3
>> -; P8LE-NEXT: vmrglh v2, v3, v2
>> -; P8LE-NEXT: xxswapd v4, vs2
>> -; P8LE-NEXT: vmrglh v3, v4, v5
>> +; P8LE-NEXT: mtvsrd v4, r3
>> +; P8LE-NEXT: vmrghh v3, v4, v5
>> ; P8LE-NEXT: vmrglw v2, v3, v2
>> ; P8LE-NEXT: blr
>> ;
>> @@ -885,40 +845,39 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) {
>> ; P9LE: # %bb.0:
>> ; P9LE-NEXT: li r3, 4
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: lis r5, -19946
>> -; P9LE-NEXT: ori r5, r5, 17097
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r4, r4, r5
>> -; P9LE-NEXT: lis r5, 24749
>> -; P9LE-NEXT: ori r5, r5, 47143
>> +; P9LE-NEXT: lis r4, -19946
>> +; P9LE-NEXT: ori r4, r4, 17097
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: mulhwu r4, r3, r4
>> ; P9LE-NEXT: srwi r4, r4, 4
>> ; P9LE-NEXT: mulli r4, r4, 23
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: lis r4, 24749
>> +; P9LE-NEXT: mtvsrd v3, r3
>> ; P9LE-NEXT: li r3, 6
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: clrlwi r4, r3, 16
>> -; P9LE-NEXT: mulhwu r4, r4, r5
>> -; P9LE-NEXT: lis r5, -14230
>> -; P9LE-NEXT: ori r5, r5, 30865
>> +; P9LE-NEXT: clrlwi r3, r3, 16
>> +; P9LE-NEXT: ori r4, r4, 47143
>> +; P9LE-NEXT: mulhwu r4, r3, r4
>> ; P9LE-NEXT: srwi r4, r4, 11
>> ; P9LE-NEXT: mulli r4, r4, 5423
>> ; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v3, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> +; P9LE-NEXT: mtvsrd v4, r3
>> ; P9LE-NEXT: li r3, 2
>> ; P9LE-NEXT: vextuhrx r3, r3, v2
>> -; P9LE-NEXT: rlwinm r4, r3, 31, 17, 31
>> -; P9LE-NEXT: mulhwu r4, r4, r5
>> -; P9LE-NEXT: srwi r4, r4, 8
>> -; P9LE-NEXT: mulli r4, r4, 654
>> -; P9LE-NEXT: sub r3, r3, r4
>> -; P9LE-NEXT: xxswapd v4, vs0
>> -; P9LE-NEXT: mtfprd f0, r3
>> -; P9LE-NEXT: xxswapd v2, vs0
>> -; P9LE-NEXT: vmrglh v3, v4, v3
>> -; P9LE-NEXT: xxlxor v4, v4, v4
>> -; P9LE-NEXT: vmrglh v2, v2, v4
>> +; P9LE-NEXT: lis r5, -14230
>> +; P9LE-NEXT: ori r5, r5, 30865
>> +; P9LE-NEXT: vmrghh v3, v4, v3
>> +; P9LE-NEXT: clrlwi r4, r3, 16
>> +; P9LE-NEXT: rlwinm r3, r3, 31, 17, 31
>> +; P9LE-NEXT: mulhwu r3, r3, r5
>> +; P9LE-NEXT: srwi r3, r3, 8
>> +; P9LE-NEXT: mulli r3, r3, 654
>> +; P9LE-NEXT: sub r3, r4, r3
>> +; P9LE-NEXT: mtvsrd v2, r3
>> +; P9LE-NEXT: li r3, 0
>> +; P9LE-NEXT: mtvsrd v4, r3
>> +; P9LE-NEXT: vmrghh v2, v2, v4
>> ; P9LE-NEXT: vmrglw v2, v3, v2
>> ; P9LE-NEXT: blr
>> ;
>> @@ -969,41 +928,40 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) {
>> ; P8LE-LABEL: dont_fold_urem_one:
>> ; P8LE: # %bb.0:
>> ; P8LE-NEXT: xxswapd vs0, v2
>> -; P8LE-NEXT: lis r3, -19946
>> -; P8LE-NEXT: lis r7, 24749
>> -; P8LE-NEXT: lis r9, -14230
>> -; P8LE-NEXT: xxlxor v5, v5, v5
>> -; P8LE-NEXT: ori r3, r3, 17097
>> -; P8LE-NEXT: ori r7, r7, 47143
>> -; P8LE-NEXT: ori r9, r9, 30865
>> +; P8LE-NEXT: lis r3, -14230
>> +; P8LE-NEXT: lis r7, -19946
>> +; P8LE-NEXT: lis r9, 24749
>> +; P8LE-NEXT: ori r3, r3, 30865
>> +; P8LE-NEXT: ori r7, r7, 17097
>> ; P8LE-NEXT: mffprd r4, f0
>> -; P8LE-NEXT: rldicl r5, r4, 32, 48
>> -; P8LE-NEXT: rldicl r6, r4, 16, 48
>> -; P8LE-NEXT: clrlwi r8, r5, 16
>> -; P8LE-NEXT: rldicl r4, r4, 48, 48
>> +; P8LE-NEXT: rldicl r5, r4, 48, 48
>> +; P8LE-NEXT: rldicl r6, r4, 32, 48
>> +; P8LE-NEXT: rldicl r4, r4, 16, 48
>> +; P8LE-NEXT: rlwinm r8, r5, 31, 17, 31
>> +; P8LE-NEXT: clrlwi r6, r6, 16
>> +; P8LE-NEXT: clrlwi r5, r5, 16
>> ; P8LE-NEXT: mulhwu r3, r8, r3
>> -; P8LE-NEXT: clrlwi r8, r6, 16
>> -; P8LE-NEXT: mulhwu r7, r8, r7
>> -; P8LE-NEXT: rlwinm r8, r4, 31, 17, 31
>> -; P8LE-NEXT: mulhwu r8, r8, r9
>> -; P8LE-NEXT: srwi r3, r3, 4
>> -; P8LE-NEXT: srwi r7, r7, 11
>> -; P8LE-NEXT: mulli r3, r3, 23
>> -; P8LE-NEXT: srwi r8, r8, 8
>> -; P8LE-NEXT: mulli r7, r7, 5423
>> -; P8LE-NEXT: mulli r8, r8, 654
>> +; P8LE-NEXT: ori r8, r9, 47143
>> +; P8LE-NEXT: clrlwi r4, r4, 16
>> +; P8LE-NEXT: li r9, 0
>> +; P8LE-NEXT: mulhwu r7, r6, r7
>> +; P8LE-NEXT: mulhwu r8, r4, r8
>> +; P8LE-NEXT: mtvsrd v2, r9
>> +; P8LE-NEXT: srwi r3, r3, 8
>> +; P8LE-NEXT: srwi r7, r7, 4
>> +; P8LE-NEXT: mulli r3, r3, 654
>> +; P8LE-NEXT: srwi r8, r8, 11
>> +; P8LE-NEXT: mulli r7, r7, 23
>> +; P8LE-NEXT: mulli r8, r8, 5423
>> ; P8LE-NEXT: sub r3, r5, r3
>> ; P8LE-NEXT: sub r5, r6, r7
>> -; P8LE-NEXT: mtfprd f0, r3
>> +; P8LE-NEXT: mtvsrd v3, r3
>> ; P8LE-NEXT: sub r3, r4, r8
>> -; P8LE-NEXT: mtfprd f1, r5
>> -; P8LE-NEXT: mtfprd f2, r3
>> -; P8LE-NEXT: xxswapd v2, vs0
>> -; P8LE-NEXT: xxswapd v3, vs1
>> -; P8LE-NEXT: xxswapd v4, vs2
>> -; P8LE-NEXT: vmrglh v2, v3, v2
>> -; P8LE-NEXT: vmrglh v3, v4, v5
>> -; P8LE-NEXT: vmrglw v2, v2, v3
>> +; P8LE-NEXT: mtvsrd v4, r5
>> +; P8LE-NEXT: mtvsrd v5, r3
>> +; P8LE-NEXT: vmrghh v2, v3, v2
>> +; P8LE-NEXT: vmrghh v3, v5, v4
>> +; P8LE-NEXT: vmrglw v2, v3, v2
>> ; P8LE-NEXT: blr
>> ;
>> ; P8BE-LABEL: dont_fold_urem_one:
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
>> index 239b38e2ec70..48b62f57c1c9 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
>> @@ -20,12 +20,10 @@ define i32 @test2elt(i64 %a.coerce) local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: vmrghh v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -40,13 +38,11 @@ define i32 @test2elt(i64 %a.coerce) local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs1
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: li r3, 0
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -90,20 +86,16 @@ define i64 @test4elt(<4 x float> %a) local_unnamed_addr #1 {
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: mtfprd f3, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs2
>> -; CHECK-P8-NEXT: xxswapd v5, vs3
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglh v3, v4, v5
>> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mtvsrd v5, r3
>> +; CHECK-P8-NEXT: vmrghh v3, v4, v3
>> +; CHECK-P8-NEXT: vmrghh v2, v2, v5
>> +; CHECK-P8-NEXT: vmrglw v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -114,27 +106,23 @@ define i64 @test4elt(<4 x float> %a) local_unnamed_addr #1 {
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xscvspdpn f0, v2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vmrglh v2, v4, v2
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> +; CHECK-P9-NEXT: vmrghh v2, v4, v2
>> ; CHECK-P9-NEXT: vmrglw v2, v2, v3
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -180,59 +168,51 @@ define <8 x i16> @test8elt(<8 x float>* nocapture readonly) local_unnamed_addr #
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: lvx v2, 0, r3
>> ; CHECK-P8-NEXT: li r4, 16
>> -; CHECK-P8-NEXT: lvx v5, r3, r4
>> -; CHECK-P8-NEXT: xxswapd vs1, v2
>> +; CHECK-P8-NEXT: lvx v3, r3, r4
>> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3
>> -; CHECK-P8-NEXT: xscvspdpn f4, v5
>> -; CHECK-P8-NEXT: xxswapd vs3, v5
>> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xxswapd vs1, v2
>> +; CHECK-P8-NEXT: xscvspdpn f2, v2
>> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1
>> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
>> +; CHECK-P8-NEXT: xscvspdpn f3, v3
>> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
>> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: mffprwz r6, f1
>> -; CHECK-P8-NEXT: mffprwz r5, f0
>> -; CHECK-P8-NEXT: mtfprd f1, r6
>> -; CHECK-P8-NEXT: mtfprd f0, r5
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: xscvspdpn f0, v2
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v1, vs4
>> -; CHECK-P8-NEXT: vmrglh v2, v4, v3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: xxswapd v5, vs2
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: xxswapd vs0, v3
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: mffprwz r3, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f4
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f5
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghh v2, v4, v2
>> +; CHECK-P8-NEXT: mffprwz r4, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v6, vs3
>> -; CHECK-P8-NEXT: xxswapd v0, vs0
>> -; CHECK-P8-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P8-NEXT: vmrglh v4, v0, v5
>> -; CHECK-P8-NEXT: vmrglh v5, v1, v6
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: vmrghh v3, v3, v4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mtvsrd v5, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: vmrghh v5, v0, v5
>> +; CHECK-P8-NEXT: mtvsrd v1, r3
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> +; CHECK-P8-NEXT: vmrghh v4, v4, v1
>> +; CHECK-P8-NEXT: vmrglw v3, v4, v5
>> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> ; CHECK-P8-NEXT: blr
>> ;
>> @@ -244,53 +224,45 @@ define <8 x i16> @test8elt(<8 x float>* nocapture readonly) local_unnamed_addr #
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs1
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs0
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -363,116 +335,100 @@ define void @test16elt(<16 x i16>* noalias nocapture sret %agg.result, <16 x flo
>> ; CHECK-P8-LABEL: test16elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: lvx v5, 0, r4
>> -; CHECK-P8-NEXT: li r6, 32
>> ; CHECK-P8-NEXT: li r5, 16
>> -; CHECK-P8-NEXT: lvx v2, r4, r6
>> +; CHECK-P8-NEXT: li r6, 32
>> ; CHECK-P8-NEXT: lvx v3, r4, r5
>> +; CHECK-P8-NEXT: lvx v2, r4, r6
>> ; CHECK-P8-NEXT: li r6, 48
>> -; CHECK-P8-NEXT: xscvspdpn f0, v5
>> -; CHECK-P8-NEXT: xxsldwi vs1, v5, v5, 3
>> +; CHECK-P8-NEXT: xxsldwi vs0, v5, v5, 3
>> +; CHECK-P8-NEXT: xscvspdpn f1, v5
>> ; CHECK-P8-NEXT: lvx v4, r4, r6
>> -; CHECK-P8-NEXT: xscvspdpn f4, v2
>> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT: xscvspdpn f2, v3
>> ; CHECK-P8-NEXT: xxswapd vs3, v5
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: xxswapd vs8, v3
>> -; CHECK-P8-NEXT: xscvspdpn f6, v4
>> +; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
>> ; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 3
>> +; CHECK-P8-NEXT: xxswapd vs8, v3
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xxsldwi vs10, v2, v2, 3
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT: xscvspdpn f7, vs7
>> +; CHECK-P8-NEXT: xscvspdpn f8, vs8
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: xxsldwi vs9, v3, v3, 1
>> +; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT: xscvspdpn f2, v3
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f5
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: xxsldwi vs0, v3, v3, 1
>> +; CHECK-P8-NEXT: xscvspdpn f4, v2
>> +; CHECK-P8-NEXT: xscvdpsxws f5, f7
>> +; CHECK-P8-NEXT: xxsldwi vs7, v4, v4, 3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 3
>> +; CHECK-P8-NEXT: xscvspdpn f6, v4
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f8
>> +; CHECK-P8-NEXT: xxswapd vs8, v4
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f5
>> +; CHECK-P8-NEXT: xxswapd vs5, v2
>> ; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT: xxsldwi vs12, v2, v2, 1
>> -; CHECK-P8-NEXT: xscvspdpn f8, vs8
>> -; CHECK-P8-NEXT: xxswapd vs11, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT: xxswapd v2, v4
>> +; CHECK-P8-NEXT: vmrghh v3, v0, v3
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xscvdpsxws f6, f6
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs5
>> +; CHECK-P8-NEXT: xxsldwi vs5, v2, v2, 1
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghh v2, v5, v1
>> +; CHECK-P8-NEXT: vmrghh v5, v6, v0
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f4
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f3
>> +; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f6
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> ; CHECK-P8-NEXT: xscvspdpn f7, vs7
>> -; CHECK-P8-NEXT: xxsldwi vs13, v4, v4, 3
>> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT: xxsldwi v3, v4, v4, 1
>> -; CHECK-P8-NEXT: xscvspdpn f10, vs10
>> +; CHECK-P8-NEXT: mtvsrd v7, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f2
>> +; CHECK-P8-NEXT: xxsldwi vs2, v4, v4, 1
>> +; CHECK-P8-NEXT: xscvspdpn f8, vs8
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f5
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs2
>> +; CHECK-P8-NEXT: xscvdpsxws f3, f7
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f8
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xscvspdpn f9, vs9
>> -; CHECK-P8-NEXT: xscvdpsxws f6, f6
>> -; CHECK-P8-NEXT: xscvspdpn f12, vs12
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> +; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: vmrghh v0, v0, v7
>> +; CHECK-P8-NEXT: mtvsrd v7, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: xscvspdpn f11, vs11
>> -; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xscvspdpn v2, v2
>> -; CHECK-P8-NEXT: xscvdpsxws f8, f8
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT: mffprwz r6, f2
>> -; CHECK-P8-NEXT: xscvspdpn f13, vs13
>> -; CHECK-P8-NEXT: xscvspdpn v3, v3
>> -; CHECK-P8-NEXT: xscvdpsxws f10, f10
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> +; CHECK-P8-NEXT: vmrghh v4, v8, v4
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT: mtfprd f2, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f6
>> -; CHECK-P8-NEXT: xscvdpsxws f12, f12
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f6, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f3
>> -; CHECK-P8-NEXT: xscvdpsxws v2, v2
>> -; CHECK-P8-NEXT: xxswapd v9, vs6
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f8
>> -; CHECK-P8-NEXT: mtfprd f3, r6
>> -; CHECK-P8-NEXT: xxswapd v0, vs5
>> -; CHECK-P8-NEXT: mffprwz r6, f7
>> -; CHECK-P8-NEXT: xscvdpsxws f13, f13
>> -; CHECK-P8-NEXT: xxswapd v5, vs3
>> -; CHECK-P8-NEXT: xscvdpsxws v3, v3
>> -; CHECK-P8-NEXT: mtfprd f8, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f10
>> -; CHECK-P8-NEXT: mtfprd f7, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f9
>> -; CHECK-P8-NEXT: mtfprd f10, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f12
>> -; CHECK-P8-NEXT: mtfprd f9, r6
>> -; CHECK-P8-NEXT: xxswapd v6, vs10
>> -; CHECK-P8-NEXT: mffprwz r6, f11
>> -; CHECK-P8-NEXT: mtfprd f12, r4
>> -; CHECK-P8-NEXT: xxswapd v1, vs9
>> -; CHECK-P8-NEXT: mfvsrwz r4, v2
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: mtfprd f11, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f13
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: xxswapd v7, vs11
>> -; CHECK-P8-NEXT: mfvsrwz r4, v3
>> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
>> -; CHECK-P8-NEXT: xxswapd v4, vs7
>> -; CHECK-P8-NEXT: vmrglh v2, v2, v0
>> -; CHECK-P8-NEXT: xxswapd v5, vs8
>> -; CHECK-P8-NEXT: xxswapd v0, vs2
>> -; CHECK-P8-NEXT: mtfprd f13, r6
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> -; CHECK-P8-NEXT: vmrglh v4, v5, v4
>> -; CHECK-P8-NEXT: vmrglh v5, v0, v1
>> -; CHECK-P8-NEXT: xxswapd v1, vs4
>> -; CHECK-P8-NEXT: vmrglh v0, v7, v6
>> -; CHECK-P8-NEXT: xxswapd v6, vs12
>> -; CHECK-P8-NEXT: xxswapd v7, vs13
>> -; CHECK-P8-NEXT: xxswapd v10, vs1
>> +; CHECK-P8-NEXT: vmrghh v1, v1, v9
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> +; CHECK-P8-NEXT: vmrghh v7, v8, v7
>> +; CHECK-P8-NEXT: vmrghh v6, v6, v9
>> ; CHECK-P8-NEXT: vmrglw v2, v2, v3
>> -; CHECK-P8-NEXT: vmrglh v1, v1, v6
>> -; CHECK-P8-NEXT: vmrglh v6, v8, v7
>> -; CHECK-P8-NEXT: vmrglh v7, v9, v10
>> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> -; CHECK-P8-NEXT: vmrglw v4, v1, v0
>> -; CHECK-P8-NEXT: vmrglw v5, v7, v6
>> +; CHECK-P8-NEXT: vmrglw v3, v0, v5
>> +; CHECK-P8-NEXT: vmrglw v4, v1, v4
>> +; CHECK-P8-NEXT: vmrglw v5, v6, v7
>> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> ; CHECK-P8-NEXT: stvx v2, 0, r3
>> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4
>> @@ -481,118 +437,102 @@ define void @test16elt(<16 x i16>* noalias nocapture sret %agg.result, <16 x flo
>> ;
>> ; CHECK-P9-LABEL: test16elt:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: lxv vs1, 0(r4)
>> -; CHECK-P9-NEXT: lxv vs3, 16(r4)
>> -; CHECK-P9-NEXT: xscvspdpn f5, vs1
>> -; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3
>> -; CHECK-P9-NEXT: xscvspdpn f8, vs3
>> -; CHECK-P9-NEXT: xxswapd vs4, vs1
>> -; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT: lxv vs2, 0(r4)
>> +; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3
>> +; CHECK-P9-NEXT: xxswapd vs4, vs2
>> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> ; CHECK-P9-NEXT: xscvspdpn f4, vs4
>> -; CHECK-P9-NEXT: xscvdpsxws f5, f5
>> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: xscvspdpn f5, vs2
>> +; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT: xscvdpsxws f8, f8
>> -; CHECK-P9-NEXT: xxsldwi vs6, vs3, vs3, 3
>> -; CHECK-P9-NEXT: xxswapd vs7, vs3
>> -; CHECK-P9-NEXT: xscvspdpn f6, vs6
>> -; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1
>> -; CHECK-P9-NEXT: xscvspdpn f7, vs7
>> -; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: mffprwz r5, f3
>> +; CHECK-P9-NEXT: lxv vs1, 16(r4)
>> +; CHECK-P9-NEXT: xxsldwi vs6, vs1, vs1, 3
>> +; CHECK-P9-NEXT: xxswapd vs3, vs1
>> +; CHECK-P9-NEXT: mtvsrd v2, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f4
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f5
>> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT: mtvsrd v3, r5
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT: mffprwz r5, f4
>> +; CHECK-P9-NEXT: xscvspdpn f4, vs6
>> +; CHECK-P9-NEXT: mtvsrd v3, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f2
>> +; CHECK-P9-NEXT: xscvspdpn f2, vs1
>> +; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P9-NEXT: xscvdpsxws f6, f6
>> -; CHECK-P9-NEXT: mffprwz r5, f5
>> -; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: xscvdpsxws f7, f7
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT: mtfprd f5, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f8
>> -; CHECK-P9-NEXT: mtfprd f8, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f2
>> ; CHECK-P9-NEXT: lxv vs0, 32(r4)
>> -; CHECK-P9-NEXT: xxsldwi vs9, vs0, vs0, 3
>> -; CHECK-P9-NEXT: xxswapd vs10, vs0
>> -; CHECK-P9-NEXT: xscvspdpn f9, vs9
>> -; CHECK-P9-NEXT: xscvspdpn f10, vs10
>> -; CHECK-P9-NEXT: xscvdpsxws f9, f9
>> -; CHECK-P9-NEXT: xscvdpsxws f10, f10
>> -; CHECK-P9-NEXT: mtfprd f2, r5
>> +; CHECK-P9-NEXT: mtvsrd v4, r5
>> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r5, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r5
>> +; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: mtvsrd v4, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f3
>> +; CHECK-P9-NEXT: xxsldwi vs3, vs0, vs0, 3
>> +; CHECK-P9-NEXT: mtvsrd v5, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f2
>> +; CHECK-P9-NEXT: xscvspdpn f2, vs3
>> +; CHECK-P9-NEXT: vmrghh v4, v5, v4
>> +; CHECK-P9-NEXT: mtvsrd v5, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f6
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> -; CHECK-P9-NEXT: xxswapd v3, vs4
>> +; CHECK-P9-NEXT: xxswapd vs1, vs0
>> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> +; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: vmrghh v5, v5, v0
>> +; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrglw v3, v5, v4
>> +; CHECK-P9-NEXT: mffprwz r5, f2
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: mtfprd f6, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f7
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> +; CHECK-P9-NEXT: mffprwz r5, f1
>> ; CHECK-P9-NEXT: lxv vs1, 48(r4)
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs5
>> -; CHECK-P9-NEXT: mtfprd f7, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f3
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs6
>> -; CHECK-P9-NEXT: xxswapd v5, vs7
>> -; CHECK-P9-NEXT: mtfprd f3, r5
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT: xxswapd v0, vs3
>> -; CHECK-P9-NEXT: vmrglh v4, v5, v4
>> -; CHECK-P9-NEXT: xxswapd v5, vs8
>> -; CHECK-P9-NEXT: vmrglh v5, v5, v0
>> +; CHECK-P9-NEXT: mtvsrd v1, r5
>> +; CHECK-P9-NEXT: vmrghh v0, v1, v0
>> ; CHECK-P9-NEXT: mffprwz r4, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r4
>> -; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: vmrglw v3, v5, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> ; CHECK-P9-NEXT: xxmrgld vs2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> +; CHECK-P9-NEXT: mffprwz r4, f0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 3
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P9-NEXT: vmrghh v2, v4, v2
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrglw v2, v2, v0
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs1
>> +; CHECK-P9-NEXT: mtvsrd v3, r4
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: vmrglh v2, v4, v2
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs1
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT: mffprwz r5, f9
>> -; CHECK-P9-NEXT: mtfprd f9, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f10
>> -; CHECK-P9-NEXT: mtfprd f10, r5
>> -; CHECK-P9-NEXT: xxswapd v0, vs9
>> -; CHECK-P9-NEXT: xxswapd v1, vs10
>> -; CHECK-P9-NEXT: vmrglh v0, v1, v0
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
>> -; CHECK-P9-NEXT: stxv vs2, 0(r3)
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r4
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
>> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
>> +; CHECK-P9-NEXT: stxv vs2, 0(r3)
>> ; CHECK-P9-NEXT: blr
>> ;
>> ; CHECK-BE-LABEL: test16elt:
>> @@ -728,12 +668,10 @@ define i32 @test2elt_signed(i64 %a.coerce) local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: vmrghh v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -748,13 +686,11 @@ define i32 @test2elt_signed(i64 %a.coerce) local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs1
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: li r3, 0
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -798,20 +734,16 @@ define i64 @test4elt_signed(<4 x float> %a) local_unnamed_addr #1 {
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: mtfprd f3, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs2
>> -; CHECK-P8-NEXT: xxswapd v5, vs3
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglh v3, v4, v5
>> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mtvsrd v5, r3
>> +; CHECK-P8-NEXT: vmrghh v3, v4, v3
>> +; CHECK-P8-NEXT: vmrghh v2, v2, v5
>> +; CHECK-P8-NEXT: vmrglw v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -822,27 +754,23 @@ define i64 @test4elt_signed(<4 x float> %a) local_unnamed_addr #1 {
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xscvspdpn f0, v2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vmrglh v2, v4, v2
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> +; CHECK-P9-NEXT: vmrghh v2, v4, v2
>> ; CHECK-P9-NEXT: vmrglw v2, v2, v3
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -888,59 +816,51 @@ define <8 x i16> @test8elt_signed(<8 x float>* nocapture readonly) local_unnamed
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: lvx v2, 0, r3
>> ; CHECK-P8-NEXT: li r4, 16
>> -; CHECK-P8-NEXT: lvx v5, r3, r4
>> -; CHECK-P8-NEXT: xxswapd vs1, v2
>> +; CHECK-P8-NEXT: lvx v3, r3, r4
>> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3
>> -; CHECK-P8-NEXT: xscvspdpn f4, v5
>> -; CHECK-P8-NEXT: xxswapd vs3, v5
>> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xxswapd vs1, v2
>> +; CHECK-P8-NEXT: xscvspdpn f2, v2
>> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1
>> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
>> +; CHECK-P8-NEXT: xscvspdpn f3, v3
>> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
>> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: mffprwz r6, f1
>> -; CHECK-P8-NEXT: mffprwz r5, f0
>> -; CHECK-P8-NEXT: mtfprd f1, r6
>> -; CHECK-P8-NEXT: mtfprd f0, r5
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: xscvspdpn f0, v2
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v1, vs4
>> -; CHECK-P8-NEXT: vmrglh v2, v4, v3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: xxswapd v5, vs2
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: xxswapd vs0, v3
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: mffprwz r3, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f4
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f5
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghh v2, v4, v2
>> +; CHECK-P8-NEXT: mffprwz r4, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v6, vs3
>> -; CHECK-P8-NEXT: xxswapd v0, vs0
>> -; CHECK-P8-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P8-NEXT: vmrglh v4, v0, v5
>> -; CHECK-P8-NEXT: vmrglh v5, v1, v6
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: vmrghh v3, v3, v4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mtvsrd v5, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: vmrghh v5, v0, v5
>> +; CHECK-P8-NEXT: mtvsrd v1, r3
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> +; CHECK-P8-NEXT: vmrghh v4, v4, v1
>> +; CHECK-P8-NEXT: vmrglw v3, v4, v5
>> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> ; CHECK-P8-NEXT: blr
>> ;
>> @@ -952,53 +872,45 @@ define <8 x i16> @test8elt_signed(<8 x float>* nocapture readonly) local_unnamed
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs1
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs0
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -1071,116 +983,100 @@ define void @test16elt_signed(<16 x i16>* noalias nocapture sret %agg.result, <1
>> ; CHECK-P8-LABEL: test16elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: lvx v5, 0, r4
>> -; CHECK-P8-NEXT: li r6, 32
>> ; CHECK-P8-NEXT: li r5, 16
>> -; CHECK-P8-NEXT: lvx v2, r4, r6
>> +; CHECK-P8-NEXT: li r6, 32
>> ; CHECK-P8-NEXT: lvx v3, r4, r5
>> +; CHECK-P8-NEXT: lvx v2, r4, r6
>> ; CHECK-P8-NEXT: li r6, 48
>> -; CHECK-P8-NEXT: xscvspdpn f0, v5
>> -; CHECK-P8-NEXT: xxsldwi vs1, v5, v5, 3
>> +; CHECK-P8-NEXT: xxsldwi vs0, v5, v5, 3
>> +; CHECK-P8-NEXT: xscvspdpn f1, v5
>> ; CHECK-P8-NEXT: lvx v4, r4, r6
>> -; CHECK-P8-NEXT: xscvspdpn f4, v2
>> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT: xscvspdpn f2, v3
>> ; CHECK-P8-NEXT: xxswapd vs3, v5
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: xxswapd vs8, v3
>> -; CHECK-P8-NEXT: xscvspdpn f6, v4
>> +; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
>> ; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 3
>> +; CHECK-P8-NEXT: xxswapd vs8, v3
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xxsldwi vs10, v2, v2, 3
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT: xscvspdpn f7, vs7
>> +; CHECK-P8-NEXT: xscvspdpn f8, vs8
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: xxsldwi vs9, v3, v3, 1
>> +; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT: xscvspdpn f2, v3
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f5
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: xxsldwi vs0, v3, v3, 1
>> +; CHECK-P8-NEXT: xscvspdpn f4, v2
>> +; CHECK-P8-NEXT: xscvdpsxws f5, f7
>> +; CHECK-P8-NEXT: xxsldwi vs7, v4, v4, 3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 3
>> +; CHECK-P8-NEXT: xscvspdpn f6, v4
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f8
>> +; CHECK-P8-NEXT: xxswapd vs8, v4
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f5
>> +; CHECK-P8-NEXT: xxswapd vs5, v2
>> ; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT: xxsldwi vs12, v2, v2, 1
>> -; CHECK-P8-NEXT: xscvspdpn f8, vs8
>> -; CHECK-P8-NEXT: xxswapd vs11, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT: xxswapd v2, v4
>> +; CHECK-P8-NEXT: vmrghh v3, v0, v3
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xscvdpsxws f6, f6
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs5
>> +; CHECK-P8-NEXT: xxsldwi vs5, v2, v2, 1
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghh v2, v5, v1
>> +; CHECK-P8-NEXT: vmrghh v5, v6, v0
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f4
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f3
>> +; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f6
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> ; CHECK-P8-NEXT: xscvspdpn f7, vs7
>> -; CHECK-P8-NEXT: xxsldwi vs13, v4, v4, 3
>> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT: xxsldwi v3, v4, v4, 1
>> -; CHECK-P8-NEXT: xscvspdpn f10, vs10
>> +; CHECK-P8-NEXT: mtvsrd v7, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f2
>> +; CHECK-P8-NEXT: xxsldwi vs2, v4, v4, 1
>> +; CHECK-P8-NEXT: xscvspdpn f8, vs8
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f5
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs2
>> +; CHECK-P8-NEXT: xscvdpsxws f3, f7
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f8
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xscvspdpn f9, vs9
>> -; CHECK-P8-NEXT: xscvdpsxws f6, f6
>> -; CHECK-P8-NEXT: xscvspdpn f12, vs12
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> +; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: vmrghh v0, v0, v7
>> +; CHECK-P8-NEXT: mtvsrd v7, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: xscvspdpn f11, vs11
>> -; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xscvspdpn v2, v2
>> -; CHECK-P8-NEXT: xscvdpsxws f8, f8
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT: mffprwz r6, f2
>> -; CHECK-P8-NEXT: xscvspdpn f13, vs13
>> -; CHECK-P8-NEXT: xscvspdpn v3, v3
>> -; CHECK-P8-NEXT: xscvdpsxws f10, f10
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> +; CHECK-P8-NEXT: vmrghh v4, v8, v4
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT: mtfprd f2, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f6
>> -; CHECK-P8-NEXT: xscvdpsxws f12, f12
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f6, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f3
>> -; CHECK-P8-NEXT: xscvdpsxws v2, v2
>> -; CHECK-P8-NEXT: xxswapd v9, vs6
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f8
>> -; CHECK-P8-NEXT: mtfprd f3, r6
>> -; CHECK-P8-NEXT: xxswapd v0, vs5
>> -; CHECK-P8-NEXT: mffprwz r6, f7
>> -; CHECK-P8-NEXT: xscvdpsxws f13, f13
>> -; CHECK-P8-NEXT: xxswapd v5, vs3
>> -; CHECK-P8-NEXT: xscvdpsxws v3, v3
>> -; CHECK-P8-NEXT: mtfprd f8, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f10
>> -; CHECK-P8-NEXT: mtfprd f7, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f9
>> -; CHECK-P8-NEXT: mtfprd f10, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f12
>> -; CHECK-P8-NEXT: mtfprd f9, r6
>> -; CHECK-P8-NEXT: xxswapd v6, vs10
>> -; CHECK-P8-NEXT: mffprwz r6, f11
>> -; CHECK-P8-NEXT: mtfprd f12, r4
>> -; CHECK-P8-NEXT: xxswapd v1, vs9
>> -; CHECK-P8-NEXT: mfvsrwz r4, v2
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: mtfprd f11, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f13
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: xxswapd v7, vs11
>> -; CHECK-P8-NEXT: mfvsrwz r4, v3
>> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
>> -; CHECK-P8-NEXT: xxswapd v4, vs7
>> -; CHECK-P8-NEXT: vmrglh v2, v2, v0
>> -; CHECK-P8-NEXT: xxswapd v5, vs8
>> -; CHECK-P8-NEXT: xxswapd v0, vs2
>> -; CHECK-P8-NEXT: mtfprd f13, r6
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> -; CHECK-P8-NEXT: vmrglh v4, v5, v4
>> -; CHECK-P8-NEXT: vmrglh v5, v0, v1
>> -; CHECK-P8-NEXT: xxswapd v1, vs4
>> -; CHECK-P8-NEXT: vmrglh v0, v7, v6
>> -; CHECK-P8-NEXT: xxswapd v6, vs12
>> -; CHECK-P8-NEXT: xxswapd v7, vs13
>> -; CHECK-P8-NEXT: xxswapd v10, vs1
>> +; CHECK-P8-NEXT: vmrghh v1, v1, v9
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> +; CHECK-P8-NEXT: vmrghh v7, v8, v7
>> +; CHECK-P8-NEXT: vmrghh v6, v6, v9
>> ; CHECK-P8-NEXT: vmrglw v2, v2, v3
>> -; CHECK-P8-NEXT: vmrglh v1, v1, v6
>> -; CHECK-P8-NEXT: vmrglh v6, v8, v7
>> -; CHECK-P8-NEXT: vmrglh v7, v9, v10
>> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> -; CHECK-P8-NEXT: vmrglw v4, v1, v0
>> -; CHECK-P8-NEXT: vmrglw v5, v7, v6
>> +; CHECK-P8-NEXT: vmrglw v3, v0, v5
>> +; CHECK-P8-NEXT: vmrglw v4, v1, v4
>> +; CHECK-P8-NEXT: vmrglw v5, v6, v7
>> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> ; CHECK-P8-NEXT: stvx v2, 0, r3
>> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4
>> @@ -1189,118 +1085,102 @@ define void @test16elt_signed(<16 x i16>* noalias nocapture sret %agg.result, <1
>> ;
>> ; CHECK-P9-LABEL: test16elt_signed:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: lxv vs1, 0(r4)
>> -; CHECK-P9-NEXT: lxv vs3, 16(r4)
>> -; CHECK-P9-NEXT: xscvspdpn f5, vs1
>> -; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3
>> -; CHECK-P9-NEXT: xscvspdpn f8, vs3
>> -; CHECK-P9-NEXT: xxswapd vs4, vs1
>> -; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT: lxv vs2, 0(r4)
>> +; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3
>> +; CHECK-P9-NEXT: xxswapd vs4, vs2
>> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> ; CHECK-P9-NEXT: xscvspdpn f4, vs4
>> -; CHECK-P9-NEXT: xscvdpsxws f5, f5
>> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: xscvspdpn f5, vs2
>> +; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT: xscvdpsxws f8, f8
>> -; CHECK-P9-NEXT: xxsldwi vs6, vs3, vs3, 3
>> -; CHECK-P9-NEXT: xxswapd vs7, vs3
>> -; CHECK-P9-NEXT: xscvspdpn f6, vs6
>> -; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1
>> -; CHECK-P9-NEXT: xscvspdpn f7, vs7
>> -; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: mffprwz r5, f3
>> +; CHECK-P9-NEXT: lxv vs1, 16(r4)
>> +; CHECK-P9-NEXT: xxsldwi vs6, vs1, vs1, 3
>> +; CHECK-P9-NEXT: xxswapd vs3, vs1
>> +; CHECK-P9-NEXT: mtvsrd v2, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f4
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f5
>> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT: mtvsrd v3, r5
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT: mffprwz r5, f4
>> +; CHECK-P9-NEXT: xscvspdpn f4, vs6
>> +; CHECK-P9-NEXT: mtvsrd v3, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f2
>> +; CHECK-P9-NEXT: xscvspdpn f2, vs1
>> +; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P9-NEXT: xscvdpsxws f6, f6
>> -; CHECK-P9-NEXT: mffprwz r5, f5
>> -; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: xscvdpsxws f7, f7
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT: mtfprd f5, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f8
>> -; CHECK-P9-NEXT: mtfprd f8, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f2
>> ; CHECK-P9-NEXT: lxv vs0, 32(r4)
>> -; CHECK-P9-NEXT: xxsldwi vs9, vs0, vs0, 3
>> -; CHECK-P9-NEXT: xxswapd vs10, vs0
>> -; CHECK-P9-NEXT: xscvspdpn f9, vs9
>> -; CHECK-P9-NEXT: xscvspdpn f10, vs10
>> -; CHECK-P9-NEXT: xscvdpsxws f9, f9
>> -; CHECK-P9-NEXT: xscvdpsxws f10, f10
>> -; CHECK-P9-NEXT: mtfprd f2, r5
>> +; CHECK-P9-NEXT: mtvsrd v4, r5
>> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r5, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r5
>> +; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: mtvsrd v4, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f3
>> +; CHECK-P9-NEXT: xxsldwi vs3, vs0, vs0, 3
>> +; CHECK-P9-NEXT: mtvsrd v5, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f2
>> +; CHECK-P9-NEXT: xscvspdpn f2, vs3
>> +; CHECK-P9-NEXT: vmrghh v4, v5, v4
>> +; CHECK-P9-NEXT: mtvsrd v5, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f6
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> -; CHECK-P9-NEXT: xxswapd v3, vs4
>> +; CHECK-P9-NEXT: xxswapd vs1, vs0
>> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> +; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: vmrghh v5, v5, v0
>> +; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrglw v3, v5, v4
>> +; CHECK-P9-NEXT: mffprwz r5, f2
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: mtfprd f6, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f7
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> +; CHECK-P9-NEXT: mffprwz r5, f1
>> ; CHECK-P9-NEXT: lxv vs1, 48(r4)
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs5
>> -; CHECK-P9-NEXT: mtfprd f7, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f3
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs6
>> -; CHECK-P9-NEXT: xxswapd v5, vs7
>> -; CHECK-P9-NEXT: mtfprd f3, r5
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT: xxswapd v0, vs3
>> -; CHECK-P9-NEXT: vmrglh v4, v5, v4
>> -; CHECK-P9-NEXT: xxswapd v5, vs8
>> -; CHECK-P9-NEXT: vmrglh v5, v5, v0
>> +; CHECK-P9-NEXT: mtvsrd v1, r5
>> +; CHECK-P9-NEXT: vmrghh v0, v1, v0
>> ; CHECK-P9-NEXT: mffprwz r4, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r4
>> -; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: vmrglw v3, v5, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> ; CHECK-P9-NEXT: xxmrgld vs2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> +; CHECK-P9-NEXT: mffprwz r4, f0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 3
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P9-NEXT: vmrghh v2, v4, v2
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrglw v2, v2, v0
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs1
>> +; CHECK-P9-NEXT: mtvsrd v3, r4
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: vmrglh v2, v4, v2
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs1
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs1, vs1, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT: mffprwz r5, f9
>> -; CHECK-P9-NEXT: mtfprd f9, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f10
>> -; CHECK-P9-NEXT: mtfprd f10, r5
>> -; CHECK-P9-NEXT: xxswapd v0, vs9
>> -; CHECK-P9-NEXT: xxswapd v1, vs10
>> -; CHECK-P9-NEXT: vmrglh v0, v1, v0
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
>> -; CHECK-P9-NEXT: stxv vs2, 0(r3)
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r4
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
>> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
>> +; CHECK-P9-NEXT: stxv vs2, 0(r3)
>> ; CHECK-P9-NEXT: blr
>> ;
>> ; CHECK-BE-LABEL: test16elt_signed:
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
>> index 1f95eda2b1b5..928a19f3a55c 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
>> @@ -20,12 +20,10 @@ define i16 @test2elt(i64 %a.coerce) local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: vmrghb v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: clrldi r3, r3, 48
>> @@ -43,13 +41,11 @@ define i16 @test2elt(i64 %a.coerce) local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: addi r3, r1, -2
>> -; CHECK-P9-NEXT: xxswapd v2, vs1
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
>> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
>> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
>> ; CHECK-P9-NEXT: lhz r3, -2(r1)
>> @@ -97,20 +93,16 @@ define i32 @test4elt(<4 x float> %a) local_unnamed_addr #1 {
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: mtfprd f3, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs2
>> -; CHECK-P8-NEXT: xxswapd v5, vs3
>> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglb v3, v4, v5
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mtvsrd v5, r3
>> +; CHECK-P8-NEXT: vmrghb v3, v4, v3
>> +; CHECK-P8-NEXT: vmrghb v2, v2, v5
>> +; CHECK-P8-NEXT: vmrglh v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -121,28 +113,24 @@ define i32 @test4elt(<4 x float> %a) local_unnamed_addr #1 {
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xscvspdpn f0, v2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: li r3, 0
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vmrglb v2, v4, v2
>> +; CHECK-P9-NEXT: vmrghb v2, v4, v2
>> ; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -189,59 +177,51 @@ define i64 @test8elt(<8 x float>* nocapture readonly) local_unnamed_addr #2 {
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: lvx v2, 0, r3
>> ; CHECK-P8-NEXT: li r4, 16
>> -; CHECK-P8-NEXT: lvx v5, r3, r4
>> -; CHECK-P8-NEXT: xxswapd vs1, v2
>> +; CHECK-P8-NEXT: lvx v3, r3, r4
>> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3
>> -; CHECK-P8-NEXT: xscvspdpn f4, v5
>> -; CHECK-P8-NEXT: xxswapd vs3, v5
>> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xxswapd vs1, v2
>> +; CHECK-P8-NEXT: xscvspdpn f2, v2
>> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1
>> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
>> +; CHECK-P8-NEXT: xscvspdpn f3, v3
>> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
>> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: mffprwz r6, f1
>> -; CHECK-P8-NEXT: mffprwz r5, f0
>> -; CHECK-P8-NEXT: mtfprd f1, r6
>> -; CHECK-P8-NEXT: mtfprd f0, r5
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: xscvspdpn f0, v2
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v1, vs4
>> -; CHECK-P8-NEXT: vmrglb v2, v4, v3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: xxswapd v5, vs2
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: xxswapd vs0, v3
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: mffprwz r3, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f4
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f5
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
>> +; CHECK-P8-NEXT: mffprwz r4, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v6, vs3
>> -; CHECK-P8-NEXT: xxswapd v0, vs0
>> -; CHECK-P8-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P8-NEXT: vmrglb v4, v0, v5
>> -; CHECK-P8-NEXT: vmrglb v5, v1, v6
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: vmrghb v3, v3, v4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mtvsrd v5, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: vmrghb v5, v0, v5
>> +; CHECK-P8-NEXT: mtvsrd v1, r3
>> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
>> +; CHECK-P8-NEXT: vmrghb v4, v4, v1
>> +; CHECK-P8-NEXT: vmrglh v3, v4, v5
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> @@ -255,53 +235,45 @@ define i64 @test8elt(<8 x float>* nocapture readonly) local_unnamed_addr #2 {
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs1
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs0
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> @@ -376,117 +348,101 @@ entry:
>> define <16 x i8> @test16elt(<16 x float>* nocapture readonly) local_unnamed_addr #3 {
>> ; CHECK-P8-LABEL: test16elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: lvx v2, 0, r3
>> +; CHECK-P8-NEXT: lvx v4, 0, r3
>> ; CHECK-P8-NEXT: li r4, 16
>> +; CHECK-P8-NEXT: li r5, 32
>> ; CHECK-P8-NEXT: lvx v3, r3, r4
>> -; CHECK-P8-NEXT: li r4, 32
>> -; CHECK-P8-NEXT: xscvspdpn f2, v2
>> -; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT: xscvspdpn f4, v3
>> -; CHECK-P8-NEXT: xxswapd vs1, v2
>> -; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 1
>> -; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
>> -; CHECK-P8-NEXT: lvx v2, r3, r4
>> +; CHECK-P8-NEXT: lvx v2, r3, r5
>> +; CHECK-P8-NEXT: xxsldwi vs0, v4, v4, 3
>> +; CHECK-P8-NEXT: xxswapd vs2, v4
>> +; CHECK-P8-NEXT: xxsldwi vs4, v4, v4, 1
>> +; CHECK-P8-NEXT: xscvspdpn f1, v4
>> +; CHECK-P8-NEXT: xscvspdpn f3, v3
>> +; CHECK-P8-NEXT: xxsldwi vs6, v3, v3, 3
>> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT: xxswapd vs6, v3
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 1
>> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT: xxsldwi vs8, v2, v2, 3
>> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT: xxswapd vs9, v2
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: xxswapd vs7, v3
>> +; CHECK-P8-NEXT: xscvspdpn f2, vs2
>> +; CHECK-P8-NEXT: xxsldwi vs8, v3, v3, 1
>> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
>> +; CHECK-P8-NEXT: xxsldwi vs9, v2, v2, 3
>> ; CHECK-P8-NEXT: xscvspdpn f6, vs6
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: mffprwz r4, f2
>> ; CHECK-P8-NEXT: xscvspdpn f7, vs7
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> ; CHECK-P8-NEXT: xscvspdpn f8, vs8
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f5
>> -; CHECK-P8-NEXT: xxswapd v0, vs4
>> +; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> ; CHECK-P8-NEXT: xscvspdpn f9, vs9
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: xxswapd vs0, v2
>> +; CHECK-P8-NEXT: mffprwz r5, f2
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: mtvsrd v4, r5
>> +; CHECK-P8-NEXT: mffprwz r5, f4
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f6
>> -; CHECK-P8-NEXT: xxswapd v3, vs5
>> -; CHECK-P8-NEXT: mtfprd f6, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: vmrghb v3, v4, v3
>> +; CHECK-P8-NEXT: mtvsrd v4, r5
>> +; CHECK-P8-NEXT: mffprwz r5, f3
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f7
>> -; CHECK-P8-NEXT: xxswapd v4, vs6
>> -; CHECK-P8-NEXT: mtfprd f7, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f8
>> -; CHECK-P8-NEXT: xxswapd v5, vs7
>> -; CHECK-P8-NEXT: mtfprd f8, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: xscvdpsxws f1, f9
>> -; CHECK-P8-NEXT: xxswapd v1, vs8
>> -; CHECK-P8-NEXT: mtfprd f9, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P8-NEXT: xxswapd v4, vs2
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: xxswapd v6, vs9
>> -; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: xscvspdpn f0, v2
>> -; CHECK-P8-NEXT: xxswapd v7, vs3
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: vmrglb v4, v4, v5
>> -; CHECK-P8-NEXT: xxswapd v5, vs5
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f8
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> ; CHECK-P8-NEXT: li r4, 48
>> -; CHECK-P8-NEXT: lvx v9, r3, r4
>> -; CHECK-P8-NEXT: vmrglb v1, v6, v1
>> -; CHECK-P8-NEXT: xxswapd v8, vs1
>> +; CHECK-P8-NEXT: lvx v0, r3, r4
>> +; CHECK-P8-NEXT: mffprwz r3, f1
>> ; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT: xxsldwi vs2, v9, v9, 3
>> -; CHECK-P8-NEXT: xscvspdpn f4, v9
>> -; CHECK-P8-NEXT: xxswapd vs3, v9
>> -; CHECK-P8-NEXT: xxsldwi vs5, v9, v9, 1
>> +; CHECK-P8-NEXT: xscvspdpn f5, v2
>> +; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: xxsldwi vs3, v0, v0, 3
>> +; CHECK-P8-NEXT: mtvsrd v1, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> +; CHECK-P8-NEXT: xxswapd vs4, v0
>> ; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
>> +; CHECK-P8-NEXT: mtvsrd v7, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f0
>> +; CHECK-P8-NEXT: xxsldwi vs0, v0, v0, 1
>> +; CHECK-P8-NEXT: xscvspdpn f2, v0
>> ; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P8-NEXT: xscvdpsxws f6, f9
>> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f6
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghb v2, v6, v1
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f5
>> +; CHECK-P8-NEXT: mtvsrd v6, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: vmrghb v4, v5, v4
>> +; CHECK-P8-NEXT: mtvsrd v5, r5
>> +; CHECK-P8-NEXT: vmrghb v0, v6, v1
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v9, vs4
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: mtvsrd v6, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> -; CHECK-P8-NEXT: xxswapd v6, vs1
>> -; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: vmrglb v2, v0, v7
>> -; CHECK-P8-NEXT: xxswapd v0, vs0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v7, vs2
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: vmrglb v5, v8, v5
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> -; CHECK-P8-NEXT: xxswapd v10, vs3
>> -; CHECK-P8-NEXT: vmrglb v0, v0, v6
>> +; CHECK-P8-NEXT: vmrghb v5, v5, v7
>> +; CHECK-P8-NEXT: vmrghb v1, v1, v6
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f4
>> +; CHECK-P8-NEXT: mtvsrd v7, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f0
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mtvsrd v9, r3
>> +; CHECK-P8-NEXT: vmrghb v7, v8, v7
>> +; CHECK-P8-NEXT: vmrghb v6, v6, v9
>> ; CHECK-P8-NEXT: vmrglh v3, v4, v3
>> -; CHECK-P8-NEXT: vmrglb v6, v8, v7
>> -; CHECK-P8-NEXT: vmrglb v7, v9, v10
>> -; CHECK-P8-NEXT: vmrglh v2, v2, v1
>> -; CHECK-P8-NEXT: vmrglh v4, v0, v5
>> -; CHECK-P8-NEXT: vmrglh v5, v7, v6
>> +; CHECK-P8-NEXT: vmrglh v2, v5, v2
>> +; CHECK-P8-NEXT: vmrglh v4, v1, v0
>> +; CHECK-P8-NEXT: vmrglh v5, v6, v7
>> ; CHECK-P8-NEXT: vmrglw v2, v2, v3
>> ; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> @@ -494,114 +450,98 @@ define <16 x i8> @test16elt(<16 x float>* nocapture readonly) local_unnamed_addr
>> ;
>> ; CHECK-P9-LABEL: test16elt:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: lxv vs2, 0(r3)
>> +; CHECK-P9-NEXT: lxv vs3, 0(r3)
>> +; CHECK-P9-NEXT: xxsldwi vs4, vs3, vs3, 3
>> +; CHECK-P9-NEXT: xscvspdpn f4, vs4
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: lxv vs0, 48(r3)
>> +; CHECK-P9-NEXT: lxv vs1, 32(r3)
>> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> +; CHECK-P9-NEXT: mffprwz r3, f4
>> +; CHECK-P9-NEXT: xxswapd vs4, vs3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> +; CHECK-P9-NEXT: xscvspdpn f4, vs4
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: mffprwz r3, f4
>> +; CHECK-P9-NEXT: xscvspdpn f4, vs3
>> +; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
>> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: mffprwz r3, f4
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: mffprwz r3, f3
>> ; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT: lxv vs0, 48(r3)
>> -; CHECK-P9-NEXT: lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT: lxv vs4, 16(r3)
>> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs3
>> ; CHECK-P9-NEXT: xxswapd vs3, vs2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> ; CHECK-P9-NEXT: xscvspdpn f3, vs2
>> ; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 3
>> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: xxswapd vs2, vs4
>> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: xscvspdpn f2, vs4
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 1
>> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs2
>> ; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghb v4, v5, v4
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v4, v5, v4
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> -; CHECK-P9-NEXT: xxswapd v0, vs0
>> -; CHECK-P9-NEXT: vmrglb v5, v5, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r3
>> +; CHECK-P9-NEXT: vmrghb v5, v5, v0
>> ; CHECK-P9-NEXT: vmrglh v4, v5, v4
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>> @@ -738,12 +678,10 @@ define i16 @test2elt_signed(i64 %a.coerce) local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: vmrghb v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: clrldi r3, r3, 48
>> @@ -761,13 +699,11 @@ define i16 @test2elt_signed(i64 %a.coerce) local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: addi r3, r1, -2
>> -; CHECK-P9-NEXT: xxswapd v2, vs1
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
>> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
>> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
>> ; CHECK-P9-NEXT: lhz r3, -2(r1)
>> @@ -815,20 +751,16 @@ define i32 @test4elt_signed(<4 x float> %a) local_unnamed_addr #1 {
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: mtfprd f3, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs2
>> -; CHECK-P8-NEXT: xxswapd v5, vs3
>> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglb v3, v4, v5
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mtvsrd v5, r3
>> +; CHECK-P8-NEXT: vmrghb v3, v4, v3
>> +; CHECK-P8-NEXT: vmrghb v2, v2, v5
>> +; CHECK-P8-NEXT: vmrglh v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -839,28 +771,24 @@ define i32 @test4elt_signed(<4 x float> %a) local_unnamed_addr #1 {
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xscvspdpn f0, v2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, v2, v2, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: li r3, 0
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vmrglb v2, v4, v2
>> +; CHECK-P9-NEXT: vmrghb v2, v4, v2
>> ; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -907,59 +835,51 @@ define i64 @test8elt_signed(<8 x float>* nocapture readonly) local_unnamed_addr
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: lvx v2, 0, r3
>> ; CHECK-P8-NEXT: li r4, 16
>> -; CHECK-P8-NEXT: lvx v5, r3, r4
>> -; CHECK-P8-NEXT: xxswapd vs1, v2
>> +; CHECK-P8-NEXT: lvx v3, r3, r4
>> ; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT: xxsldwi vs2, v5, v5, 3
>> -; CHECK-P8-NEXT: xscvspdpn f4, v5
>> -; CHECK-P8-NEXT: xxswapd vs3, v5
>> -; CHECK-P8-NEXT: xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xxswapd vs1, v2
>> +; CHECK-P8-NEXT: xscvspdpn f2, v2
>> +; CHECK-P8-NEXT: xxsldwi vs4, v2, v2, 1
>> +; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
>> +; CHECK-P8-NEXT: xscvspdpn f3, v3
>> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
>> ; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: mffprwz r6, f1
>> -; CHECK-P8-NEXT: mffprwz r5, f0
>> -; CHECK-P8-NEXT: mtfprd f1, r6
>> -; CHECK-P8-NEXT: mtfprd f0, r5
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: xscvspdpn f0, v2
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v1, vs4
>> -; CHECK-P8-NEXT: vmrglb v2, v4, v3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: xxswapd v5, vs2
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mffprwz r3, f1
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: xxswapd vs0, v3
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: xxsldwi vs1, v3, v3, 1
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: mffprwz r3, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f4
>> +; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f5
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
>> +; CHECK-P8-NEXT: mffprwz r4, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: xxswapd v4, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v6, vs3
>> -; CHECK-P8-NEXT: xxswapd v0, vs0
>> -; CHECK-P8-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P8-NEXT: vmrglb v4, v0, v5
>> -; CHECK-P8-NEXT: vmrglb v5, v1, v6
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: vmrghb v3, v3, v4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> +; CHECK-P8-NEXT: mtvsrd v5, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: vmrghb v5, v0, v5
>> +; CHECK-P8-NEXT: mtvsrd v1, r3
>> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
>> +; CHECK-P8-NEXT: vmrghb v4, v4, v1
>> +; CHECK-P8-NEXT: vmrglh v3, v4, v5
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> @@ -973,53 +893,45 @@ define i64 @test8elt_signed(<8 x float>* nocapture readonly) local_unnamed_addr
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs1
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs0
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> @@ -1094,117 +1006,101 @@ entry:
>> define <16 x i8> @test16elt_signed(<16 x float>* nocapture readonly) local_unnamed_addr #3 {
>> ; CHECK-P8-LABEL: test16elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: lvx v2, 0, r3
>> +; CHECK-P8-NEXT: lvx v4, 0, r3
>> ; CHECK-P8-NEXT: li r4, 16
>> +; CHECK-P8-NEXT: li r5, 32
>> ; CHECK-P8-NEXT: lvx v3, r3, r4
>> -; CHECK-P8-NEXT: li r4, 32
>> -; CHECK-P8-NEXT: xscvspdpn f2, v2
>> -; CHECK-P8-NEXT: xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT: xscvspdpn f4, v3
>> -; CHECK-P8-NEXT: xxswapd vs1, v2
>> -; CHECK-P8-NEXT: xxsldwi vs3, v2, v2, 1
>> -; CHECK-P8-NEXT: xxsldwi vs5, v3, v3, 3
>> -; CHECK-P8-NEXT: lvx v2, r3, r4
>> +; CHECK-P8-NEXT: lvx v2, r3, r5
>> +; CHECK-P8-NEXT: xxsldwi vs0, v4, v4, 3
>> +; CHECK-P8-NEXT: xxswapd vs2, v4
>> +; CHECK-P8-NEXT: xxsldwi vs4, v4, v4, 1
>> +; CHECK-P8-NEXT: xscvspdpn f1, v4
>> +; CHECK-P8-NEXT: xscvspdpn f3, v3
>> +; CHECK-P8-NEXT: xxsldwi vs6, v3, v3, 3
>> ; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT: xxswapd vs6, v3
>> -; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: xxsldwi vs7, v3, v3, 1
>> -; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT: xxsldwi vs8, v2, v2, 3
>> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT: xxswapd vs9, v2
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: xxswapd vs7, v3
>> +; CHECK-P8-NEXT: xscvspdpn f2, vs2
>> +; CHECK-P8-NEXT: xxsldwi vs8, v3, v3, 1
>> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
>> +; CHECK-P8-NEXT: xxsldwi vs9, v2, v2, 3
>> ; CHECK-P8-NEXT: xscvspdpn f6, vs6
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: mffprwz r4, f2
>> ; CHECK-P8-NEXT: xscvspdpn f7, vs7
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> ; CHECK-P8-NEXT: xscvspdpn f8, vs8
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f5
>> -; CHECK-P8-NEXT: xxswapd v0, vs4
>> +; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> ; CHECK-P8-NEXT: xscvspdpn f9, vs9
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: xxswapd vs0, v2
>> +; CHECK-P8-NEXT: mffprwz r5, f2
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> +; CHECK-P8-NEXT: mtvsrd v4, r5
>> +; CHECK-P8-NEXT: mffprwz r5, f4
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f6
>> -; CHECK-P8-NEXT: xxswapd v3, vs5
>> -; CHECK-P8-NEXT: mtfprd f6, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: vmrghb v3, v4, v3
>> +; CHECK-P8-NEXT: mtvsrd v4, r5
>> +; CHECK-P8-NEXT: mffprwz r5, f3
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f7
>> -; CHECK-P8-NEXT: xxswapd v4, vs6
>> -; CHECK-P8-NEXT: mtfprd f7, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f8
>> -; CHECK-P8-NEXT: xxswapd v5, vs7
>> -; CHECK-P8-NEXT: mtfprd f8, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: xscvdpsxws f1, f9
>> -; CHECK-P8-NEXT: xxswapd v1, vs8
>> -; CHECK-P8-NEXT: mtfprd f9, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P8-NEXT: xxswapd v4, vs2
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: xxswapd v6, vs9
>> -; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: xscvspdpn f0, v2
>> -; CHECK-P8-NEXT: xxswapd v7, vs3
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: vmrglb v4, v4, v5
>> -; CHECK-P8-NEXT: xxswapd v5, vs5
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f8
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> ; CHECK-P8-NEXT: li r4, 48
>> -; CHECK-P8-NEXT: lvx v9, r3, r4
>> -; CHECK-P8-NEXT: vmrglb v1, v6, v1
>> -; CHECK-P8-NEXT: xxswapd v8, vs1
>> +; CHECK-P8-NEXT: lvx v0, r3, r4
>> +; CHECK-P8-NEXT: mffprwz r3, f1
>> ; CHECK-P8-NEXT: xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT: xxsldwi vs2, v9, v9, 3
>> -; CHECK-P8-NEXT: xscvspdpn f4, v9
>> -; CHECK-P8-NEXT: xxswapd vs3, v9
>> -; CHECK-P8-NEXT: xxsldwi vs5, v9, v9, 1
>> +; CHECK-P8-NEXT: xscvspdpn f5, v2
>> +; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: xxsldwi vs3, v0, v0, 3
>> +; CHECK-P8-NEXT: mtvsrd v1, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> +; CHECK-P8-NEXT: xxswapd vs4, v0
>> ; CHECK-P8-NEXT: xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT: xscvspdpn f2, vs2
>> +; CHECK-P8-NEXT: mtvsrd v7, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f0
>> +; CHECK-P8-NEXT: xxsldwi vs0, v0, v0, 1
>> +; CHECK-P8-NEXT: xscvspdpn f2, v0
>> ; CHECK-P8-NEXT: xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT: xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P8-NEXT: xscvdpsxws f6, f9
>> +; CHECK-P8-NEXT: xscvspdpn f4, vs4
>> +; CHECK-P8-NEXT: xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f6
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghb v2, v6, v1
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f5
>> +; CHECK-P8-NEXT: mtvsrd v6, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: vmrghb v4, v5, v4
>> +; CHECK-P8-NEXT: mtvsrd v5, r5
>> +; CHECK-P8-NEXT: vmrghb v0, v6, v1
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: xxswapd v9, vs4
>> -; CHECK-P8-NEXT: mtfprd f1, r3
>> +; CHECK-P8-NEXT: mtvsrd v6, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> -; CHECK-P8-NEXT: xxswapd v6, vs1
>> -; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: vmrglb v2, v0, v7
>> -; CHECK-P8-NEXT: xxswapd v0, vs0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v7, vs2
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: vmrglb v5, v8, v5
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> -; CHECK-P8-NEXT: xxswapd v10, vs3
>> -; CHECK-P8-NEXT: vmrglb v0, v0, v6
>> +; CHECK-P8-NEXT: vmrghb v5, v5, v7
>> +; CHECK-P8-NEXT: vmrghb v1, v1, v6
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f4
>> +; CHECK-P8-NEXT: mtvsrd v7, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f0
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mtvsrd v9, r3
>> +; CHECK-P8-NEXT: vmrghb v7, v8, v7
>> +; CHECK-P8-NEXT: vmrghb v6, v6, v9
>> ; CHECK-P8-NEXT: vmrglh v3, v4, v3
>> -; CHECK-P8-NEXT: vmrglb v6, v8, v7
>> -; CHECK-P8-NEXT: vmrglb v7, v9, v10
>> -; CHECK-P8-NEXT: vmrglh v2, v2, v1
>> -; CHECK-P8-NEXT: vmrglh v4, v0, v5
>> -; CHECK-P8-NEXT: vmrglh v5, v7, v6
>> +; CHECK-P8-NEXT: vmrglh v2, v5, v2
>> +; CHECK-P8-NEXT: vmrglh v4, v1, v0
>> +; CHECK-P8-NEXT: vmrglh v5, v6, v7
>> ; CHECK-P8-NEXT: vmrglw v2, v2, v3
>> ; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> @@ -1212,114 +1108,98 @@ define <16 x i8> @test16elt_signed(<16 x float>* nocapture readonly) local_unnam
>> ;
>> ; CHECK-P9-LABEL: test16elt_signed:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: lxv vs2, 0(r3)
>> +; CHECK-P9-NEXT: lxv vs3, 0(r3)
>> +; CHECK-P9-NEXT: xxsldwi vs4, vs3, vs3, 3
>> +; CHECK-P9-NEXT: xscvspdpn f4, vs4
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: lxv vs0, 48(r3)
>> +; CHECK-P9-NEXT: lxv vs1, 32(r3)
>> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> +; CHECK-P9-NEXT: mffprwz r3, f4
>> +; CHECK-P9-NEXT: xxswapd vs4, vs3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> +; CHECK-P9-NEXT: xscvspdpn f4, vs4
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: mffprwz r3, f4
>> +; CHECK-P9-NEXT: xscvspdpn f4, vs3
>> +; CHECK-P9-NEXT: xxsldwi vs3, vs3, vs3, 1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
>> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: mffprwz r3, f4
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: mffprwz r3, f3
>> ; CHECK-P9-NEXT: xxsldwi vs3, vs2, vs2, 3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT: lxv vs0, 48(r3)
>> -; CHECK-P9-NEXT: lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT: lxv vs4, 16(r3)
>> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs3
>> ; CHECK-P9-NEXT: xxswapd vs3, vs2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> ; CHECK-P9-NEXT: xscvspdpn f3, vs2
>> ; CHECK-P9-NEXT: xxsldwi vs2, vs2, vs2, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 3
>> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: xxswapd vs2, vs4
>> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: xscvspdpn f2, vs4
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: xxsldwi vs2, vs4, vs4, 1
>> -; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs2
>> ; CHECK-P9-NEXT: xxsldwi vs2, vs1, vs1, 3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> ; CHECK-P9-NEXT: xscvspdpn f2, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghb v3, v4, v3
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> ; CHECK-P9-NEXT: xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> ; CHECK-P9-NEXT: xscvspdpn f1, vs0
>> ; CHECK-P9-NEXT: xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvspdpn f0, vs0
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghb v4, v5, v4
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v4, v5, v4
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> -; CHECK-P9-NEXT: xxswapd v0, vs0
>> -; CHECK-P9-NEXT: vmrglb v5, v5, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r3
>> +; CHECK-P9-NEXT: vmrghb v5, v5, v0
>> ; CHECK-P9-NEXT: vmrglh v4, v5, v4
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
>> index c7d66ae784a0..dbc2774fed8c 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
>> @@ -16,12 +16,10 @@ define i32 @test2elt(<2 x double> %a) local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: vmrglh v2, v2, v3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: vmrghh v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -30,15 +28,13 @@ define i32 @test2elt(<2 x double> %a) local_unnamed_addr #0 {
>> ; CHECK-P9: # %bb.0: # %entry
>> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: li r3, 0
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -77,18 +73,14 @@ define i64 @test4elt(<4 x double>* nocapture readonly) local_unnamed_addr #1 {
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: xxswapd v2, vs2
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: xxswapd v4, vs3
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: xxswapd v5, vs1
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> +; CHECK-P8-NEXT: vmrghh v2, v4, v2
>> +; CHECK-P8-NEXT: vmrghh v3, v5, v3
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> @@ -102,22 +94,18 @@ define i64 @test4elt(<4 x double>* nocapture readonly) local_unnamed_addr #1 {
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -176,36 +164,28 @@ define <8 x i16> @test8elt(<8 x double>* nocapture readonly) local_unnamed_addr
>> ; CHECK-P8-NEXT: xxswapd vs3, vs3
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: mtfprd f4, r3
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f6
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs4
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f7
>> -; CHECK-P8-NEXT: mtfprd f6, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs5
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mtfprd f7, r4
>> -; CHECK-P8-NEXT: xxswapd v4, vs6
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v1, vs7
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v5, vs0
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: xxswapd v0, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: xxswapd v6, vs2
>> -; CHECK-P8-NEXT: vmrglh v2, v5, v2
>> -; CHECK-P8-NEXT: xxswapd v5, vs0
>> -; CHECK-P8-NEXT: vmrglh v3, v0, v3
>> -; CHECK-P8-NEXT: vmrglh v4, v6, v4
>> -; CHECK-P8-NEXT: vmrglh v5, v5, v1
>> +; CHECK-P8-NEXT: vmrghh v2, v0, v2
>> +; CHECK-P8-NEXT: vmrghh v3, v1, v3
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: vmrghh v4, v0, v4
>> +; CHECK-P8-NEXT: vmrghh v5, v1, v5
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> @@ -217,47 +197,39 @@ define <8 x i16> @test8elt(<8 x double>* nocapture readonly) local_unnamed_addr
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> ; CHECK-P9-NEXT: xxswapd vs3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
>> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: xxswapd v2, vs4
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: mffprwz r3, f1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -321,209 +293,177 @@ entry:
>> define void @test16elt(<16 x i16>* noalias nocapture sret %agg.result, <16 x double>* nocapture readonly) local_unnamed_addr #3 {
>> ; CHECK-P8-LABEL: test16elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
>> ; CHECK-P8-NEXT: li r5, 16
>> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
>> ; CHECK-P8-NEXT: li r6, 32
>> +; CHECK-P8-NEXT: li r7, 48
>> ; CHECK-P8-NEXT: lxvd2x vs1, r4, r5
>> ; CHECK-P8-NEXT: lxvd2x vs2, r4, r6
>> -; CHECK-P8-NEXT: li r6, 48
>> -; CHECK-P8-NEXT: lxvd2x vs3, r4, r6
>> ; CHECK-P8-NEXT: li r6, 64
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f0
>> +; CHECK-P8-NEXT: lxvd2x vs3, r4, r7
>> ; CHECK-P8-NEXT: lxvd2x vs5, r4, r6
>> -; CHECK-P8-NEXT: li r6, 80
>> +; CHECK-P8-NEXT: li r7, 80
>> +; CHECK-P8-NEXT: li r6, 96
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f0
>> +; CHECK-P8-NEXT: lxvd2x vs7, r4, r7
>> +; CHECK-P8-NEXT: lxvd2x vs10, r4, r6
>> +; CHECK-P8-NEXT: li r6, 112
>> ; CHECK-P8-NEXT: xxswapd vs0, vs0
>> ; CHECK-P8-NEXT: xscvdpsxws f6, f1
>> -; CHECK-P8-NEXT: lxvd2x vs7, r4, r6
>> -; CHECK-P8-NEXT: li r6, 96
>> ; CHECK-P8-NEXT: xxswapd vs1, vs1
>> ; CHECK-P8-NEXT: xscvdpsxws f8, f2
>> -; CHECK-P8-NEXT: lxvd2x vs9, r4, r6
>> -; CHECK-P8-NEXT: li r6, 112
>> ; CHECK-P8-NEXT: xxswapd vs2, vs2
>> -; CHECK-P8-NEXT: xscvdpsxws f10, f3
>> -; CHECK-P8-NEXT: lxvd2x vs11, r4, r6
>> +; CHECK-P8-NEXT: xscvdpsxws f9, f3
>> ; CHECK-P8-NEXT: xxswapd vs3, vs3
>> -; CHECK-P8-NEXT: xscvdpsxws f12, f5
>> +; CHECK-P8-NEXT: xscvdpsxws f11, f5
>> ; CHECK-P8-NEXT: xxswapd vs5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f13, f7
>> +; CHECK-P8-NEXT: xscvdpsxws f12, f7
>> ; CHECK-P8-NEXT: xxswapd vs7, vs7
>> -; CHECK-P8-NEXT: xscvdpsxws v2, f9
>> -; CHECK-P8-NEXT: xxswapd vs9, vs9
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws v3, f11
>> -; CHECK-P8-NEXT: xxswapd vs11, vs11
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: mffprwz r6, f6
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> +; CHECK-P8-NEXT: mffprwz r7, f4
>> +; CHECK-P8-NEXT: lxvd2x vs4, r4, r6
>> +; CHECK-P8-NEXT: mffprwz r4, f6
>> +; CHECK-P8-NEXT: xscvdpsxws f13, f10
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f8
>> +; CHECK-P8-NEXT: xscvdpsxws f6, f4
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f9
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f11
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xxswapd v4, vs4
>> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT: mtfprd f6, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f10
>> -; CHECK-P8-NEXT: mtfprd f8, r4
>> -; CHECK-P8-NEXT: xxswapd v5, vs6
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f12
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: xxswapd v0, vs8
>> -; CHECK-P8-NEXT: mtfprd f10, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f13
>> -; CHECK-P8-NEXT: mtfprd f12, r4
>> -; CHECK-P8-NEXT: xxswapd v1, vs10
>> -; CHECK-P8-NEXT: mfvsrwz r4, v2
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f13
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xxswapd v6, vs12
>> -; CHECK-P8-NEXT: xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT: mtfprd f13, r6
>> -; CHECK-P8-NEXT: mfvsrwz r6, v3
>> -; CHECK-P8-NEXT: mtvsrd v2, r4
>> -; CHECK-P8-NEXT: xxswapd v7, vs13
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f6
>> +; CHECK-P8-NEXT: xxswapd vs6, vs10
>> +; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> +; CHECK-P8-NEXT: mtvsrd v7, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: xxswapd vs0, vs4
>> +; CHECK-P8-NEXT: mtvsrd v2, r7
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT: xxswapd v2, v2
>> -; CHECK-P8-NEXT: xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT: mtvsrd v3, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f1
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: xxswapd v3, v3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: mtfprd f1, r6
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f6
>> +; CHECK-P8-NEXT: vmrghh v2, v8, v2
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghh v3, v9, v3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: xxswapd v9, vs1
>> -; CHECK-P8-NEXT: mffprwz r6, f3
>> -; CHECK-P8-NEXT: xxswapd v10, vs2
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f9
>> -; CHECK-P8-NEXT: mtfprd f3, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f7
>> -; CHECK-P8-NEXT: mtfprd f9, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f11
>> -; CHECK-P8-NEXT: vmrglh v4, v8, v4
>> -; CHECK-P8-NEXT: xxswapd v8, vs3
>> -; CHECK-P8-NEXT: vmrglh v5, v9, v5
>> -; CHECK-P8-NEXT: xxswapd v9, vs5
>> -; CHECK-P8-NEXT: mtfprd f7, r6
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: vmrglh v0, v10, v0
>> -; CHECK-P8-NEXT: xxswapd v10, vs7
>> -; CHECK-P8-NEXT: vmrglh v1, v8, v1
>> -; CHECK-P8-NEXT: xxswapd v8, vs9
>> -; CHECK-P8-NEXT: vmrglh v6, v9, v6
>> -; CHECK-P8-NEXT: xxswapd v9, vs0
>> -; CHECK-P8-NEXT: vmrglh v7, v10, v7
>> -; CHECK-P8-NEXT: vmrglh v2, v8, v2
>> -; CHECK-P8-NEXT: vmrglh v3, v9, v3
>> -; CHECK-P8-NEXT: vmrglw v4, v5, v4
>> -; CHECK-P8-NEXT: vmrglw v5, v1, v0
>> -; CHECK-P8-NEXT: vmrglw v0, v7, v6
>> +; CHECK-P8-NEXT: vmrghh v4, v8, v4
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f7
>> +; CHECK-P8-NEXT: vmrghh v5, v9, v5
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f4
>> +; CHECK-P8-NEXT: vmrghh v0, v8, v0
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: vmrghh v1, v9, v1
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> +; CHECK-P8-NEXT: vmrghh v6, v8, v6
>> +; CHECK-P8-NEXT: vmrghh v7, v9, v7
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> +; CHECK-P8-NEXT: vmrglw v4, v1, v0
>> +; CHECK-P8-NEXT: vmrglw v5, v7, v6
>> +; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> +; CHECK-P8-NEXT: stvx v2, 0, r3
>> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4
>> -; CHECK-P8-NEXT: stvx v3, 0, r3
>> -; CHECK-P8-NEXT: xxmrgld v2, v2, v0
>> -; CHECK-P8-NEXT: stvx v2, r3, r5
>> +; CHECK-P8-NEXT: stvx v3, r3, r5
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: test16elt:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: lxv vs4, 0(r4)
>> -; CHECK-P9-NEXT: lxv vs3, 16(r4)
>> -; CHECK-P9-NEXT: lxv vs2, 32(r4)
>> -; CHECK-P9-NEXT: xscvdpsxws f5, f4
>> -; CHECK-P9-NEXT: lxv vs1, 48(r4)
>> -; CHECK-P9-NEXT: xscvdpsxws f6, f3
>> -; CHECK-P9-NEXT: lxv vs0, 64(r4)
>> -; CHECK-P9-NEXT: xscvdpsxws f7, f2
>> -; CHECK-P9-NEXT: xscvdpsxws f8, f1
>> -; CHECK-P9-NEXT: xxswapd vs4, vs4
>> -; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P9-NEXT: mffprwz r5, f5
>> -; CHECK-P9-NEXT: xscvdpsxws f9, f0
>> +; CHECK-P9-NEXT: lxv vs3, 0(r4)
>> +; CHECK-P9-NEXT: lxv vs2, 16(r4)
>> +; CHECK-P9-NEXT: lxv vs1, 32(r4)
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> +; CHECK-P9-NEXT: lxv vs0, 48(r4)
>> +; CHECK-P9-NEXT: xscvdpsxws f5, f2
>> +; CHECK-P9-NEXT: xscvdpsxws f6, f1
>> ; CHECK-P9-NEXT: xxswapd vs3, vs3
>> +; CHECK-P9-NEXT: xscvdpsxws f7, f0
>> +; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: mffprwz r5, f4
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT: mtfprd f5, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f6
>> ; CHECK-P9-NEXT: xxswapd vs2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: mtfprd f6, r5
>> +; CHECK-P9-NEXT: mtvsrd v2, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f5
>> +; CHECK-P9-NEXT: mtvsrd v3, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f6
>> +; CHECK-P9-NEXT: mtvsrd v4, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f7
>> -; CHECK-P9-NEXT: mtfprd f7, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f8
>> -; CHECK-P9-NEXT: mtfprd f8, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f9
>> -; CHECK-P9-NEXT: mtfprd f9, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r5
>> +; CHECK-P9-NEXT: mtvsrd v5, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f3
>> +; CHECK-P9-NEXT: lxv vs3, 64(r4)
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: xxswapd v2, vs5
>> -; CHECK-P9-NEXT: xxswapd v5, vs8
>> -; CHECK-P9-NEXT: xxswapd v0, vs9
>> -; CHECK-P9-NEXT: mtfprd f3, r5
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r5
>> -; CHECK-P9-NEXT: xxswapd vs0, vs0
>> -; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT: xxswapd v1, vs2
>> ; CHECK-P9-NEXT: lxv vs2, 80(r4)
>> -; CHECK-P9-NEXT: xxswapd v3, vs4
>> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs6
>> -; CHECK-P9-NEXT: xxswapd v4, vs3
>> -; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> -; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: vmrghh v2, v2, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f1
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs7
>> -; CHECK-P9-NEXT: mtfprd f1, r5
>> +; CHECK-P9-NEXT: lxv vs1, 96(r4)
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> +; CHECK-P9-NEXT: xxswapd vs3, vs3
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v1
>> -; CHECK-P9-NEXT: xxswapd v1, vs1
>> -; CHECK-P9-NEXT: mtfprd f0, r5
>> -; CHECK-P9-NEXT: vmrglh v5, v5, v1
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: xxswapd v1, vs0
>> ; CHECK-P9-NEXT: lxv vs0, 112(r4)
>> -; CHECK-P9-NEXT: lxv vs1, 96(r4)
>> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghh v5, v5, v0
>> +; CHECK-P9-NEXT: mffprwz r4, f4
>> +; CHECK-P9-NEXT: vmrglw v4, v5, v4
>> +; CHECK-P9-NEXT: mtvsrd v3, r4
>> ; CHECK-P9-NEXT: mffprwz r4, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r4
>> +; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> +; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: xxmrgld vs4, v4, v2
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT: stxv vs4, 0(r3)
>> +; CHECK-P9-NEXT: mffprwz r4, f3
>> +; CHECK-P9-NEXT: mtvsrd v3, r4
>> ; CHECK-P9-NEXT: mffprwz r4, f2
>> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT: vmrglw v3, v5, v4
>> -; CHECK-P9-NEXT: xxmrgld vs4, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v2, vs3
>> -; CHECK-P9-NEXT: vmrglh v0, v0, v1
>> -; CHECK-P9-NEXT: mtfprd f2, r4
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r4, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r4
>> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r4
>> ; CHECK-P9-NEXT: mffprwz r4, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r4
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r4, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r4
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r4
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
>> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
>> -; CHECK-P9-NEXT: stxv vs4, 0(r3)
>> ; CHECK-P9-NEXT: blr
>> ;
>> ; CHECK-BE-LABEL: test16elt:
>> @@ -639,12 +579,10 @@ define i32 @test2elt_signed(<2 x double> %a) local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: vmrglh v2, v2, v3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: vmrghh v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -653,15 +591,13 @@ define i32 @test2elt_signed(<2 x double> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P9: # %bb.0: # %entry
>> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: li r3, 0
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -700,18 +636,14 @@ define i64 @test4elt_signed(<4 x double>* nocapture
>> readonly) local_unnamed_addr
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: xxswapd v2, vs2
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: xxswapd v4, vs3
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: xxswapd v5, vs1
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglh v3, v5, v4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> +; CHECK-P8-NEXT: vmrghh v2, v4, v2
>> +; CHECK-P8-NEXT: vmrghh v3, v5, v3
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> @@ -725,22 +657,18 @@ define i64 @test4elt_signed(<4 x double>* nocapture
>> readonly) local_unnamed_addr
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -799,36 +727,28 @@ define <8 x i16> @test8elt_signed(<8 x double>*
>> nocapture readonly) local_unname
>> ; CHECK-P8-NEXT: xxswapd vs3, vs3
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: mtfprd f4, r3
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f6
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs4
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f7
>> -; CHECK-P8-NEXT: mtfprd f6, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs5
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mtfprd f7, r4
>> -; CHECK-P8-NEXT: xxswapd v4, vs6
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v1, vs7
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v5, vs0
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: xxswapd v0, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: xxswapd v6, vs2
>> -; CHECK-P8-NEXT: vmrglh v2, v5, v2
>> -; CHECK-P8-NEXT: xxswapd v5, vs0
>> -; CHECK-P8-NEXT: vmrglh v3, v0, v3
>> -; CHECK-P8-NEXT: vmrglh v4, v6, v4
>> -; CHECK-P8-NEXT: vmrglh v5, v5, v1
>> +; CHECK-P8-NEXT: vmrghh v2, v0, v2
>> +; CHECK-P8-NEXT: vmrghh v3, v1, v3
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: vmrghh v4, v0, v4
>> +; CHECK-P8-NEXT: vmrghh v5, v1, v5
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> ; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> @@ -840,47 +760,39 @@ define <8 x i16> @test8elt_signed(<8 x double>*
>> nocapture readonly) local_unname
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> ; CHECK-P9-NEXT: xxswapd vs3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
>> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: xxswapd v2, vs4
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghh v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: mffprwz r3, f1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>> ; CHECK-P9-NEXT: blr
>> @@ -944,209 +856,177 @@ entry:
>> define void @test16elt_signed(<16 x i16>* noalias nocapture sret
>> %agg.result, <16 x double>* nocapture readonly) local_unnamed_addr #3 {
>> ; CHECK-P8-LABEL: test16elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
>> ; CHECK-P8-NEXT: li r5, 16
>> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
>> ; CHECK-P8-NEXT: li r6, 32
>> +; CHECK-P8-NEXT: li r7, 48
>> ; CHECK-P8-NEXT: lxvd2x vs1, r4, r5
>> ; CHECK-P8-NEXT: lxvd2x vs2, r4, r6
>> -; CHECK-P8-NEXT: li r6, 48
>> -; CHECK-P8-NEXT: lxvd2x vs3, r4, r6
>> ; CHECK-P8-NEXT: li r6, 64
>> -; CHECK-P8-NEXT: xscvdpsxws f4, f0
>> +; CHECK-P8-NEXT: lxvd2x vs3, r4, r7
>> ; CHECK-P8-NEXT: lxvd2x vs5, r4, r6
>> -; CHECK-P8-NEXT: li r6, 80
>> +; CHECK-P8-NEXT: li r7, 80
>> +; CHECK-P8-NEXT: li r6, 96
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f0
>> +; CHECK-P8-NEXT: lxvd2x vs7, r4, r7
>> +; CHECK-P8-NEXT: lxvd2x vs10, r4, r6
>> +; CHECK-P8-NEXT: li r6, 112
>> ; CHECK-P8-NEXT: xxswapd vs0, vs0
>> ; CHECK-P8-NEXT: xscvdpsxws f6, f1
>> -; CHECK-P8-NEXT: lxvd2x vs7, r4, r6
>> -; CHECK-P8-NEXT: li r6, 96
>> ; CHECK-P8-NEXT: xxswapd vs1, vs1
>> ; CHECK-P8-NEXT: xscvdpsxws f8, f2
>> -; CHECK-P8-NEXT: lxvd2x vs9, r4, r6
>> -; CHECK-P8-NEXT: li r6, 112
>> ; CHECK-P8-NEXT: xxswapd vs2, vs2
>> -; CHECK-P8-NEXT: xscvdpsxws f10, f3
>> -; CHECK-P8-NEXT: lxvd2x vs11, r4, r6
>> +; CHECK-P8-NEXT: xscvdpsxws f9, f3
>> ; CHECK-P8-NEXT: xxswapd vs3, vs3
>> -; CHECK-P8-NEXT: xscvdpsxws f12, f5
>> +; CHECK-P8-NEXT: xscvdpsxws f11, f5
>> ; CHECK-P8-NEXT: xxswapd vs5, vs5
>> -; CHECK-P8-NEXT: xscvdpsxws f13, f7
>> +; CHECK-P8-NEXT: xscvdpsxws f12, f7
>> ; CHECK-P8-NEXT: xxswapd vs7, vs7
>> -; CHECK-P8-NEXT: xscvdpsxws v2, f9
>> -; CHECK-P8-NEXT: xxswapd vs9, vs9
>> -; CHECK-P8-NEXT: mffprwz r4, f4
>> -; CHECK-P8-NEXT: xscvdpsxws v3, f11
>> -; CHECK-P8-NEXT: xxswapd vs11, vs11
>> -; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: mffprwz r6, f6
>> -; CHECK-P8-NEXT: mtfprd f4, r4
>> +; CHECK-P8-NEXT: mffprwz r7, f4
>> +; CHECK-P8-NEXT: lxvd2x vs4, r4, r6
>> +; CHECK-P8-NEXT: mffprwz r4, f6
>> +; CHECK-P8-NEXT: xscvdpsxws f13, f10
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f8
>> +; CHECK-P8-NEXT: xscvdpsxws f6, f4
>> +; CHECK-P8-NEXT: mtvsrd v4, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f9
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f11
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xxswapd v4, vs4
>> -; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT: mtfprd f6, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f10
>> -; CHECK-P8-NEXT: mtfprd f8, r4
>> -; CHECK-P8-NEXT: xxswapd v5, vs6
>> +; CHECK-P8-NEXT: mtvsrd v0, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f12
>> -; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: xxswapd v0, vs8
>> -; CHECK-P8-NEXT: mtfprd f10, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f13
>> -; CHECK-P8-NEXT: mtfprd f12, r4
>> -; CHECK-P8-NEXT: xxswapd v1, vs10
>> -; CHECK-P8-NEXT: mfvsrwz r4, v2
>> +; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f13
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xxswapd v6, vs12
>> -; CHECK-P8-NEXT: xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT: mtfprd f13, r6
>> -; CHECK-P8-NEXT: mfvsrwz r6, v3
>> -; CHECK-P8-NEXT: mtvsrd v2, r4
>> -; CHECK-P8-NEXT: xxswapd v7, vs13
>> +; CHECK-P8-NEXT: mtvsrd v6, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f6
>> +; CHECK-P8-NEXT: xxswapd vs6, vs10
>> +; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> +; CHECK-P8-NEXT: mtvsrd v7, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: xxswapd vs0, vs4
>> +; CHECK-P8-NEXT: mtvsrd v2, r7
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f1
>> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT: xxswapd v2, v2
>> -; CHECK-P8-NEXT: xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT: mtvsrd v3, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f1
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: xxswapd v3, v3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f2
>> -; CHECK-P8-NEXT: mtfprd f1, r6
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> -; CHECK-P8-NEXT: mtfprd f2, r4
>> +; CHECK-P8-NEXT: xscvdpsxws f4, f6
>> +; CHECK-P8-NEXT: vmrghh v2, v8, v2
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f3
>> +; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT: vmrghh v3, v9, v3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: xxswapd v9, vs1
>> -; CHECK-P8-NEXT: mffprwz r6, f3
>> -; CHECK-P8-NEXT: xxswapd v10, vs2
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f9
>> -; CHECK-P8-NEXT: mtfprd f3, r6
>> -; CHECK-P8-NEXT: mffprwz r6, f7
>> -; CHECK-P8-NEXT: mtfprd f9, r4
>> -; CHECK-P8-NEXT: mffprwz r4, f11
>> -; CHECK-P8-NEXT: vmrglh v4, v8, v4
>> -; CHECK-P8-NEXT: xxswapd v8, vs3
>> -; CHECK-P8-NEXT: vmrglh v5, v9, v5
>> -; CHECK-P8-NEXT: xxswapd v9, vs5
>> -; CHECK-P8-NEXT: mtfprd f7, r6
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: vmrglh v0, v10, v0
>> -; CHECK-P8-NEXT: xxswapd v10, vs7
>> -; CHECK-P8-NEXT: vmrglh v1, v8, v1
>> -; CHECK-P8-NEXT: xxswapd v8, vs9
>> -; CHECK-P8-NEXT: vmrglh v6, v9, v6
>> -; CHECK-P8-NEXT: xxswapd v9, vs0
>> -; CHECK-P8-NEXT: vmrglh v7, v10, v7
>> -; CHECK-P8-NEXT: vmrglh v2, v8, v2
>> -; CHECK-P8-NEXT: vmrglh v3, v9, v3
>> -; CHECK-P8-NEXT: vmrglw v4, v5, v4
>> -; CHECK-P8-NEXT: vmrglw v5, v1, v0
>> -; CHECK-P8-NEXT: vmrglw v0, v7, v6
>> +; CHECK-P8-NEXT: vmrghh v4, v8, v4
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f7
>> +; CHECK-P8-NEXT: vmrghh v5, v9, v5
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f4
>> +; CHECK-P8-NEXT: vmrghh v0, v8, v0
>> +; CHECK-P8-NEXT: mtvsrd v8, r4
>> +; CHECK-P8-NEXT: mffprwz r4, f0
>> +; CHECK-P8-NEXT: vmrghh v1, v9, v1
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> +; CHECK-P8-NEXT: vmrghh v6, v8, v6
>> +; CHECK-P8-NEXT: vmrghh v7, v9, v7
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> +; CHECK-P8-NEXT: vmrglw v4, v1, v0
>> +; CHECK-P8-NEXT: vmrglw v5, v7, v6
>> +; CHECK-P8-NEXT: xxmrgld v2, v3, v2
>> +; CHECK-P8-NEXT: stvx v2, 0, r3
>> ; CHECK-P8-NEXT: xxmrgld v3, v5, v4
>> -; CHECK-P8-NEXT: stvx v3, 0, r3
>> -; CHECK-P8-NEXT: xxmrgld v2, v2, v0
>> -; CHECK-P8-NEXT: stvx v2, r3, r5
>> +; CHECK-P8-NEXT: stvx v3, r3, r5
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: test16elt_signed:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: lxv vs4, 0(r4)
>> -; CHECK-P9-NEXT: lxv vs3, 16(r4)
>> -; CHECK-P9-NEXT: lxv vs2, 32(r4)
>> -; CHECK-P9-NEXT: xscvdpsxws f5, f4
>> -; CHECK-P9-NEXT: lxv vs1, 48(r4)
>> -; CHECK-P9-NEXT: xscvdpsxws f6, f3
>> -; CHECK-P9-NEXT: lxv vs0, 64(r4)
>> -; CHECK-P9-NEXT: xscvdpsxws f7, f2
>> -; CHECK-P9-NEXT: xscvdpsxws f8, f1
>> -; CHECK-P9-NEXT: xxswapd vs4, vs4
>> -; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> -; CHECK-P9-NEXT: mffprwz r5, f5
>> -; CHECK-P9-NEXT: xscvdpsxws f9, f0
>> +; CHECK-P9-NEXT: lxv vs3, 0(r4)
>> +; CHECK-P9-NEXT: lxv vs2, 16(r4)
>> +; CHECK-P9-NEXT: lxv vs1, 32(r4)
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> +; CHECK-P9-NEXT: lxv vs0, 48(r4)
>> +; CHECK-P9-NEXT: xscvdpsxws f5, f2
>> +; CHECK-P9-NEXT: xscvdpsxws f6, f1
>> ; CHECK-P9-NEXT: xxswapd vs3, vs3
>> +; CHECK-P9-NEXT: xscvdpsxws f7, f0
>> +; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: mffprwz r5, f4
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT: mtfprd f5, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f6
>> ; CHECK-P9-NEXT: xxswapd vs2, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: mtfprd f6, r5
>> +; CHECK-P9-NEXT: mtvsrd v2, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f5
>> +; CHECK-P9-NEXT: mtvsrd v3, r5
>> +; CHECK-P9-NEXT: mffprwz r5, f6
>> +; CHECK-P9-NEXT: mtvsrd v4, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f7
>> -; CHECK-P9-NEXT: mtfprd f7, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f8
>> -; CHECK-P9-NEXT: mtfprd f8, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f9
>> -; CHECK-P9-NEXT: mtfprd f9, r5
>> -; CHECK-P9-NEXT: mffprwz r5, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r5
>> +; CHECK-P9-NEXT: mtvsrd v5, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f3
>> +; CHECK-P9-NEXT: lxv vs3, 64(r4)
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: xxswapd v2, vs5
>> -; CHECK-P9-NEXT: xxswapd v5, vs8
>> -; CHECK-P9-NEXT: xxswapd v0, vs9
>> -; CHECK-P9-NEXT: mtfprd f3, r5
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r5
>> -; CHECK-P9-NEXT: xxswapd vs0, vs0
>> -; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT: xxswapd v1, vs2
>> ; CHECK-P9-NEXT: lxv vs2, 80(r4)
>> -; CHECK-P9-NEXT: xxswapd v3, vs4
>> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs6
>> -; CHECK-P9-NEXT: xxswapd v4, vs3
>> -; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> -; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: vmrghh v2, v2, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f1
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs7
>> -; CHECK-P9-NEXT: mtfprd f1, r5
>> +; CHECK-P9-NEXT: lxv vs1, 96(r4)
>> +; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> +; CHECK-P9-NEXT: xxswapd vs3, vs3
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> ; CHECK-P9-NEXT: mffprwz r5, f0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v1
>> -; CHECK-P9-NEXT: xxswapd v1, vs1
>> -; CHECK-P9-NEXT: mtfprd f0, r5
>> -; CHECK-P9-NEXT: vmrglh v5, v5, v1
>> -; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT: xxswapd v1, vs0
>> ; CHECK-P9-NEXT: lxv vs0, 112(r4)
>> -; CHECK-P9-NEXT: lxv vs1, 96(r4)
>> +; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r5
>> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghh v5, v5, v0
>> +; CHECK-P9-NEXT: mffprwz r4, f4
>> +; CHECK-P9-NEXT: vmrglw v4, v5, v4
>> +; CHECK-P9-NEXT: mtvsrd v3, r4
>> ; CHECK-P9-NEXT: mffprwz r4, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r4
>> +; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> +; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: xxmrgld vs4, v4, v2
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> +; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT: stxv vs4, 0(r3)
>> +; CHECK-P9-NEXT: mffprwz r4, f3
>> +; CHECK-P9-NEXT: mtvsrd v3, r4
>> ; CHECK-P9-NEXT: mffprwz r4, f2
>> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT: vmrglw v3, v5, v4
>> -; CHECK-P9-NEXT: xxmrgld vs4, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v2, vs3
>> -; CHECK-P9-NEXT: vmrglh v0, v0, v1
>> -; CHECK-P9-NEXT: mtfprd f2, r4
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r4, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r4
>> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r4
>> ; CHECK-P9-NEXT: mffprwz r4, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r4
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghh v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r4, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r4
>> +; CHECK-P9-NEXT: mtvsrd v4, r4
>> ; CHECK-P9-NEXT: mffprwz r4, f0
>> -; CHECK-P9-NEXT: vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v0
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r4
>> +; CHECK-P9-NEXT: vmrghh v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld vs0, v3, v2
>> ; CHECK-P9-NEXT: stxv vs0, 16(r3)
>> -; CHECK-P9-NEXT: stxv vs4, 0(r3)
>> ; CHECK-P9-NEXT: blr
>> ;
>> ; CHECK-BE-LABEL: test16elt_signed:
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
>> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
>> index 369fb3f10100..173ced964ad6 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
>> @@ -16,12 +16,10 @@ define i64 @test2elt(<2 x double> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpuxws f1, v2
>> ; CHECK-P8-NEXT: xscvdpuxws f0, f0
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: mtvsrwz v2, r3
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: vmrglw v2, v2, v3
>> +; CHECK-P8-NEXT: mtvsrwz v3, r4
>> +; CHECK-P8-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -35,7 +33,7 @@ define i64 @test2elt(<2 x double> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpuxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> ; CHECK-P9-NEXT: mtvsrws v2, r3
>> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghw v2, v3, v2
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -310,12 +308,10 @@ define i64 @test2elt_signed(<2 x double> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: mtvsrwz v2, r3
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: vmrglw v2, v2, v3
>> +; CHECK-P8-NEXT: mtvsrwz v3, r4
>> +; CHECK-P8-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -329,7 +325,7 @@ define i64 @test2elt_signed(<2 x double> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> ; CHECK-P9-NEXT: mtvsrws v2, r3
>> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghw v2, v3, v2
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
>> b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
>> index fb13d1bd71f5..fd28d9a1afdc 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
>> @@ -16,12 +16,10 @@ define i16 @test2elt(<2 x double> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: vmrglb v2, v2, v3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: clrldi r3, r3, 48
>> @@ -33,15 +31,13 @@ define i16 @test2elt(<2 x double> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P9: # %bb.0: # %entry
>> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: addi r3, r1, -2
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
>> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
>> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
>> ; CHECK-P9-NEXT: lhz r3, -2(r1)
>> @@ -84,18 +80,14 @@ define i32 @test4elt(<4 x double>* nocapture
>> readonly) local_unnamed_addr #1 {
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: xxswapd v2, vs2
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: xxswapd v4, vs3
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: xxswapd v5, vs1
>> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglb v3, v5, v4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
>> +; CHECK-P8-NEXT: vmrghb v3, v5, v3
>> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> @@ -109,24 +101,20 @@ define i32 @test4elt(<4 x double>* nocapture
>> readonly) local_unnamed_addr #1 {
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: li r3, 0
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -185,36 +173,28 @@ define i64 @test8elt(<8 x double>* nocapture
>> readonly) local_unnamed_addr #1 {
>> ; CHECK-P8-NEXT: xxswapd vs3, vs3
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: mtfprd f4, r3
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f6
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs4
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f7
>> -; CHECK-P8-NEXT: mtfprd f6, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs5
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mtfprd f7, r4
>> -; CHECK-P8-NEXT: xxswapd v4, vs6
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v1, vs7
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v5, vs0
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: xxswapd v0, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: xxswapd v6, vs2
>> -; CHECK-P8-NEXT: vmrglb v2, v5, v2
>> -; CHECK-P8-NEXT: xxswapd v5, vs0
>> -; CHECK-P8-NEXT: vmrglb v3, v0, v3
>> -; CHECK-P8-NEXT: vmrglb v4, v6, v4
>> -; CHECK-P8-NEXT: vmrglb v5, v5, v1
>> +; CHECK-P8-NEXT: vmrghb v2, v0, v2
>> +; CHECK-P8-NEXT: vmrghb v3, v1, v3
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: vmrghb v4, v0, v4
>> +; CHECK-P8-NEXT: vmrghb v5, v1, v5
>> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P8-NEXT: vmrglh v3, v5, v4
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> @@ -228,47 +208,39 @@ define i64 @test8elt(<8 x double>* nocapture
>> readonly) local_unnamed_addr #1 {
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> ; CHECK-P9-NEXT: xxswapd vs3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
>> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: xxswapd v2, vs4
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: mffprwz r3, f1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> @@ -364,79 +336,63 @@ define <16 x i8> @test16elt(<16 x double>*
>> nocapture readonly) local_unnamed_add
>> ; CHECK-P8-NEXT: xxswapd vs7, vs7
>> ; CHECK-P8-NEXT: xscvdpsxws v2, f9
>> ; CHECK-P8-NEXT: xxswapd vs9, vs9
>> -; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: xscvdpsxws v3, f11
>> ; CHECK-P8-NEXT: xxswapd vs11, vs11
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: mffprwz r4, f6
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: mtfprd f4, r3
>> -; CHECK-P8-NEXT: mffprwz r3, f8
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xxswapd v4, vs4
>> -; CHECK-P8-NEXT: mtfprd f6, r4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f8
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f10
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT: xxswapd v5, vs6
>> -; CHECK-P8-NEXT: mtfprd f8, r3
>> -; CHECK-P8-NEXT: mffprwz r3, f12
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xxswapd v0, vs8
>> -; CHECK-P8-NEXT: mtfprd f10, r4
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f12
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f13
>> ; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: xxswapd v1, vs10
>> -; CHECK-P8-NEXT: mtfprd f12, r3
>> -; CHECK-P8-NEXT: mfvsrwz r3, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT: xxswapd v6, vs12
>> -; CHECK-P8-NEXT: mtfprd f13, r4
>> +; CHECK-P8-NEXT: mtvsrd v6, r3
>> +; CHECK-P8-NEXT: mfvsrwz r3, v2
>> +; CHECK-P8-NEXT: mtvsrd v2, r4
>> ; CHECK-P8-NEXT: mfvsrwz r4, v3
>> -; CHECK-P8-NEXT: mtvsrd v2, r3
>> -; CHECK-P8-NEXT: xxswapd v7, vs13
>> -; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT: xxswapd v2, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> +; CHECK-P8-NEXT: mtvsrd v7, r4
>> +; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v3, v3
>> +; CHECK-P8-NEXT: mtvsrd v8, r3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: xxswapd v9, vs1
>> +; CHECK-P8-NEXT: vmrghb v4, v8, v4
>> +; CHECK-P8-NEXT: vmrghb v5, v9, v5
>> +; CHECK-P8-NEXT: mtvsrd v8, r3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f5
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: xxswapd v10, vs2
>> ; CHECK-P8-NEXT: mffprwz r4, f7
>> -; CHECK-P8-NEXT: mtfprd f5, r3
>> +; CHECK-P8-NEXT: vmrghb v0, v8, v0
>> +; CHECK-P8-NEXT: vmrghb v1, v9, v1
>> +; CHECK-P8-NEXT: mtvsrd v8, r3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f9
>> -; CHECK-P8-NEXT: mtfprd f7, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f11
>> -; CHECK-P8-NEXT: vmrglb v4, v8, v4
>> -; CHECK-P8-NEXT: xxswapd v8, vs3
>> -; CHECK-P8-NEXT: vmrglb v5, v9, v5
>> -; CHECK-P8-NEXT: xxswapd v9, vs5
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: vmrglb v0, v10, v0
>> -; CHECK-P8-NEXT: xxswapd v10, vs7
>> -; CHECK-P8-NEXT: vmrglb v1, v8, v1
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> -; CHECK-P8-NEXT: vmrglb v6, v9, v6
>> -; CHECK-P8-NEXT: xxswapd v9, vs1
>> -; CHECK-P8-NEXT: vmrglb v7, v10, v7
>> -; CHECK-P8-NEXT: vmrglb v2, v8, v2
>> -; CHECK-P8-NEXT: vmrglb v3, v9, v3
>> +; CHECK-P8-NEXT: vmrghb v6, v8, v6
>> +; CHECK-P8-NEXT: vmrghb v2, v9, v2
>> +; CHECK-P8-NEXT: mtvsrd v8, r3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> +; CHECK-P8-NEXT: vmrghb v3, v8, v3
>> +; CHECK-P8-NEXT: vmrghb v7, v9, v7
>> ; CHECK-P8-NEXT: vmrglh v4, v5, v4
>> ; CHECK-P8-NEXT: vmrglh v5, v1, v0
>> -; CHECK-P8-NEXT: vmrglh v0, v7, v6
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> -; CHECK-P8-NEXT: vmrglw v2, v2, v0
>> -; CHECK-P8-NEXT: xxmrgld v2, v2, v3
>> +; CHECK-P8-NEXT: vmrglh v2, v2, v6
>> +; CHECK-P8-NEXT: vmrglh v3, v7, v3
>> +; CHECK-P8-NEXT: vmrglw v4, v5, v4
>> +; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: xxmrgld v2, v2, v4
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: test16elt:
>> @@ -445,94 +401,78 @@ define <16 x i8> @test16elt(<16 x double>*
>> nocapture readonly) local_unnamed_add
>> ; CHECK-P9-NEXT: xscvdpsxws f8, f7
>> ; CHECK-P9-NEXT: xxswapd vs7, vs7
>> ; CHECK-P9-NEXT: xscvdpsxws f7, f7
>> +; CHECK-P9-NEXT: lxv vs6, 16(r3)
>> ; CHECK-P9-NEXT: lxv vs0, 112(r3)
>> ; CHECK-P9-NEXT: lxv vs1, 96(r3)
>> ; CHECK-P9-NEXT: lxv vs2, 80(r3)
>> ; CHECK-P9-NEXT: lxv vs3, 64(r3)
>> ; CHECK-P9-NEXT: lxv vs4, 48(r3)
>> ; CHECK-P9-NEXT: lxv vs5, 32(r3)
>> -; CHECK-P9-NEXT: lxv vs6, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f8
>> -; CHECK-P9-NEXT: mtfprd f8, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f7
>> -; CHECK-P9-NEXT: xxswapd v2, vs8
>> -; CHECK-P9-NEXT: mtfprd f7, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs7
>> ; CHECK-P9-NEXT: xscvdpsxws f7, f6
>> ; CHECK-P9-NEXT: xxswapd vs6, vs6
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f6, f6
>> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f7
>> -; CHECK-P9-NEXT: mtfprd f7, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f6
>> -; CHECK-P9-NEXT: mtfprd f6, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs6
>> ; CHECK-P9-NEXT: xscvdpsxws f6, f5
>> ; CHECK-P9-NEXT: xxswapd vs5, vs5
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f5, f5
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f6
>> -; CHECK-P9-NEXT: mtfprd f6, r3
>> -; CHECK-P9-NEXT: mffprwz r3, f5
>> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs7
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs6
>> -; CHECK-P9-NEXT: mtfprd f5, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs5
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: mffprwz r3, f5
>> ; CHECK-P9-NEXT: xscvdpsxws f5, f4
>> ; CHECK-P9-NEXT: xxswapd vs4, vs4
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f5
>> -; CHECK-P9-NEXT: mtfprd f5, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs4
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> ; CHECK-P9-NEXT: xxswapd vs3, vs3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs5
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r3
>> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs4
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs3
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> -; CHECK-P9-NEXT: xxswapd v0, vs0
>> -; CHECK-P9-NEXT: vmrglb v5, v5, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r3
>> +; CHECK-P9-NEXT: vmrghb v5, v5, v0
>> ; CHECK-P9-NEXT: vmrglh v4, v5, v4
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>> @@ -649,12 +589,10 @@ define i16 @test2elt_signed(<2 x double> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvdpsxws f1, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: mffprwz r3, f1
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r4, f0
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: xxswapd v3, vs1
>> -; CHECK-P8-NEXT: vmrglb v2, v2, v3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: clrldi r3, r3, 48
>> @@ -666,15 +604,13 @@ define i16 @test2elt_signed(<2 x double> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P9: # %bb.0: # %entry
>> ; CHECK-P9-NEXT: xscvdpsxws f0, v2
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs0
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: addi r3, r1, -2
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vmrglb v2, v3, v2
>> +; CHECK-P9-NEXT: vmrghb v2, v3, v2
>> ; CHECK-P9-NEXT: vsldoi v2, v2, v2, 8
>> ; CHECK-P9-NEXT: stxsihx v2, 0, r3
>> ; CHECK-P9-NEXT: lhz r3, -2(r1)
>> @@ -717,18 +653,14 @@ define i32 @test4elt_signed(<4 x double>* nocapture
>> readonly) local_unnamed_addr
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: xxswapd v2, vs2
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: xxswapd v4, vs3
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: xxswapd v5, vs1
>> -; CHECK-P8-NEXT: vmrglb v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglb v3, v5, v4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> +; CHECK-P8-NEXT: vmrghb v2, v4, v2
>> +; CHECK-P8-NEXT: vmrghb v3, v5, v3
>> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> @@ -742,24 +674,20 @@ define i32 @test4elt_signed(<4 x double>* nocapture
>> readonly) local_unnamed_addr
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> ; CHECK-P9-NEXT: lxv vs0, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: xxswapd v2, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs1
>> -; CHECK-P9-NEXT: xxswapd v4, vs0
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: li r3, 0
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> +; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P9-NEXT: vextuwrx r3, r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -818,36 +746,28 @@ define i64 @test8elt_signed(<8 x double>* nocapture
>> readonly) local_unnamed_addr
>> ; CHECK-P8-NEXT: xxswapd vs3, vs3
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: mffprwz r4, f5
>> -; CHECK-P8-NEXT: mtfprd f4, r3
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: mffprwz r3, f6
>> -; CHECK-P8-NEXT: mtfprd f5, r4
>> -; CHECK-P8-NEXT: xxswapd v2, vs4
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f7
>> -; CHECK-P8-NEXT: mtfprd f6, r3
>> -; CHECK-P8-NEXT: xxswapd v3, vs5
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f0
>> -; CHECK-P8-NEXT: mtfprd f7, r4
>> -; CHECK-P8-NEXT: xxswapd v4, vs6
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v1, vs7
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v5, vs0
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: xxswapd v0, vs1
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: xxswapd v6, vs2
>> -; CHECK-P8-NEXT: vmrglb v2, v5, v2
>> -; CHECK-P8-NEXT: xxswapd v5, vs0
>> -; CHECK-P8-NEXT: vmrglb v3, v0, v3
>> -; CHECK-P8-NEXT: vmrglb v4, v6, v4
>> -; CHECK-P8-NEXT: vmrglb v5, v5, v1
>> +; CHECK-P8-NEXT: vmrghb v2, v0, v2
>> +; CHECK-P8-NEXT: vmrghb v3, v1, v3
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> +; CHECK-P8-NEXT: vmrghb v4, v0, v4
>> +; CHECK-P8-NEXT: vmrghb v5, v1, v5
>> ; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> ; CHECK-P8-NEXT: vmrglh v3, v5, v4
>> ; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> @@ -861,47 +781,39 @@ define i64 @test8elt_signed(<8 x double>* nocapture
>> readonly) local_unnamed_addr
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> ; CHECK-P9-NEXT: xxswapd vs3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> ; CHECK-P9-NEXT: lxv vs0, 48(r3)
>> ; CHECK-P9-NEXT: lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT: lxv vs2, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: xxswapd v2, vs4
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs3
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: mffprwz r3, f1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs1
>> -; CHECK-P9-NEXT: xxswapd v5, vs0
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> @@ -997,79 +909,63 @@ define <16 x i8> @test16elt_signed(<16 x double>*
>> nocapture readonly) local_unna
>> ; CHECK-P8-NEXT: xxswapd vs7, vs7
>> ; CHECK-P8-NEXT: xscvdpsxws v2, f9
>> ; CHECK-P8-NEXT: xxswapd vs9, vs9
>> -; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: xscvdpsxws v3, f11
>> ; CHECK-P8-NEXT: xxswapd vs11, vs11
>> +; CHECK-P8-NEXT: mffprwz r3, f4
>> ; CHECK-P8-NEXT: mffprwz r4, f6
>> ; CHECK-P8-NEXT: xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT: mtfprd f4, r3
>> -; CHECK-P8-NEXT: mffprwz r3, f8
>> ; CHECK-P8-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT: xxswapd v4, vs4
>> -; CHECK-P8-NEXT: mtfprd f6, r4
>> +; CHECK-P8-NEXT: mtvsrd v4, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f8
>> +; CHECK-P8-NEXT: mtvsrd v5, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f10
>> ; CHECK-P8-NEXT: xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT: xxswapd v5, vs6
>> -; CHECK-P8-NEXT: mtfprd f8, r3
>> -; CHECK-P8-NEXT: mffprwz r3, f12
>> ; CHECK-P8-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT: xxswapd v0, vs8
>> -; CHECK-P8-NEXT: mtfprd f10, r4
>> +; CHECK-P8-NEXT: mtvsrd v0, r3
>> +; CHECK-P8-NEXT: mffprwz r3, f12
>> +; CHECK-P8-NEXT: mtvsrd v1, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f13
>> ; CHECK-P8-NEXT: xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT: xxswapd v1, vs10
>> -; CHECK-P8-NEXT: mtfprd f12, r3
>> -; CHECK-P8-NEXT: mfvsrwz r3, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT: xxswapd v6, vs12
>> -; CHECK-P8-NEXT: mtfprd f13, r4
>> +; CHECK-P8-NEXT: mtvsrd v6, r3
>> +; CHECK-P8-NEXT: mfvsrwz r3, v2
>> +; CHECK-P8-NEXT: mtvsrd v2, r4
>> ; CHECK-P8-NEXT: mfvsrwz r4, v3
>> -; CHECK-P8-NEXT: mtvsrd v2, r3
>> -; CHECK-P8-NEXT: xxswapd v7, vs13
>> -; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT: xxswapd v2, v2
>> ; CHECK-P8-NEXT: xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> +; CHECK-P8-NEXT: mtvsrd v7, r4
>> +; CHECK-P8-NEXT: mffprwz r3, f0
>> ; CHECK-P8-NEXT: mffprwz r4, f1
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: xxswapd v3, v3
>> +; CHECK-P8-NEXT: mtvsrd v8, r3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f2
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> ; CHECK-P8-NEXT: mffprwz r4, f3
>> -; CHECK-P8-NEXT: mtfprd f2, r3
>> -; CHECK-P8-NEXT: xxswapd v9, vs1
>> +; CHECK-P8-NEXT: vmrghb v4, v8, v4
>> +; CHECK-P8-NEXT: vmrghb v5, v9, v5
>> +; CHECK-P8-NEXT: mtvsrd v8, r3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f5
>> -; CHECK-P8-NEXT: mtfprd f3, r4
>> -; CHECK-P8-NEXT: xxswapd v10, vs2
>> ; CHECK-P8-NEXT: mffprwz r4, f7
>> -; CHECK-P8-NEXT: mtfprd f5, r3
>> +; CHECK-P8-NEXT: vmrghb v0, v8, v0
>> +; CHECK-P8-NEXT: vmrghb v1, v9, v1
>> +; CHECK-P8-NEXT: mtvsrd v8, r3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> ; CHECK-P8-NEXT: mffprwz r3, f9
>> -; CHECK-P8-NEXT: mtfprd f7, r4
>> ; CHECK-P8-NEXT: mffprwz r4, f11
>> -; CHECK-P8-NEXT: vmrglb v4, v8, v4
>> -; CHECK-P8-NEXT: xxswapd v8, vs3
>> -; CHECK-P8-NEXT: vmrglb v5, v9, v5
>> -; CHECK-P8-NEXT: xxswapd v9, vs5
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: mtfprd f1, r4
>> -; CHECK-P8-NEXT: vmrglb v0, v10, v0
>> -; CHECK-P8-NEXT: xxswapd v10, vs7
>> -; CHECK-P8-NEXT: vmrglb v1, v8, v1
>> -; CHECK-P8-NEXT: xxswapd v8, vs0
>> -; CHECK-P8-NEXT: vmrglb v6, v9, v6
>> -; CHECK-P8-NEXT: xxswapd v9, vs1
>> -; CHECK-P8-NEXT: vmrglb v7, v10, v7
>> -; CHECK-P8-NEXT: vmrglb v2, v8, v2
>> -; CHECK-P8-NEXT: vmrglb v3, v9, v3
>> +; CHECK-P8-NEXT: vmrghb v6, v8, v6
>> +; CHECK-P8-NEXT: vmrghb v2, v9, v2
>> +; CHECK-P8-NEXT: mtvsrd v8, r3
>> +; CHECK-P8-NEXT: mtvsrd v9, r4
>> +; CHECK-P8-NEXT: vmrghb v3, v8, v3
>> +; CHECK-P8-NEXT: vmrghb v7, v9, v7
>> ; CHECK-P8-NEXT: vmrglh v4, v5, v4
>> ; CHECK-P8-NEXT: vmrglh v5, v1, v0
>> -; CHECK-P8-NEXT: vmrglh v0, v7, v6
>> -; CHECK-P8-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT: vmrglw v3, v5, v4
>> -; CHECK-P8-NEXT: vmrglw v2, v2, v0
>> -; CHECK-P8-NEXT: xxmrgld v2, v2, v3
>> +; CHECK-P8-NEXT: vmrglh v2, v2, v6
>> +; CHECK-P8-NEXT: vmrglh v3, v7, v3
>> +; CHECK-P8-NEXT: vmrglw v4, v5, v4
>> +; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: xxmrgld v2, v2, v4
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: test16elt_signed:
>> @@ -1078,94 +974,78 @@ define <16 x i8> @test16elt_signed(<16 x double>*
>> nocapture readonly) local_unna
>> ; CHECK-P9-NEXT: xscvdpsxws f8, f7
>> ; CHECK-P9-NEXT: xxswapd vs7, vs7
>> ; CHECK-P9-NEXT: xscvdpsxws f7, f7
>> +; CHECK-P9-NEXT: lxv vs6, 16(r3)
>> ; CHECK-P9-NEXT: lxv vs0, 112(r3)
>> ; CHECK-P9-NEXT: lxv vs1, 96(r3)
>> ; CHECK-P9-NEXT: lxv vs2, 80(r3)
>> ; CHECK-P9-NEXT: lxv vs3, 64(r3)
>> ; CHECK-P9-NEXT: lxv vs4, 48(r3)
>> ; CHECK-P9-NEXT: lxv vs5, 32(r3)
>> -; CHECK-P9-NEXT: lxv vs6, 16(r3)
>> ; CHECK-P9-NEXT: mffprwz r3, f8
>> -; CHECK-P9-NEXT: mtfprd f8, r3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f7
>> -; CHECK-P9-NEXT: xxswapd v2, vs8
>> -; CHECK-P9-NEXT: mtfprd f7, r3
>> -; CHECK-P9-NEXT: xxswapd v3, vs7
>> ; CHECK-P9-NEXT: xscvdpsxws f7, f6
>> ; CHECK-P9-NEXT: xxswapd vs6, vs6
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f6, f6
>> +; CHECK-P9-NEXT: vmrghb v2, v2, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f7
>> -; CHECK-P9-NEXT: mtfprd f7, r3
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f6
>> -; CHECK-P9-NEXT: mtfprd f6, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs6
>> ; CHECK-P9-NEXT: xscvdpsxws f6, f5
>> ; CHECK-P9-NEXT: xxswapd vs5, vs5
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f5, f5
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f6
>> -; CHECK-P9-NEXT: mtfprd f6, r3
>> -; CHECK-P9-NEXT: mffprwz r3, f5
>> -; CHECK-P9-NEXT: vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT: xxswapd v3, vs7
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> ; CHECK-P9-NEXT: vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs6
>> -; CHECK-P9-NEXT: mtfprd f5, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs5
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> +; CHECK-P9-NEXT: mffprwz r3, f5
>> ; CHECK-P9-NEXT: xscvdpsxws f5, f4
>> ; CHECK-P9-NEXT: xxswapd vs4, vs4
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f5
>> -; CHECK-P9-NEXT: mtfprd f5, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs4
>> ; CHECK-P9-NEXT: xscvdpsxws f4, f3
>> ; CHECK-P9-NEXT: xxswapd vs3, vs3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs5
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> ; CHECK-P9-NEXT: mffprwz r3, f4
>> -; CHECK-P9-NEXT: mtfprd f4, r3
>> +; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT: mtvsrd v3, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> -; CHECK-P9-NEXT: xxswapd v4, vs3
>> ; CHECK-P9-NEXT: xscvdpsxws f3, f2
>> ; CHECK-P9-NEXT: xxswapd vs2, vs2
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT: vmrghb v3, v3, v4
>> ; CHECK-P9-NEXT: mffprwz r3, f3
>> -; CHECK-P9-NEXT: mtfprd f3, r3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs2
>> ; CHECK-P9-NEXT: xscvdpsxws f2, f1
>> ; CHECK-P9-NEXT: xxswapd vs1, vs1
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT: vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT: xxswapd v3, vs4
>> -; CHECK-P9-NEXT: vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT: xxswapd v4, vs3
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> -; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: mffprwz r3, f2
>> -; CHECK-P9-NEXT: mtfprd f2, r3
>> +; CHECK-P9-NEXT: vmrglh v3, v4, v3
>> +; CHECK-P9-NEXT: mtvsrd v4, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: xxswapd v4, vs2
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> ; CHECK-P9-NEXT: xscvdpsxws f1, f0
>> ; CHECK-P9-NEXT: xxswapd vs0, vs0
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT: vmrghb v4, v4, v5
>> ; CHECK-P9-NEXT: mffprwz r3, f1
>> -; CHECK-P9-NEXT: mtfprd f1, r3
>> +; CHECK-P9-NEXT: mtvsrd v5, r3
>> ; CHECK-P9-NEXT: mffprwz r3, f0
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: vmrglb v4, v4, v5
>> -; CHECK-P9-NEXT: xxswapd v5, vs1
>> -; CHECK-P9-NEXT: xxswapd v0, vs0
>> -; CHECK-P9-NEXT: vmrglb v5, v5, v0
>> +; CHECK-P9-NEXT: mtvsrd v0, r3
>> +; CHECK-P9-NEXT: vmrghb v5, v5, v0
>> ; CHECK-P9-NEXT: vmrglh v4, v5, v4
>> ; CHECK-P9-NEXT: vmrglw v3, v4, v3
>> ; CHECK-P9-NEXT: xxmrgld v2, v3, v2
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
>> b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
>> index e51af62cb128..5ecd34941b39 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
>> @@ -24,9 +24,9 @@ define i64 @test2elt(i32 %a.coerce) local_unnamed_addr
>> #0 {
>> ; CHECK-P8-NEXT: xscvuxdsp f1, f1
>> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
>> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
>> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -43,12 +43,12 @@ define i64 @test2elt(i32 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> ; CHECK-P9-NEXT: vextuhrx r3, r3, v2
>> ; CHECK-P9-NEXT: clrlwi r3, r3, 16
>> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
>> ; CHECK-P9-NEXT: mtfprwz f0, r3
>> ; CHECK-P9-NEXT: xscvuxdsp f0, f0
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -80,25 +80,17 @@ entry:
>> define <4 x float> @test4elt(i64 %a.coerce) local_unnamed_addr #1 {
>> ; CHECK-P8-LABEL: test4elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: addi r3, r4, .LCPI1_0@toc@l
>> -; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: lvx v3, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v4, v2, v3
>> +; CHECK-P8-NEXT: xxlxor v2, v2, v2
>> +; CHECK-P8-NEXT: mtvsrd v3, r3
>> +; CHECK-P8-NEXT: vmrghh v2, v2, v3
>> ; CHECK-P8-NEXT: xvcvuxwsp v2, v2
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: test4elt:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: addis r3, r2, .LCPI1_0@toc@ha
>> -; CHECK-P9-NEXT: addi r3, r3, .LCPI1_0@toc@l
>> -; CHECK-P9-NEXT: lxvx v3, 0, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P9-NEXT: vperm v2, v4, v2, v3
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> +; CHECK-P9-NEXT: xxlxor v3, v3, v3
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> ; CHECK-P9-NEXT: xvcvuxwsp v2, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -121,17 +113,11 @@ entry:
>> define void @test8elt(<8 x float>* noalias nocapture sret %agg.result,
>> <8 x i16> %a) local_unnamed_addr #2 {
>> ; CHECK-P8-LABEL: test8elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_0@toc@ha
>> -; CHECK-P8-NEXT: addis r5, r2, .LCPI2_1@toc@ha
>> -; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_0@toc@l
>> -; CHECK-P8-NEXT: lvx v3, 0, r4
>> -; CHECK-P8-NEXT: addi r4, r5, .LCPI2_1@toc@l
>> -; CHECK-P8-NEXT: lvx v5, 0, r4
>> +; CHECK-P8-NEXT: xxlxor v3, v3, v3
>> ; CHECK-P8-NEXT: li r4, 16
>> -; CHECK-P8-NEXT: vperm v3, v4, v2, v3
>> -; CHECK-P8-NEXT: vperm v2, v4, v2, v5
>> -; CHECK-P8-NEXT: xvcvuxwsp v3, v3
>> +; CHECK-P8-NEXT: vmrglh v4, v3, v2
>> +; CHECK-P8-NEXT: vmrghh v2, v3, v2
>> +; CHECK-P8-NEXT: xvcvuxwsp v3, v4
>> ; CHECK-P8-NEXT: xvcvuxwsp v2, v2
>> ; CHECK-P8-NEXT: stvx v3, 0, r3
>> ; CHECK-P8-NEXT: stvx v2, r3, r4
>> @@ -139,19 +125,13 @@ define void @test8elt(<8 x float>* noalias
>> nocapture sret %agg.result, <8 x i16>
>> ;
>> ; CHECK-P9-LABEL: test8elt:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0@toc@ha
>> -; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0@toc@l
>> -; CHECK-P9-NEXT: lxvx v3, 0, r4
>> -; CHECK-P9-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1@toc@ha
>> -; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1@toc@l
>> -; CHECK-P9-NEXT: vperm v3, v4, v2, v3
>> -; CHECK-P9-NEXT: xvcvuxwsp vs0, v3
>> -; CHECK-P9-NEXT: lxvx v3, 0, r4
>> -; CHECK-P9-NEXT: vperm v2, v4, v2, v3
>> -; CHECK-P9-NEXT: stxv vs0, 0(r3)
>> +; CHECK-P9-NEXT: xxlxor v3, v3, v3
>> +; CHECK-P9-NEXT: vmrglh v4, v3, v2
>> +; CHECK-P9-NEXT: vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT: xvcvuxwsp vs0, v4
>> ; CHECK-P9-NEXT: xvcvuxwsp vs1, v2
>> ; CHECK-P9-NEXT: stxv vs1, 16(r3)
>> +; CHECK-P9-NEXT: stxv vs0, 0(r3)
>> ; CHECK-P9-NEXT: blr
>> ;
>> ; CHECK-BE-LABEL: test8elt:
>> @@ -276,9 +256,9 @@ define i64 @test2elt_signed(i32 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvsxdsp f1, f1
>> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
>> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
>> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -295,12 +275,12 @@ define i64 @test2elt_signed(i32 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> ; CHECK-P9-NEXT: vextuhrx r3, r3, v2
>> ; CHECK-P9-NEXT: extsh r3, r3
>> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
>> ; CHECK-P9-NEXT: mtfprwa f0, r3
>> ; CHECK-P9-NEXT: xscvsxdsp f0, f0
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -332,11 +312,10 @@ entry:
>> define <4 x float> @test4elt_signed(i64 %a.coerce) local_unnamed_addr #1
>> {
>> ; CHECK-P8-LABEL: test4elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> +; CHECK-P8-NEXT: mtvsrd v2, r3
>> ; CHECK-P8-NEXT: vspltisw v3, 8
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> +; CHECK-P8-NEXT: vmrghh v2, v2, v2
>> ; CHECK-P8-NEXT: vadduwm v3, v3, v3
>> -; CHECK-P8-NEXT: vmrglh v2, v2, v2
>> ; CHECK-P8-NEXT: vslw v2, v2, v3
>> ; CHECK-P8-NEXT: vsraw v2, v2, v3
>> ; CHECK-P8-NEXT: xvcvsxwsp v2, v2
>> @@ -344,9 +323,8 @@ define <4 x float> @test4elt_signed(i64 %a.coerce)
>> local_unnamed_addr #1 {
>> ;
>> ; CHECK-P9-LABEL: test4elt_signed:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: mtfprd f0, r3
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vmrglh v2, v2, v2
>> +; CHECK-P9-NEXT: mtvsrd v2, r3
>> +; CHECK-P9-NEXT: vmrghh v2, v2, v2
>> ; CHECK-P9-NEXT: vextsh2w v2, v2
>> ; CHECK-P9-NEXT: xvcvsxwsp v2, v2
>> ; CHECK-P9-NEXT: blr
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
>> b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
>> index faec95831816..ea8ede3af22a 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
>> @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i32 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: test2elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r4, r2, .LCPI0_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: addi r3, r4, .LCPI0_0@toc@l
>> +; CHECK-P8-NEXT: mtvsrwz v2, r3
>> +; CHECK-P8-NEXT: addi r4, r4, .LCPI0_0@toc@l
>> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: lvx v3, 0, r3
>> +; CHECK-P8-NEXT: lvx v3, 0, r4
>> ; CHECK-P8-NEXT: vperm v2, v4, v2, v3
>> ; CHECK-P8-NEXT: xvcvuxddp v2, v2
>> ; CHECK-P8-NEXT: blr
>> @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture
>> sret %agg.result, i64 %a.c
>> ; CHECK-P8-LABEL: test4elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r5, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_1@toc@ha
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI1_1@toc@ha
>> +; CHECK-P8-NEXT: mtvsrd v2, r4
>> ; CHECK-P8-NEXT: addi r5, r5, .LCPI1_0@toc@l
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI1_1@toc@l
>> +; CHECK-P8-NEXT: addi r4, r6, .LCPI1_1@toc@l
>> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: lvx v2, 0, r5
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> +; CHECK-P8-NEXT: lvx v3, 0, r5
>> ; CHECK-P8-NEXT: lvx v5, 0, r4
>> ; CHECK-P8-NEXT: li r4, 16
>> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2
>> -; CHECK-P8-NEXT: vperm v3, v4, v3, v5
>> -; CHECK-P8-NEXT: xvcvuxddp vs0, v2
>> -; CHECK-P8-NEXT: xvcvuxddp vs1, v3
>> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3
>> +; CHECK-P8-NEXT: vperm v2, v4, v2, v5
>> +; CHECK-P8-NEXT: xvcvuxddp vs0, v3
>> +; CHECK-P8-NEXT: xvcvuxddp vs1, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, vs0
>> ; CHECK-P8-NEXT: xxswapd vs1, vs1
>> ; CHECK-P8-NEXT: stxvd2x vs1, r3, r4
>> @@ -74,11 +72,10 @@ define void @test4elt(<4 x double>* noalias nocapture
>> sret %agg.result, i64 %a.c
>> ;
>> ; CHECK-P9-LABEL: test4elt:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI1_0@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI1_0@toc@l
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> ; CHECK-P9-NEXT: xxlxor v4, v4, v4
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI1_1@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI1_1@toc@l
>> @@ -370,14 +367,13 @@ define <2 x double> @test2elt_signed(i32 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: test2elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r4, r2, .LCPI4_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: addi r3, r4, .LCPI4_0@toc@l
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: lvx v3, 0, r3
>> +; CHECK-P8-NEXT: mtvsrwz v3, r3
>> ; CHECK-P8-NEXT: addis r3, r2, .LCPI4_1@toc@ha
>> +; CHECK-P8-NEXT: addi r4, r4, .LCPI4_0@toc@l
>> ; CHECK-P8-NEXT: addi r3, r3, .LCPI4_1@toc@l
>> +; CHECK-P8-NEXT: lvx v2, 0, r4
>> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v2, v2, v3
>> +; CHECK-P8-NEXT: vperm v2, v3, v3, v2
>> ; CHECK-P8-NEXT: xxswapd v3, vs0
>> ; CHECK-P8-NEXT: vsld v2, v2, v3
>> ; CHECK-P8-NEXT: vsrad v2, v2, v3
>> @@ -415,17 +411,16 @@ define void @test4elt_signed(<4 x double>* noalias
>> nocapture sret %agg.result, i
>> ; CHECK-P8-LABEL: test4elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r5, r2, .LCPI5_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI5_2@toc@ha
>> -; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0@toc@l
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI5_2@toc@l
>> -; CHECK-P8-NEXT: lvx v2, 0, r5
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: lvx v4, 0, r4
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI5_2@toc@ha
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_1@toc@ha
>> +; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0@toc@l
>> ; CHECK-P8-NEXT: addi r4, r4, .LCPI5_1@toc@l
>> +; CHECK-P8-NEXT: lvx v2, 0, r5
>> +; CHECK-P8-NEXT: addi r5, r6, .LCPI5_2@toc@l
>> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
>> ; CHECK-P8-NEXT: li r4, 16
>> +; CHECK-P8-NEXT: lvx v4, 0, r5
>> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2
>> ; CHECK-P8-NEXT: vperm v3, v3, v3, v4
>> ; CHECK-P8-NEXT: xxswapd v4, vs0
>> @@ -443,14 +438,13 @@ define void @test4elt_signed(<4 x double>* noalias
>> nocapture sret %agg.result, i
>> ;
>> ; CHECK-P9-LABEL: test4elt_signed:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI5_0@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI5_0@toc@l
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vperm v3, v2, v2, v3
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI5_1@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI5_1@toc@l
>> +; CHECK-P9-NEXT: vperm v3, v2, v2, v3
>> ; CHECK-P9-NEXT: vextsh2d v3, v3
>> ; CHECK-P9-NEXT: xvcvsxddp vs0, v3
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
>> b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
>> index 6f046f69ecca..f152c2b008ff 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
>> @@ -18,9 +18,9 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr
>> #0 {
>> ; CHECK-P8-NEXT: xscvuxdsp f0, f0
>> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
>> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -30,12 +30,12 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr
>> #0 {
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> ; CHECK-P9-NEXT: xscvuxdsp f0, f0
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
>> ; CHECK-P9-NEXT: xxlor vs0, v2, v2
>> ; CHECK-P9-NEXT: xscvuxdsp f0, f0
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -311,9 +311,9 @@ define i64 @test2elt_signed(<2 x i64> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvsxdsp f0, f0
>> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
>> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -323,12 +323,12 @@ define i64 @test2elt_signed(<2 x i64> %a)
>> local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xxswapd vs0, v2
>> ; CHECK-P9-NEXT: xscvsxdsp f0, f0
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
>> ; CHECK-P9-NEXT: xxlor vs0, v2, v2
>> ; CHECK-P9-NEXT: xscvsxdsp f0, f0
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
>> b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
>> index ce97ed67baa1..f2cb9f5f45fb 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
>> @@ -24,9 +24,9 @@ define i64 @test2elt(i16 %a.coerce) local_unnamed_addr
>> #0 {
>> ; CHECK-P8-NEXT: xscvuxdsp f1, f1
>> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
>> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
>> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -43,12 +43,12 @@ define i64 @test2elt(i16 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> ; CHECK-P9-NEXT: vextubrx r3, r3, v2
>> ; CHECK-P9-NEXT: clrlwi r3, r3, 24
>> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
>> ; CHECK-P9-NEXT: mtfprwz f0, r3
>> ; CHECK-P9-NEXT: xscvuxdsp f0, f0
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -81,11 +81,10 @@ define <4 x float> @test4elt(i32 %a.coerce)
>> local_unnamed_addr #1 {
>> ; CHECK-P8-LABEL: test4elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r4, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: addi r3, r4, .LCPI1_0@toc@l
>> +; CHECK-P8-NEXT: mtvsrwz v2, r3
>> +; CHECK-P8-NEXT: addi r4, r4, .LCPI1_0@toc@l
>> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: lvx v3, 0, r3
>> +; CHECK-P8-NEXT: lvx v3, 0, r4
>> ; CHECK-P8-NEXT: vperm v2, v4, v2, v3
>> ; CHECK-P8-NEXT: xvcvuxwsp v2, v2
>> ; CHECK-P8-NEXT: blr
>> @@ -121,30 +120,28 @@ define void @test8elt(<8 x float>* noalias
>> nocapture sret %agg.result, i64 %a.co
>> ; CHECK-P8-LABEL: test8elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r5, r2, .LCPI2_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_1@toc@ha
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI2_1@toc@ha
>> +; CHECK-P8-NEXT: mtvsrd v2, r4
>> ; CHECK-P8-NEXT: addi r5, r5, .LCPI2_0@toc@l
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_1@toc@l
>> +; CHECK-P8-NEXT: addi r4, r6, .LCPI2_1@toc@l
>> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: lvx v2, 0, r5
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> +; CHECK-P8-NEXT: lvx v3, 0, r5
>> ; CHECK-P8-NEXT: lvx v5, 0, r4
>> ; CHECK-P8-NEXT: li r4, 16
>> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2
>> -; CHECK-P8-NEXT: vperm v3, v4, v3, v5
>> -; CHECK-P8-NEXT: xvcvuxwsp v2, v2
>> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3
>> +; CHECK-P8-NEXT: vperm v2, v4, v2, v5
>> ; CHECK-P8-NEXT: xvcvuxwsp v3, v3
>> -; CHECK-P8-NEXT: stvx v2, 0, r3
>> -; CHECK-P8-NEXT: stvx v3, r3, r4
>> +; CHECK-P8-NEXT: xvcvuxwsp v2, v2
>> +; CHECK-P8-NEXT: stvx v3, 0, r3
>> +; CHECK-P8-NEXT: stvx v2, r3, r4
>> ; CHECK-P8-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: test8elt:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0@toc@l
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> ; CHECK-P9-NEXT: xxlxor v4, v4, v4
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1@toc@l
>> @@ -292,9 +289,9 @@ define i64 @test2elt_signed(i16 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-NEXT: xscvsxdsp f1, f1
>> ; CHECK-P8-NEXT: xscvdpspn vs0, f0
>> ; CHECK-P8-NEXT: xscvdpspn vs1, f1
>> -; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT: vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT: xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT: vmrghw v2, v3, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, v2
>> ; CHECK-P8-NEXT: mffprd r3, f0
>> ; CHECK-P8-NEXT: blr
>> @@ -311,12 +308,12 @@ define i64 @test2elt_signed(i16 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> ; CHECK-P9-NEXT: vextubrx r3, r3, v2
>> ; CHECK-P9-NEXT: extsb r3, r3
>> -; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT: xxsldwi v3, vs0, vs0, 3
>> ; CHECK-P9-NEXT: mtfprwa f0, r3
>> ; CHECK-P9-NEXT: xscvsxdsp f0, f0
>> ; CHECK-P9-NEXT: xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT: vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT: vmrghw v2, v2, v3
>> ; CHECK-P9-NEXT: mfvsrld r3, v2
>> ; CHECK-P9-NEXT: blr
>> ;
>> @@ -349,11 +346,10 @@ define <4 x float> @test4elt_signed(i32 %a.coerce)
>> local_unnamed_addr #1 {
>> ; CHECK-P8-LABEL: test4elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: addi r3, r4, .LCPI5_0@toc@l
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: lvx v3, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v2, v2, v3
>> +; CHECK-P8-NEXT: mtvsrwz v3, r3
>> +; CHECK-P8-NEXT: addi r4, r4, .LCPI5_0@toc@l
>> +; CHECK-P8-NEXT: lvx v2, 0, r4
>> +; CHECK-P8-NEXT: vperm v2, v3, v3, v2
>> ; CHECK-P8-NEXT: vspltisw v3, 12
>> ; CHECK-P8-NEXT: vadduwm v3, v3, v3
>> ; CHECK-P8-NEXT: vslw v2, v2, v3
>> @@ -392,15 +388,14 @@ define void @test8elt_signed(<8 x float>* noalias
>> nocapture sret %agg.result, i6
>> ; CHECK-P8-LABEL: test8elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r5, r2, .LCPI6_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_1@toc@ha
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_1@toc@ha
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> ; CHECK-P8-NEXT: vspltisw v5, 12
>> +; CHECK-P8-NEXT: li r4, 16
>> ; CHECK-P8-NEXT: addi r5, r5, .LCPI6_0@toc@l
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_1@toc@l
>> ; CHECK-P8-NEXT: lvx v2, 0, r5
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: lvx v4, 0, r4
>> -; CHECK-P8-NEXT: li r4, 16
>> +; CHECK-P8-NEXT: addi r5, r6, .LCPI6_1@toc@l
>> +; CHECK-P8-NEXT: lvx v4, 0, r5
>> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2
>> ; CHECK-P8-NEXT: vperm v3, v3, v3, v4
>> ; CHECK-P8-NEXT: vadduwm v4, v5, v5
>> @@ -416,14 +411,13 @@ define void @test8elt_signed(<8 x float>* noalias
>> nocapture sret %agg.result, i6
>> ;
>> ; CHECK-P9-LABEL: test8elt_signed:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_0@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_0@toc@l
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vperm v3, v2, v2, v3
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_1@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_1@toc@l
>> +; CHECK-P9-NEXT: vperm v3, v2, v2, v3
>> ; CHECK-P9-NEXT: vextsb2w v3, v3
>> ; CHECK-P9-NEXT: xvcvsxwsp vs0, v3
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
>> b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
>> index b4582e844f30..268fc9b7d4cc 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
>> @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i16 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: test2elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r4, r2, .LCPI0_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: addi r3, r4, .LCPI0_0@toc@l
>> +; CHECK-P8-NEXT: mtvsrwz v2, r3
>> +; CHECK-P8-NEXT: addi r4, r4, .LCPI0_0@toc@l
>> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: lvx v3, 0, r3
>> +; CHECK-P8-NEXT: lvx v3, 0, r4
>> ; CHECK-P8-NEXT: vperm v2, v4, v2, v3
>> ; CHECK-P8-NEXT: xvcvuxddp v2, v2
>> ; CHECK-P8-NEXT: blr
>> @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture
>> sret %agg.result, i32 %a.c
>> ; CHECK-P8-LABEL: test4elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r5, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI1_1@toc@ha
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI1_1@toc@ha
>> +; CHECK-P8-NEXT: mtvsrwz v2, r4
>> ; CHECK-P8-NEXT: addi r5, r5, .LCPI1_0@toc@l
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI1_1@toc@l
>> +; CHECK-P8-NEXT: addi r4, r6, .LCPI1_1@toc@l
>> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: lvx v2, 0, r5
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> +; CHECK-P8-NEXT: lvx v3, 0, r5
>> ; CHECK-P8-NEXT: lvx v5, 0, r4
>> ; CHECK-P8-NEXT: li r4, 16
>> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2
>> -; CHECK-P8-NEXT: vperm v3, v4, v3, v5
>> -; CHECK-P8-NEXT: xvcvuxddp vs0, v2
>> -; CHECK-P8-NEXT: xvcvuxddp vs1, v3
>> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3
>> +; CHECK-P8-NEXT: vperm v2, v4, v2, v5
>> +; CHECK-P8-NEXT: xvcvuxddp vs0, v3
>> +; CHECK-P8-NEXT: xvcvuxddp vs1, v2
>> ; CHECK-P8-NEXT: xxswapd vs0, vs0
>> ; CHECK-P8-NEXT: xxswapd vs1, vs1
>> ; CHECK-P8-NEXT: stxvd2x vs1, r3, r4
>> @@ -118,33 +116,32 @@ define void @test8elt(<8 x double>* noalias
>> nocapture sret %agg.result, i64 %a.c
>> ; CHECK-P8-LABEL: test8elt:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r5, r2, .LCPI2_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_2@toc@ha
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI2_2@toc@ha
>> +; CHECK-P8-NEXT: mtvsrd v2, r4
>> +; CHECK-P8-NEXT: addis r4, r2, .LCPI2_3@toc@ha
>> ; CHECK-P8-NEXT: addi r5, r5, .LCPI2_0@toc@l
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_2@toc@l
>> +; CHECK-P8-NEXT: addi r4, r4, .LCPI2_3@toc@l
>> ; CHECK-P8-NEXT: xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT: lvx v2, 0, r5
>> -; CHECK-P8-NEXT: addis r5, r2, .LCPI2_3@toc@ha
>> -; CHECK-P8-NEXT: lvx v5, 0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI2_1@toc@ha
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: addi r5, r5, .LCPI2_3@toc@l
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI2_1@toc@l
>> -; CHECK-P8-NEXT: lvx v0, 0, r5
>> -; CHECK-P8-NEXT: lvx v1, 0, r4
>> +; CHECK-P8-NEXT: lvx v3, 0, r5
>> +; CHECK-P8-NEXT: addi r5, r6, .LCPI2_2@toc@l
>> +; CHECK-P8-NEXT: lvx v0, 0, r4
>> ; CHECK-P8-NEXT: li r4, 48
>> +; CHECK-P8-NEXT: lvx v5, 0, r5
>> +; CHECK-P8-NEXT: addis r5, r2, .LCPI2_1@toc@ha
>> +; CHECK-P8-NEXT: addi r5, r5, .LCPI2_1@toc@l
>> +; CHECK-P8-NEXT: lvx v1, 0, r5
>> +; CHECK-P8-NEXT: vperm v0, v4, v2, v0
>> ; CHECK-P8-NEXT: li r5, 32
>> -; CHECK-P8-NEXT: vperm v2, v4, v3, v2
>> -; CHECK-P8-NEXT: vperm v5, v4, v3, v5
>> -; CHECK-P8-NEXT: vperm v0, v4, v3, v0
>> -; CHECK-P8-NEXT: vperm v3, v4, v3, v1
>> -; CHECK-P8-NEXT: xvcvuxddp vs0, v2
>> -; CHECK-P8-NEXT: xvcvuxddp vs1, v5
>> +; CHECK-P8-NEXT: vperm v3, v4, v2, v3
>> +; CHECK-P8-NEXT: vperm v5, v4, v2, v5
>> +; CHECK-P8-NEXT: vperm v2, v4, v2, v1
>> ; CHECK-P8-NEXT: xvcvuxddp vs2, v0
>> -; CHECK-P8-NEXT: xvcvuxddp vs3, v3
>> +; CHECK-P8-NEXT: xvcvuxddp vs0, v3
>> +; CHECK-P8-NEXT: xvcvuxddp vs1, v5
>> +; CHECK-P8-NEXT: xvcvuxddp vs3, v2
>> +; CHECK-P8-NEXT: xxswapd vs2, vs2
>> ; CHECK-P8-NEXT: xxswapd vs0, vs0
>> ; CHECK-P8-NEXT: xxswapd vs1, vs1
>> -; CHECK-P8-NEXT: xxswapd vs2, vs2
>> ; CHECK-P8-NEXT: xxswapd vs3, vs3
>> ; CHECK-P8-NEXT: stxvd2x vs2, r3, r4
>> ; CHECK-P8-NEXT: li r4, 16
>> @@ -155,11 +152,10 @@ define void @test8elt(<8 x double>* noalias
>> nocapture sret %agg.result, i64 %a.c
>> ;
>> ; CHECK-P9-LABEL: test8elt:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_0@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_0@toc@l
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> ; CHECK-P9-NEXT: xxlxor v4, v4, v4
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI2_1@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI2_1@toc@l
>> @@ -404,14 +400,13 @@ define <2 x double> @test2elt_signed(i16 %a.coerce)
>> local_unnamed_addr #0 {
>> ; CHECK-P8-LABEL: test2elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r4, r2, .LCPI4_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r3
>> -; CHECK-P8-NEXT: addi r3, r4, .LCPI4_0@toc@l
>> -; CHECK-P8-NEXT: xxswapd v2, vs0
>> -; CHECK-P8-NEXT: lvx v3, 0, r3
>> +; CHECK-P8-NEXT: mtvsrwz v3, r3
>> ; CHECK-P8-NEXT: addis r3, r2, .LCPI4_1@toc@ha
>> +; CHECK-P8-NEXT: addi r4, r4, .LCPI4_0@toc@l
>> ; CHECK-P8-NEXT: addi r3, r3, .LCPI4_1@toc@l
>> +; CHECK-P8-NEXT: lvx v2, 0, r4
>> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r3
>> -; CHECK-P8-NEXT: vperm v2, v2, v2, v3
>> +; CHECK-P8-NEXT: vperm v2, v3, v3, v2
>> ; CHECK-P8-NEXT: xxswapd v3, vs0
>> ; CHECK-P8-NEXT: vsld v2, v2, v3
>> ; CHECK-P8-NEXT: vsrad v2, v2, v3
>> @@ -449,17 +444,16 @@ define void @test4elt_signed(<4 x double>* noalias
>> nocapture sret %agg.result, i
>> ; CHECK-P8-LABEL: test4elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> ; CHECK-P8-NEXT: addis r5, r2, .LCPI5_0@toc@ha
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI5_2@toc@ha
>> -; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0@toc@l
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI5_2@toc@l
>> -; CHECK-P8-NEXT: lvx v2, 0, r5
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: lvx v4, 0, r4
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI5_2@toc@ha
>> +; CHECK-P8-NEXT: mtvsrwz v3, r4
>> ; CHECK-P8-NEXT: addis r4, r2, .LCPI5_1@toc@ha
>> +; CHECK-P8-NEXT: addi r5, r5, .LCPI5_0@toc@l
>> ; CHECK-P8-NEXT: addi r4, r4, .LCPI5_1@toc@l
>> +; CHECK-P8-NEXT: lvx v2, 0, r5
>> +; CHECK-P8-NEXT: addi r5, r6, .LCPI5_2@toc@l
>> ; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
>> ; CHECK-P8-NEXT: li r4, 16
>> +; CHECK-P8-NEXT: lvx v4, 0, r5
>> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2
>> ; CHECK-P8-NEXT: vperm v3, v3, v3, v4
>> ; CHECK-P8-NEXT: xxswapd v4, vs0
>> @@ -523,26 +517,25 @@ entry:
>> define void @test8elt_signed(<8 x double>* noalias nocapture sret
>> %agg.result, i64 %a.coerce) local_unnamed_addr #1 {
>> ; CHECK-P8-LABEL: test8elt_signed:
>> ; CHECK-P8: # %bb.0: # %entry
>> -; CHECK-P8-NEXT: mtfprd f0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_2@toc@ha
>> ; CHECK-P8-NEXT: addis r5, r2, .LCPI6_0@toc@ha
>> -; CHECK-P8-NEXT: addis r6, r2, .LCPI6_3@toc@ha
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_2@toc@l
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_2@toc@ha
>> +; CHECK-P8-NEXT: mtvsrd v3, r4
>> +; CHECK-P8-NEXT: addis r4, r2, .LCPI6_1@toc@ha
>> ; CHECK-P8-NEXT: addi r5, r5, .LCPI6_0@toc@l
>> -; CHECK-P8-NEXT: addi r6, r6, .LCPI6_3@toc@l
>> -; CHECK-P8-NEXT: lvx v4, 0, r4
>> -; CHECK-P8-NEXT: addis r4, r2, .LCPI6_4@toc@ha
>> +; CHECK-P8-NEXT: addi r6, r6, .LCPI6_2@toc@l
>> +; CHECK-P8-NEXT: addi r4, r4, .LCPI6_1@toc@l
>> ; CHECK-P8-NEXT: lvx v2, 0, r5
>> -; CHECK-P8-NEXT: xxswapd v3, vs0
>> -; CHECK-P8-NEXT: lvx v5, 0, r6
>> -; CHECK-P8-NEXT: addis r5, r2, .LCPI6_1@toc@ha
>> -; CHECK-P8-NEXT: addi r4, r4, .LCPI6_4@toc@l
>> -; CHECK-P8-NEXT: addi r5, r5, .LCPI6_1@toc@l
>> -; CHECK-P8-NEXT: lvx v0, 0, r4
>> -; CHECK-P8-NEXT: lxvd2x vs0, 0, r5
>> +; CHECK-P8-NEXT: addis r5, r2, .LCPI6_3@toc@ha
>> +; CHECK-P8-NEXT: lvx v4, 0, r6
>> +; CHECK-P8-NEXT: addis r6, r2, .LCPI6_4@toc@ha
>> +; CHECK-P8-NEXT: lxvd2x vs0, 0, r4
>> ; CHECK-P8-NEXT: li r4, 48
>> -; CHECK-P8-NEXT: li r5, 32
>> +; CHECK-P8-NEXT: addi r5, r5, .LCPI6_3@toc@l
>> +; CHECK-P8-NEXT: lvx v5, 0, r5
>> +; CHECK-P8-NEXT: addi r5, r6, .LCPI6_4@toc@l
>> +; CHECK-P8-NEXT: lvx v0, 0, r5
>> ; CHECK-P8-NEXT: vperm v2, v3, v3, v2
>> +; CHECK-P8-NEXT: li r5, 32
>> ; CHECK-P8-NEXT: vperm v4, v3, v3, v4
>> ; CHECK-P8-NEXT: vperm v5, v3, v3, v5
>> ; CHECK-P8-NEXT: vperm v3, v3, v3, v0
>> @@ -572,14 +565,13 @@ define void @test8elt_signed(<8 x double>* noalias
>> nocapture sret %agg.result, i
>> ;
>> ; CHECK-P9-LABEL: test8elt_signed:
>> ; CHECK-P9: # %bb.0: # %entry
>> -; CHECK-P9-NEXT: mtfprd f0, r4
>> +; CHECK-P9-NEXT: mtvsrd v2, r4
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_0@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_0@toc@l
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>> -; CHECK-P9-NEXT: xxswapd v2, vs0
>> -; CHECK-P9-NEXT: vperm v3, v2, v2, v3
>> ; CHECK-P9-NEXT: addis r4, r2, .LCPI6_1@toc@ha
>> ; CHECK-P9-NEXT: addi r4, r4, .LCPI6_1@toc@l
>> +; CHECK-P9-NEXT: vperm v3, v2, v2, v3
>> ; CHECK-P9-NEXT: vextsb2d v3, v3
>> ; CHECK-P9-NEXT: xvcvsxddp vs0, v3
>> ; CHECK-P9-NEXT: lxvx v3, 0, r4
>>
>> diff --git
>> a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
>> b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
>> index 7e51f2b862ab..29955dc17f67 100644
>> --- a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
>> @@ -82,10 +82,10 @@ define <3 x float> @constrained_vector_fdiv_v3f32()
>> #0 {
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: xscvdpspn 2, 2
>> ; PC64LE-NEXT: xscvdpspn 0, 0
>> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1
>> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1
>> -; PC64LE-NEXT: vmrglw 2, 3, 2
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3
>> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3
>> +; PC64LE-NEXT: vmrghw 2, 3, 2
>> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE-NEXT: blr
>> ;
>> @@ -106,12 +106,12 @@ define <3 x float> @constrained_vector_fdiv_v3f32()
>> #0 {
>> ; PC64LE9-NEXT: xsdivsp 2, 2, 0
>> ; PC64LE9-NEXT: xsdivsp 0, 3, 0
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE9-NEXT: blr
>> entry:
>> @@ -359,11 +359,11 @@ define <3 x float> @constrained_vector_frem_v3f32()
>> #0 {
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI7_4@toc@l
>> ; PC64LE-NEXT: lvx 4, 0, 3
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 30
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE-NEXT: addi 1, 1, 64
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -401,15 +401,15 @@ define <3 x float> @constrained_vector_frem_v3f32()
>> #0 {
>> ; PC64LE9-NEXT: bl fmodf
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 29
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI7_4@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI7_4@toc@l
>> ; PC64LE9-NEXT: lxvx 36, 0, 3
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE9-NEXT: addi 1, 1, 64
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -710,10 +710,10 @@ define <3 x float> @constrained_vector_fmul_v3f32()
>> #0 {
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: xscvdpspn 2, 2
>> ; PC64LE-NEXT: xscvdpspn 0, 0
>> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1
>> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1
>> -; PC64LE-NEXT: vmrglw 2, 3, 2
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3
>> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3
>> +; PC64LE-NEXT: vmrghw 2, 3, 2
>> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE-NEXT: blr
>> ;
>> @@ -735,11 +735,11 @@ define <3 x float> @constrained_vector_fmul_v3f32()
>> #0 {
>> ; PC64LE9-NEXT: xsmulsp 1, 1, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> ; PC64LE9-NEXT: xscvdpspn 1, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 1, 1, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 1, 1, 3
>> ; PC64LE9-NEXT: xscvdpspn 1, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE9-NEXT: blr
>> entry:
>> @@ -925,10 +925,10 @@ define <3 x float> @constrained_vector_fadd_v3f32()
>> #0 {
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: xscvdpspn 2, 2
>> ; PC64LE-NEXT: xscvdpspn 0, 0
>> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1
>> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1
>> -; PC64LE-NEXT: vmrglw 2, 3, 2
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3
>> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3
>> +; PC64LE-NEXT: vmrghw 2, 3, 2
>> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE-NEXT: blr
>> ;
>> @@ -945,15 +945,15 @@ define <3 x float> @constrained_vector_fadd_v3f32()
>> #0 {
>> ; PC64LE9-NEXT: xsaddsp 1, 0, 1
>> ; PC64LE9-NEXT: xsaddsp 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI17_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI17_3@toc@l
>> ; PC64LE9-NEXT: lxvx 36, 0, 3
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE9-NEXT: blr
>> entry:
>> @@ -1137,10 +1137,10 @@ define <3 x float>
>> @constrained_vector_fsub_v3f32() #0 {
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: xscvdpspn 2, 2
>> ; PC64LE-NEXT: xscvdpspn 0, 0
>> -; PC64LE-NEXT: xxsldwi 34, 1, 1, 1
>> -; PC64LE-NEXT: xxsldwi 35, 2, 2, 1
>> -; PC64LE-NEXT: vmrglw 2, 3, 2
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 1, 1, 3
>> +; PC64LE-NEXT: xxsldwi 35, 2, 2, 3
>> +; PC64LE-NEXT: vmrghw 2, 3, 2
>> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE-NEXT: blr
>> ;
>> @@ -1157,15 +1157,15 @@ define <3 x float>
>> @constrained_vector_fsub_v3f32() #0 {
>> ; PC64LE9-NEXT: xssubsp 1, 0, 1
>> ; PC64LE9-NEXT: xssubsp 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI22_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI22_3@toc@l
>> ; PC64LE9-NEXT: lxvx 36, 0, 3
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE9-NEXT: blr
>> entry:
>> @@ -1333,12 +1333,12 @@ define <3 x float>
>> @constrained_vector_sqrt_v3f32() #0 {
>> ; PC64LE-NEXT: xssqrtsp 2, 2
>> ; PC64LE-NEXT: xscvdpspn 0, 0
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 2
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 3, 2
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 3, 2
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: blr
>> ;
>> @@ -1358,10 +1358,10 @@ define <3 x float>
>> @constrained_vector_sqrt_v3f32() #0 {
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> ; PC64LE9-NEXT: xscvdpspn 1, 1
>> ; PC64LE9-NEXT: xscvdpspn 2, 2
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 2, 2, 1
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE9-NEXT: xxsldwi 34, 2, 2, 3
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: blr
>> @@ -1588,11 +1588,11 @@ define <3 x float>
>> @constrained_vector_pow_v3f32() #0 {
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI32_4@toc@l
>> ; PC64LE-NEXT: lvx 4, 0, 3
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 30
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE-NEXT: addi 1, 1, 64
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -1630,15 +1630,15 @@ define <3 x float>
>> @constrained_vector_pow_v3f32() #0 {
>> ; PC64LE9-NEXT: bl powf
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 29
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI32_4@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI32_4@toc@l
>> ; PC64LE9-NEXT: lxvx 36, 0, 3
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE9-NEXT: addi 1, 1, 64
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -1992,11 +1992,11 @@ define <3 x float>
>> @constrained_vector_powi_v3f32() #0 {
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI37_3@toc@l
>> ; PC64LE-NEXT: lvx 4, 0, 3
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 31
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE-NEXT: addi 1, 1, 48
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -2030,15 +2030,15 @@ define <3 x float>
>> @constrained_vector_powi_v3f32() #0 {
>> ; PC64LE9-NEXT: bl __powisf2
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 31
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI37_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI37_3@toc@l
>> ; PC64LE9-NEXT: lxvx 36, 0, 3
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE9-NEXT: addi 1, 1, 48
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -2360,12 +2360,12 @@ define <3 x float>
>> @constrained_vector_sin_v3f32() #0 {
>> ; PC64LE-NEXT: addis 3, 2, .LCPI42_3@toc@ha
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI42_3@toc@l
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 31
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: addi 1, 1, 48
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -2396,15 +2396,15 @@ define <3 x float>
>> @constrained_vector_sin_v3f32() #0 {
>> ; PC64LE9-NEXT: bl sinf
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 31
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI42_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI42_3@toc@l
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: addi 1, 1, 48
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -2709,12 +2709,12 @@ define <3 x float>
>> @constrained_vector_cos_v3f32() #0 {
>> ; PC64LE-NEXT: addis 3, 2, .LCPI47_3@toc@ha
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI47_3@toc@l
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 31
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: addi 1, 1, 48
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -2745,15 +2745,15 @@ define <3 x float>
>> @constrained_vector_cos_v3f32() #0 {
>> ; PC64LE9-NEXT: bl cosf
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 31
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI47_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI47_3@toc@l
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: addi 1, 1, 48
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -3058,12 +3058,12 @@ define <3 x float>
>> @constrained_vector_exp_v3f32() #0 {
>> ; PC64LE-NEXT: addis 3, 2, .LCPI52_3@toc@ha
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI52_3@toc@l
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 31
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: addi 1, 1, 48
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -3094,15 +3094,15 @@ define <3 x float>
>> @constrained_vector_exp_v3f32() #0 {
>> ; PC64LE9-NEXT: bl expf
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 31
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI52_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI52_3@toc@l
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: addi 1, 1, 48
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -3407,12 +3407,12 @@ define <3 x float>
>> @constrained_vector_exp2_v3f32() #0 {
>> ; PC64LE-NEXT: addis 3, 2, .LCPI57_3@toc@ha
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI57_3@toc@l
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 31
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: addi 1, 1, 48
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -3443,15 +3443,15 @@ define <3 x float>
>> @constrained_vector_exp2_v3f32() #0 {
>> ; PC64LE9-NEXT: bl exp2f
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 31
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI57_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI57_3@toc@l
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: addi 1, 1, 48
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -3756,12 +3756,12 @@ define <3 x float>
>> @constrained_vector_log_v3f32() #0 {
>> ; PC64LE-NEXT: addis 3, 2, .LCPI62_3@toc@ha
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI62_3@toc@l
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 31
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: addi 1, 1, 48
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -3792,15 +3792,15 @@ define <3 x float>
>> @constrained_vector_log_v3f32() #0 {
>> ; PC64LE9-NEXT: bl logf
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 31
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI62_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI62_3@toc@l
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: addi 1, 1, 48
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -4105,12 +4105,12 @@ define <3 x float>
>> @constrained_vector_log10_v3f32() #0 {
>> ; PC64LE-NEXT: addis 3, 2, .LCPI67_3@toc@ha
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI67_3@toc@l
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 31
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: addi 1, 1, 48
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -4141,15 +4141,15 @@ define <3 x float>
>> @constrained_vector_log10_v3f32() #0 {
>> ; PC64LE9-NEXT: bl log10f
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 31
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI67_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI67_3@toc@l
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: addi 1, 1, 48
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -4454,12 +4454,12 @@ define <3 x float>
>> @constrained_vector_log2_v3f32() #0 {
>> ; PC64LE-NEXT: addis 3, 2, .LCPI72_3@toc@ha
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI72_3@toc@l
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 31
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: addi 1, 1, 48
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -4490,15 +4490,15 @@ define <3 x float>
>> @constrained_vector_log2_v3f32() #0 {
>> ; PC64LE9-NEXT: bl log2f
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 31
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI72_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI72_3@toc@l
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: addi 1, 1, 48
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -4748,12 +4748,12 @@ define <3 x float>
>> @constrained_vector_rint_v3f32() #0 {
>> ; PC64LE-NEXT: xsrdpic 2, 2
>> ; PC64LE-NEXT: xscvdpspn 0, 0
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 2
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 3, 2
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 3, 2
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: blr
>> ;
>> @@ -4773,10 +4773,10 @@ define <3 x float>
>> @constrained_vector_rint_v3f32() #0 {
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> ; PC64LE9-NEXT: xscvdpspn 1, 1
>> ; PC64LE9-NEXT: xscvdpspn 2, 2
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> -; PC64LE9-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 2, 2, 1
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> +; PC64LE9-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE9-NEXT: xxsldwi 34, 2, 2, 3
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: blr
>> @@ -4947,12 +4947,12 @@ define <3 x float>
>> @constrained_vector_nearbyint_v3f32() #0 {
>> ; PC64LE-NEXT: addis 3, 2, .LCPI82_3@toc@ha
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI82_3@toc@l
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 31
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: addi 1, 1, 48
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -4983,15 +4983,15 @@ define <3 x float>
>> @constrained_vector_nearbyint_v3f32() #0 {
>> ; PC64LE9-NEXT: bl nearbyintf
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 31
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI82_3@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI82_3@toc@l
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: addi 1, 1, 48
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -5184,11 +5184,11 @@ define <3 x float>
>> @constrained_vector_maxnum_v3f32() #0 {
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI87_5@toc@l
>> ; PC64LE-NEXT: lvx 4, 0, 3
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 30
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE-NEXT: addi 1, 1, 64
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -5227,15 +5227,15 @@ define <3 x float>
>> @constrained_vector_maxnum_v3f32() #0 {
>> ; PC64LE9-NEXT: bl fmaxf
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 29
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI87_5@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI87_5@toc@l
>> ; PC64LE9-NEXT: lxvx 36, 0, 3
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE9-NEXT: addi 1, 1, 64
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -5471,11 +5471,11 @@ define <3 x float>
>> @constrained_vector_minnum_v3f32() #0 {
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> ; PC64LE-NEXT: addi 3, 3, .LCPI92_5@toc@l
>> ; PC64LE-NEXT: lvx 4, 0, 3
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 30
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 2, 3
>> -; PC64LE-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 2, 3
>> +; PC64LE-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE-NEXT: addi 1, 1, 64
>> ; PC64LE-NEXT: ld 0, 16(1)
>> @@ -5514,15 +5514,15 @@ define <3 x float>
>> @constrained_vector_minnum_v3f32() #0 {
>> ; PC64LE9-NEXT: bl fminf
>> ; PC64LE9-NEXT: nop
>> ; PC64LE9-NEXT: xscvdpspn 0, 1
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 29
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: xscvdpspn 0, 30
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI92_5@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI92_5@toc@l
>> ; PC64LE9-NEXT: lxvx 36, 0, 3
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 3, 2, 4
>> ; PC64LE9-NEXT: addi 1, 1, 64
>> ; PC64LE9-NEXT: ld 0, 16(1)
>> @@ -5686,9 +5686,9 @@ define <2 x float>
>> @constrained_vector_fptrunc_v2f64() #0 {
>> ; PC64LE-NEXT: xsrsp 1, 1
>> ; PC64LE-NEXT: xscvdpspn 0, 0
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 3, 2
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 3, 2
>> ; PC64LE-NEXT: blr
>> ;
>> ; PC64LE9-LABEL: constrained_vector_fptrunc_v2f64:
>> @@ -5698,12 +5698,12 @@ define <2 x float>
>> @constrained_vector_fptrunc_v2f64() #0 {
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI96_1@toc@ha
>> ; PC64LE9-NEXT: xsrsp 0, 0
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: lfd 0, .LCPI96_1@toc@l(3)
>> ; PC64LE9-NEXT: xsrsp 0, 0
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: blr
>> entry:
>> %result = call <2 x float>
>> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(
>> @@ -5729,12 +5729,12 @@ define <3 x float>
>> @constrained_vector_fptrunc_v3f64() #0 {
>> ; PC64LE-NEXT: xsrsp 2, 2
>> ; PC64LE-NEXT: xscvdpspn 0, 0
>> ; PC64LE-NEXT: xscvdpspn 1, 1
>> -; PC64LE-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE-NEXT: xscvdpspn 0, 2
>> -; PC64LE-NEXT: xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT: vmrglw 2, 3, 2
>> +; PC64LE-NEXT: xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT: vmrghw 2, 3, 2
>> ; PC64LE-NEXT: lvx 3, 0, 3
>> -; PC64LE-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE-NEXT: blr
>> ;
>> @@ -5745,20 +5745,20 @@ define <3 x float>
>> @constrained_vector_fptrunc_v3f64() #0 {
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI97_1@toc@ha
>> ; PC64LE9-NEXT: xsrsp 0, 0
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> -; PC64LE9-NEXT: xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 34, 0, 0, 3
>> ; PC64LE9-NEXT: lfd 0, .LCPI97_1@toc@l(3)
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI97_2@toc@ha
>> ; PC64LE9-NEXT: addi 3, 3, .LCPI97_2@toc@l
>> ; PC64LE9-NEXT: xsrsp 0, 0
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> -; PC64LE9-NEXT: xxsldwi 35, 0, 0, 1
>> -; PC64LE9-NEXT: vmrglw 2, 3, 2
>> +; PC64LE9-NEXT: xxsldwi 35, 0, 0, 3
>> +; PC64LE9-NEXT: vmrghw 2, 3, 2
>> ; PC64LE9-NEXT: lxvx 35, 0, 3
>> ; PC64LE9-NEXT: addis 3, 2, .LCPI97_3@toc@ha
>> ; PC64LE9-NEXT: lfd 0, .LCPI97_3@toc@l(3)
>> ; PC64LE9-NEXT: xsrsp 0, 0
>> ; PC64LE9-NEXT: xscvdpspn 0, 0
>> -; PC64LE9-NEXT: xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT: xxsldwi 36, 0, 0, 3
>> ; PC64LE9-NEXT: vperm 2, 4, 2, 3
>> ; PC64LE9-NEXT: blr
>> entry:
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vsx.ll
>> b/llvm/test/CodeGen/PowerPC/vsx.ll
>> index 8b4e3640ef6b..4a78218262ca 100644
>> --- a/llvm/test/CodeGen/PowerPC/vsx.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vsx.ll
>> @@ -1404,9 +1404,9 @@ define <2 x float> @test44(<2 x i64> %a) {
>> ; CHECK-LE-NEXT: xscvuxdsp f0, f0
>> ; CHECK-LE-NEXT: xscvdpspn vs1, f1
>> ; CHECK-LE-NEXT: xscvdpspn vs0, f0
>> -; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 1
>> -; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-LE-NEXT: vmrglw v2, v3, v2
>> +; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 3
>> +; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-LE-NEXT: vmrghw v2, v3, v2
>> ; CHECK-LE-NEXT: blr
>> %v = uitofp <2 x i64> %a to <2 x float>
>> ret <2 x float> %v
>> @@ -1486,9 +1486,9 @@ define <2 x float> @test45(<2 x i64> %a) {
>> ; CHECK-LE-NEXT: xscvsxdsp f0, f0
>> ; CHECK-LE-NEXT: xscvdpspn vs1, f1
>> ; CHECK-LE-NEXT: xscvdpspn vs0, f0
>> -; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 1
>> -; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 1
>> -; CHECK-LE-NEXT: vmrglw v2, v3, v2
>> +; CHECK-LE-NEXT: xxsldwi v3, vs1, vs1, 3
>> +; CHECK-LE-NEXT: xxsldwi v2, vs0, vs0, 3
>> +; CHECK-LE-NEXT: vmrghw v2, v3, v2
>> ; CHECK-LE-NEXT: blr
>> %v = sitofp <2 x i64> %a to <2 x float>
>> ret <2 x float> %v
>> @@ -2437,12 +2437,11 @@ define <2 x i32> @test80(i32 %v) {
>> ;
>> ; CHECK-LE-LABEL: test80:
>> ; CHECK-LE: # %bb.0:
>> -; CHECK-LE-NEXT: mtfprd f0, r3
>> +; CHECK-LE-NEXT: mtfprwz f0, r3
>> ; CHECK-LE-NEXT: addis r4, r2, .LCPI65_0@toc@ha
>> ; CHECK-LE-NEXT: addi r3, r4, .LCPI65_0@toc@l
>> -; CHECK-LE-NEXT: xxswapd vs0, vs0
>> +; CHECK-LE-NEXT: xxspltw v2, vs0, 1
>> ; CHECK-LE-NEXT: lvx v3, 0, r3
>> -; CHECK-LE-NEXT: xxspltw v2, vs0, 3
>> ; CHECK-LE-NEXT: vadduwm v2, v2, v3
>> ; CHECK-LE-NEXT: blr
>> %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
>>
>> diff --git a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>> b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>> index 5c05f8dc3d81..a198604f79a4 100644
>> --- a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>> @@ -17,17 +17,15 @@ define <2 x double> @testi0(<2 x double>* %p1,
>> double* %p2) {
>> ; CHECK-NEXT: lxvd2x vs0, 0, r3
>> ; CHECK-NEXT: lfdx f1, 0, r4
>> ; CHECK-NEXT: xxswapd vs0, vs0
>> -; CHECK-NEXT: xxspltd vs1, vs1, 0
>> -; CHECK-NEXT: xxpermdi v2, vs0, vs1, 1
>> +; CHECK-NEXT: xxmrghd v2, vs0, vs1
>> ; CHECK-NEXT: blr
>> ;
>> ; CHECK-P9-VECTOR-LABEL: testi0:
>> ; CHECK-P9-VECTOR: # %bb.0:
>> ; CHECK-P9-VECTOR-NEXT: lxvd2x vs0, 0, r3
>> ; CHECK-P9-VECTOR-NEXT: lfdx f1, 0, r4
>> -; CHECK-P9-VECTOR-NEXT: xxspltd vs1, vs1, 0
>> ; CHECK-P9-VECTOR-NEXT: xxswapd vs0, vs0
>> -; CHECK-P9-VECTOR-NEXT: xxpermdi v2, vs0, vs1, 1
>> +; CHECK-P9-VECTOR-NEXT: xxmrghd v2, vs0, vs1
>> ; CHECK-P9-VECTOR-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: testi0:
>> @@ -51,17 +49,15 @@ define <2 x double> @testi1(<2 x double>* %p1,
>> double* %p2) {
>> ; CHECK-NEXT: lxvd2x vs0, 0, r3
>> ; CHECK-NEXT: lfdx f1, 0, r4
>> ; CHECK-NEXT: xxswapd vs0, vs0
>> -; CHECK-NEXT: xxspltd vs1, vs1, 0
>> -; CHECK-NEXT: xxmrgld v2, vs1, vs0
>> +; CHECK-NEXT: xxpermdi v2, vs1, vs0, 1
>> ; CHECK-NEXT: blr
>> ;
>> ; CHECK-P9-VECTOR-LABEL: testi1:
>> ; CHECK-P9-VECTOR: # %bb.0:
>> ; CHECK-P9-VECTOR-NEXT: lxvd2x vs0, 0, r3
>> ; CHECK-P9-VECTOR-NEXT: lfdx f1, 0, r4
>> -; CHECK-P9-VECTOR-NEXT: xxspltd vs1, vs1, 0
>> ; CHECK-P9-VECTOR-NEXT: xxswapd vs0, vs0
>> -; CHECK-P9-VECTOR-NEXT: xxmrgld v2, vs1, vs0
>> +; CHECK-P9-VECTOR-NEXT: xxpermdi v2, vs1, vs0, 1
>> ; CHECK-P9-VECTOR-NEXT: blr
>> ;
>> ; CHECK-P9-LABEL: testi1:
>>
>>
>>
>>
>