[llvm] 1fed131 - [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE

Eric Christopher via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 8 07:55:49 PDT 2020


To follow up here: 1b1539712e1ee30c02ed20493682fc05d52391c0 fixed the
crashes I was seeing. Thanks Nemanja! :)

On Mon, Jul 6, 2020 at 4:58 PM Eric Christopher <echristo at gmail.com> wrote:

> Hi Nemanja!
>
> Running into a compiler crash with this building skia (https://skia.org/)
> for power after this patch. I'll see what I can do to get a testcase (if it
> doesn't reproduce for you), but would you mind terribly reverting in the
> meantime?
>
> Thanks!
>
> -eric
>
> On Thu, Jun 18, 2020 at 7:55 PM Nemanja Ivanovic via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
>>
>> Author: Nemanja Ivanovic
>> Date: 2020-06-18T21:54:22-05:00
>> New Revision: 1fed131660b2c5d3ea7007e273a7a5da80699445
>>
>> URL:
>> https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445
>> DIFF:
>> https://github.com/llvm/llvm-project/commit/1fed131660b2c5d3ea7007e273a7a5da80699445.diff
>>
>> LOG: [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE
>>
>> We currently miss a number of opportunities to emit single-instruction
>> VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
>> this in itself is not a huge performance opportunity since loading the permute
>> vector for a VPERM can always be pulled out of loops, producing such merge
>> instructions is useful to downstream optimizations.
>> Since VPERM is essentially opaque to all subsequent optimizations, we want to
>> avoid it as much as possible. Other permute instructions have semantics that can
>> be reasoned about much more easily in later optimizations.
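To make the contrast concrete, here is a minimal C++ sketch (not code from the patch) of the closed-form check a merge-style mask allows. The helper name `isInterleaveMask` is hypothetical. It models only the generic interleave shape `<Start, N+Start, Start+1, N+Start+1, ...>`; the real PowerPC matchers also account for element width, operand order and endianness.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not LLVM's matcher): returns true if Mask interleaves
// lanes [Start, Start + N/2) of two N-element operands, i.e.
//   <Start, N+Start, Start+1, N+Start+1, ...>
// Entries follow the shufflevector convention: [0, N) selects from the first
// operand, [N, 2N) from the second, and -1 (undef) matches anything.
bool isInterleaveMask(const std::vector<int> &Mask, int N, int Start) {
  if (static_cast<int>(Mask.size()) != N || N % 2 != 0)
    return false;
  for (int i = 0; i < N / 2; ++i) {
    int FromFirst = Mask[2 * i];
    int FromSecond = Mask[2 * i + 1];
    if (FromFirst >= 0 && FromFirst != Start + i)
      return false;
    if (FromSecond >= 0 && FromSecond != N + Start + i)
      return false;
  }
  return true;
}
```

A VPERM, by contrast, takes an arbitrary 16-byte control vector, so later passes generally cannot draw this kind of conclusion from it.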
>>
>> This patch does the following:
>> - Canonicalize shuffles so that the first element comes from the first vector
>>   (since that's what most of the mask matching functions want)
>> - Switch the elements that come from splat vectors so that they match the
>>   corresponding elements from the other vector (to allow for merges)
>> - Adds debugging messages for when a shuffle is matched to a VPERM so that
>>   anyone interested in improving this further can get the info for their code
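The first two steps can be modelled on plain integer masks. What follows is a minimal C++ sketch under simplifying assumptions, not the patch's code. `Shuffle`, `commuteMask`, `isAlternating`, `alignSplatLanes` and `canonicalize` are hypothetical names. Each operand's splat status is passed in as a flag rather than detected from a BUILD_VECTOR. The patch's extra conditions are also left out: the little-endian check, the rule against commuting a shuffle of a shuffle, and the SCALAR_TO_VECTOR fixup. Commuting swaps the operands and moves every defined index into the other half of the index space. Once the splat is the right-hand operand of an alternating mask, each odd entry can be re-pointed at the splat lane matching its even neighbour, because every lane of a splat holds the same value.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Shuffle masks use the shufflevector convention: [0, N) selects from the
// first operand, [N, 2N) from the second, and -1 means undef.
using Mask = std::vector<int>;

// Hypothetical model of a shuffle: its mask plus whether each operand is a splat.
struct Shuffle {
  Mask M;
  bool LHSIsSplat;
  bool RHSIsSplat;
};

// Swapping the operands moves every defined index into the other half.
Mask commuteMask(Mask M, int N) {
  for (int &Idx : M)
    if (Idx >= 0)
      Idx = Idx < N ? Idx + N : Idx - N;
  return M;
}

// True if the source operand flips between consecutive elements. As in the
// patch's helper, an undef entry counts as coming from the first operand.
bool isAlternating(const Mask &M, int N) {
  bool PrevFromFirst = M[0] < N;
  for (std::size_t i = 1; i < M.size(); ++i) {
    bool FromFirst = M[i] < N;
    if (FromFirst == PrevFromFirst)
      return false;
    PrevFromFirst = FromFirst;
  }
  return true;
}

// Precondition: even entries come from the first operand and the second
// operand is a splat, so any of its lanes may be used. Point each odd entry
// at the splat lane with the same index as its even neighbour.
Mask alignSplatLanes(Mask M, int N) {
  for (std::size_t i = 1; i < M.size(); i += 2)
    M[i] = M[i - 1] + N;
  return M;
}

Shuffle canonicalize(Shuffle S, int N) {
  // Make the first element come from the first operand.
  if (!S.M.empty() && S.M[0] >= N) {
    S.M = commuteMask(S.M, N);
    std::swap(S.LHSIsSplat, S.RHSIsSplat);
  }
  // Re-align the splat lanes so the mask takes a merge-friendly shape.
  if (!S.M.empty() && S.RHSIsSplat && isAlternating(S.M, N))
    S.M = alignSplatLanes(S.M, N);
  return S;
}
```

Take the mask from the comment on `combineVectorShuffle`, `<16,1,17,3,...>` with `<zero>` as the first operand. The sketch first commutes it to `<0,17,1,19,...>`, then re-aligns the splat lanes to `<0,16,1,17,...,7,23>`. Both intermediate masks match the examples in the patch's comments.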
>>
>> Differential revision: https://reviews.llvm.org/D77448
>>
>> Added:
>>
>>
>> Modified:
>>     llvm/lib/Target/PowerPC/PPCISelLowering.cpp
>>     llvm/lib/Target/PowerPC/PPCISelLowering.h
>>     llvm/lib/Target/PowerPC/PPCInstrVSX.td
>>     llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
>>     llvm/test/CodeGen/PowerPC/build-vector-tests.ll
>>     llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
>>     llvm/test/CodeGen/PowerPC/fp-strict-round.ll
>>     llvm/test/CodeGen/PowerPC/load-and-splat.ll
>>     llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
>>     llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
>>     llvm/test/CodeGen/PowerPC/pr25080.ll
>>     llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
>>     llvm/test/CodeGen/PowerPC/pr38087.ll
>>     llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
>>     llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
>>     llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
>>     llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
>>     llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
>>     llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
>>     llvm/test/CodeGen/PowerPC/swaps-le-5.ll
>>     llvm/test/CodeGen/PowerPC/swaps-le-6.ll
>>     llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
>>     llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
>>     llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
>>     llvm/test/CodeGen/PowerPC/vsx.ll
>>     llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>>
>> Removed:
>>
>>
>>
>>
>> ################################################################################
>> diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
>> index d7698a5ec962..28bd80610c84 100644
>> --- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
>> +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
>> @@ -125,6 +125,7 @@ cl::desc("use absolute jump tables on ppc"), cl::Hidden);
>>
>>  STATISTIC(NumTailCalls, "Number of tail calls");
>>  STATISTIC(NumSiblingCalls, "Number of sibling calls");
>> +STATISTIC(ShufflesHandledWithVPERM, "Number of shuffles lowered to a VPERM");
>>
>>  static bool isNByteElemShuffleMask(ShuffleVectorSDNode *, unsigned, int);
>>
>> @@ -1505,6 +1506,8 @@ const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
>>    case PPCISD::MTVSRZ:          return "PPCISD::MTVSRZ";
>>    case PPCISD::SINT_VEC_TO_FP:  return "PPCISD::SINT_VEC_TO_FP";
>>    case PPCISD::UINT_VEC_TO_FP:  return "PPCISD::UINT_VEC_TO_FP";
>> +  case PPCISD::SCALAR_TO_VECTOR_PERMUTED:
>> +    return "PPCISD::SCALAR_TO_VECTOR_PERMUTED";
>>    case PPCISD::ANDI_rec_1_EQ_BIT:
>>      return "PPCISD::ANDI_rec_1_EQ_BIT";
>>    case PPCISD::ANDI_rec_1_GT_BIT:
>> @@ -2716,7 +2719,8 @@ static bool usePartialVectorLoads(SDNode *N, const PPCSubtarget& ST) {
>>    for (SDNode::use_iterator UI = LD->use_begin(), UE = LD->use_end();
>>         UI != UE; ++UI)
>>      if (UI.getUse().get().getResNo() == 0 &&
>> -        UI->getOpcode() != ISD::SCALAR_TO_VECTOR)
>> +        UI->getOpcode() != ISD::SCALAR_TO_VECTOR &&
>> +        UI->getOpcode() != PPCISD::SCALAR_TO_VECTOR_PERMUTED)
>>        return false;
>>
>>    return true;
>> @@ -9041,7 +9045,8 @@ static const SDValue *getNormalLoadInput(const SDValue &Op) {
>>    const SDValue *InputLoad = &Op;
>>    if (InputLoad->getOpcode() == ISD::BITCAST)
>>      InputLoad = &InputLoad->getOperand(0);
>> -  if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR)
>> +  if (InputLoad->getOpcode() == ISD::SCALAR_TO_VECTOR ||
>> +      InputLoad->getOpcode() == PPCISD::SCALAR_TO_VECTOR_PERMUTED)
>>      InputLoad = &InputLoad->getOperand(0);
>>    if (InputLoad->getOpcode() != ISD::LOAD)
>>      return nullptr;
>> @@ -9690,6 +9695,15 @@ SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
>>    SDValue V1 = Op.getOperand(0);
>>    SDValue V2 = Op.getOperand(1);
>>    ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);
>> +
>> +  // Any nodes that were combined in the target-independent combiner prior
>> +  // to vector legalization will not be sent to the target combine. Try to
>> +  // combine it here.
>> +  if (SDValue NewShuffle = combineVectorShuffle(SVOp, DAG)) {
>> +    DAG.ReplaceAllUsesOfValueWith(Op, NewShuffle);
>> +    Op = NewShuffle;
>> +    SVOp = cast<ShuffleVectorSDNode>(Op);
>> +  }
>>    EVT VT = Op.getValueType();
>>    bool isLittleEndian = Subtarget.isLittleEndian();
>>
>> @@ -9715,6 +9729,11 @@ SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
>>          Offset = isLittleEndian ? (3 - SplatIdx) * 4 : SplatIdx * 4;
>>        else
>>          Offset = isLittleEndian ? (1 - SplatIdx) * 8 : SplatIdx * 8;
>> +
>> +      // If we are loading a partial vector, it does not make sense to adjust
>> +      // the base pointer. This happens with (splat (s_to_v_permuted (ld))).
>> +      if (LD->getMemoryVT().getSizeInBits() == (IsFourByte ? 32 : 64))
>> +        Offset = 0;
>>        SDValue BasePtr = LD->getBasePtr();
>>        if (Offset != 0)
>>          BasePtr = DAG.getNode(ISD::ADD, dl, getPointerTy(DAG.getDataLayout()),
>> @@ -9988,7 +10007,13 @@ SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
>>                                               MVT::i32));
>>    }
>>
>> +  ShufflesHandledWithVPERM++;
>>    SDValue VPermMask = DAG.getBuildVector(MVT::v16i8, dl, ResultMask);
>> +  LLVM_DEBUG(dbgs() << "Emitting a VPERM for the following shuffle:\n");
>> +  LLVM_DEBUG(SVOp->dump());
>> +  LLVM_DEBUG(dbgs() << "With the following permute control vector:\n");
>> +  LLVM_DEBUG(VPermMask.dump());
>> +
>>    if (isLittleEndian)
>>      return DAG.getNode(PPCISD::VPERM, dl, V1.getValueType(),
>>                         V2, V1, VPermMask);
>> @@ -14114,6 +14139,199 @@ SDValue PPCTargetLowering::combineStoreFPToInt(SDNode *N,
>>    return Val;
>>  }
>>
>> +static bool isAlternatingShuffMask(const ArrayRef<int> &Mask, int NumElts) {
>> +  // Check that the source of the element keeps flipping
>> +  // (i.e. Mask[i] < NumElts -> Mask[i+1] >= NumElts).
>> +  bool PrevElemFromFirstVec = Mask[0] < NumElts;
>> +  for (int i = 1, e = Mask.size(); i < e; i++) {
>> +    if (PrevElemFromFirstVec && Mask[i] < NumElts)
>> +      return false;
>> +    if (!PrevElemFromFirstVec && Mask[i] >= NumElts)
>> +      return false;
>> +    PrevElemFromFirstVec = !PrevElemFromFirstVec;
>> +  }
>> +  return true;
>> +}
>> +
>> +static bool isSplatBV(SDValue Op) {
>> +  if (Op.getOpcode() != ISD::BUILD_VECTOR)
>> +    return false;
>> +  SDValue FirstOp;
>> +
>> +  // Find first non-undef input.
>> +  for (int i = 0, e = Op.getNumOperands(); i < e; i++) {
>> +    FirstOp = Op.getOperand(i);
>> +    if (!FirstOp.isUndef())
>> +      break;
>> +  }
>> +
>> +  // All inputs are undef or the same as the first non-undef input.
>> +  for (int i = 1, e = Op.getNumOperands(); i < e; i++)
>> +    if (Op.getOperand(i) != FirstOp && !Op.getOperand(i).isUndef())
>> +      return false;
>> +  return true;
>> +}
>> +
>> +static SDValue isScalarToVec(SDValue Op) {
>> +  if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR)
>> +    return Op;
>> +  if (Op.getOpcode() != ISD::BITCAST)
>> +    return SDValue();
>> +  Op = Op.getOperand(0);
>> +  if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR)
>> +    return Op;
>> +  return SDValue();
>> +}
>> +
>> +static void fixupShuffleMaskForPermutedSToV(SmallVectorImpl<int> &ShuffV,
>> +                                            int LHSMaxIdx, int RHSMinIdx,
>> +                                            int RHSMaxIdx, int HalfVec) {
>> +  for (int i = 0, e = ShuffV.size(); i < e; i++) {
>> +    int Idx = ShuffV[i];
>> +    if ((Idx >= 0 && Idx < LHSMaxIdx) || (Idx >= RHSMinIdx && Idx < RHSMaxIdx))
>> +      ShuffV[i] += HalfVec;
>> +  }
>> +  return;
>> +}
>> +
>> +// Replace a SCALAR_TO_VECTOR with a SCALAR_TO_VECTOR_PERMUTED except if
>> +// the original is:
>> +// (<n x Ty> (scalar_to_vector (Ty (extract_elt <n x Ty> %a, C))))
>> +// In such a case, just change the shuffle mask to extract the element
>> +// from the permuted index.
>> +static SDValue getSToVPermuted(SDValue OrigSToV, SelectionDAG &DAG) {
>> +  SDLoc dl(OrigSToV);
>> +  EVT VT = OrigSToV.getValueType();
>> +  assert(OrigSToV.getOpcode() == ISD::SCALAR_TO_VECTOR &&
>> +         "Expecting a SCALAR_TO_VECTOR here");
>> +  SDValue Input = OrigSToV.getOperand(0);
>> +
>> +  if (Input.getOpcode() == ISD::EXTRACT_VECTOR_ELT) {
>> +    ConstantSDNode *Idx = dyn_cast<ConstantSDNode>(Input.getOperand(1));
>> +    SDValue OrigVector = Input.getOperand(0);
>> +
>> +    // Can't handle non-const element indices or different vector types
>> +    // for the input to the extract and the output of the scalar_to_vector.
>> +    if (Idx && VT == OrigVector.getValueType()) {
>> +      SmallVector<int, 16> NewMask(VT.getVectorNumElements(), -1);
>> +      NewMask[VT.getVectorNumElements() / 2] = Idx->getZExtValue();
>> +      return DAG.getVectorShuffle(VT, dl, OrigVector, OrigVector, NewMask);
>> +    }
>> +  }
>> +  return DAG.getNode(PPCISD::SCALAR_TO_VECTOR_PERMUTED, dl, VT,
>> +                     OrigSToV.getOperand(0));
>> +}
>> +
>> +// On little endian subtargets, combine shuffles such as:
>> +// vector_shuffle<16,1,17,3,18,5,19,7,20,9,21,11,22,13,23,15>, <zero>, %b
>> +// into:
>> +// vector_shuffle<16,0,17,1,18,2,19,3,20,4,21,5,22,6,23,7>, <zero>, %b
>> +// because the latter can be matched to a single instruction merge.
>> +// Furthermore, SCALAR_TO_VECTOR on little endian always involves a permute
>> +// to put the value into element zero. Adjust the shuffle mask so that the
>> +// vector can remain in permuted form (to prevent a swap prior to a shuffle).
>> +SDValue PPCTargetLowering::combineVectorShuffle(ShuffleVectorSDNode *SVN,
>> +                                                SelectionDAG &DAG) const {
>> +  SDValue LHS = SVN->getOperand(0);
>> +  SDValue RHS = SVN->getOperand(1);
>> +  auto Mask = SVN->getMask();
>> +  int NumElts = LHS.getValueType().getVectorNumElements();
>> +  SDValue Res(SVN, 0);
>> +  SDLoc dl(SVN);
>> +
>> +  // None of these combines are useful on big endian systems since the ISA
>> +  // already has a big endian bias.
>> +  if (!Subtarget.isLittleEndian())
>> +    return Res;
>> +
>> +  // If this is not a shuffle of a shuffle and the first element comes from
>> +  // the second vector, canonicalize to the commuted form. This will make it
>> +  // more likely to match one of the single instruction patterns.
>> +  if (Mask[0] >= NumElts && LHS.getOpcode() != ISD::VECTOR_SHUFFLE &&
>> +      RHS.getOpcode() != ISD::VECTOR_SHUFFLE) {
>> +    std::swap(LHS, RHS);
>> +    Res = DAG.getCommutedVectorShuffle(*SVN);
>> +    Mask = cast<ShuffleVectorSDNode>(Res)->getMask();
>> +  }
>> +
>> +  // Adjust the shuffle mask if either input vector comes from a
>> +  // SCALAR_TO_VECTOR and keep the respective input vector in permuted
>> +  // form (to prevent the need for a swap).
>> +  SmallVector<int, 16> ShuffV(Mask.begin(), Mask.end());
>> +  SDValue SToVLHS = isScalarToVec(LHS);
>> +  SDValue SToVRHS = isScalarToVec(RHS);
>> +  if (SToVLHS || SToVRHS) {
>> +    int NumEltsIn = SToVLHS ? SToVLHS.getValueType().getVectorNumElements()
>> +                            : SToVRHS.getValueType().getVectorNumElements();
>> +    int NumEltsOut = ShuffV.size();
>> +
>> +    // Initially assume that neither input is permuted. These will be adjusted
>> +    // accordingly if either input is.
>> +    int LHSMaxIdx = -1;
>> +    int RHSMinIdx = -1;
>> +    int RHSMaxIdx = -1;
>> +    int HalfVec = LHS.getValueType().getVectorNumElements() / 2;
>> +
>> +    // Get the permuted scalar to vector nodes for the source(s) that come from
>> +    // ISD::SCALAR_TO_VECTOR.
>> +    if (SToVLHS) {
>> +      // Set up the values for the shuffle vector fixup.
>> +      LHSMaxIdx = NumEltsOut / NumEltsIn;
>> +      SToVLHS = getSToVPermuted(SToVLHS, DAG);
>> +      if (SToVLHS.getValueType() != LHS.getValueType())
>> +        SToVLHS = DAG.getBitcast(LHS.getValueType(), SToVLHS);
>> +      LHS = SToVLHS;
>> +    }
>> +    if (SToVRHS) {
>> +      RHSMinIdx = NumEltsOut;
>> +      RHSMaxIdx = NumEltsOut / NumEltsIn + RHSMinIdx;
>> +      SToVRHS = getSToVPermuted(SToVRHS, DAG);
>> +      if (SToVRHS.getValueType() != RHS.getValueType())
>> +        SToVRHS = DAG.getBitcast(RHS.getValueType(), SToVRHS);
>> +      RHS = SToVRHS;
>> +    }
>> +
>> +    // Fix up the shuffle mask to reflect where the desired element actually is.
>> +    // The minimum and maximum indices that correspond to element zero for both
>> +    // the LHS and RHS are computed and will control which shuffle mask entries
>> +    // are to be changed. For example, if the RHS is permuted, any shuffle mask
>> +    // entries in the range [RHSMinIdx,RHSMaxIdx) will be incremented by
>> +    // HalfVec to refer to the corresponding element in the permuted vector.
>> +    fixupShuffleMaskForPermutedSToV(ShuffV, LHSMaxIdx, RHSMinIdx, RHSMaxIdx,
>> +                                    HalfVec);
>> +    Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, ShuffV);
>> +
>> +    // We may have simplified away the shuffle. We won't be able to do anything
>> +    // further with it here.
>> +    if (!isa<ShuffleVectorSDNode>(Res))
>> +      return Res;
>> +    Mask = cast<ShuffleVectorSDNode>(Res)->getMask();
>> +  }
>> +
>> +  // The common case after we commuted the shuffle is that the RHS is a splat
>> +  // and we have elements coming in from the splat at indices that are not
>> +  // conducive to using a merge.
>> +  // Example:
>> +  // vector_shuffle<0,17,1,19,2,21,3,23,4,25,5,27,6,29,7,31> t1, <zero>
>> +  if (!isSplatBV(RHS))
>> +    return Res;
>> +
>> +  // We are looking for a mask such that all even elements are from
>> +  // one vector and all odd elements from the other.
>> +  if (!isAlternatingShuffMask(Mask, NumElts))
>> +    return Res;
>> +
>> +  // Adjust the mask so we are pulling in the same index from the splat
>> +  // as the index from the interesting vector in consecutive elements.
>> +  // Example:
>> +  // vector_shuffle<0,16,1,17,2,18,3,19,4,20,5,21,6,22,7,23> t1, <zero>
>> +  for (int i = 1, e = Mask.size(); i < e; i += 2)
>> +    ShuffV[i] = (ShuffV[i - 1] + NumElts);
>> +
>> +  Res = DAG.getVectorShuffle(SVN->getValueType(0), dl, LHS, RHS, ShuffV);
>> +  return Res;
>> +}
>> +
>>  SDValue PPCTargetLowering::combineVReverseMemOP(ShuffleVectorSDNode *SVN,
>>                                                  LSBaseSDNode *LSBase,
>>                                                  DAGCombinerInfo &DCI) const {
>> @@ -14223,7 +14441,7 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
>>        LSBaseSDNode* LSBase = cast<LSBaseSDNode>(N->getOperand(0));
>>        return combineVReverseMemOP(cast<ShuffleVectorSDNode>(N), LSBase, DCI);
>>      }
>> -    break;
>> +    return combineVectorShuffle(cast<ShuffleVectorSDNode>(N), DCI.DAG);
>>    case ISD::STORE: {
>>
>>      EVT Op1VT = N->getOperand(1).getValueType();
>>
>> diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
>> index 77252e919553..9f7c6ab53a17 100644
>> --- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
>> +++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
>> @@ -221,6 +221,14 @@ namespace llvm {
>>      /// As with SINT_VEC_TO_FP, used for converting illegal types.
>>      UINT_VEC_TO_FP,
>>
>> +    /// PowerPC instructions that have SCALAR_TO_VECTOR semantics tend to
>> +    /// place the value into the least significant element of the most
>> +    /// significant doubleword in the vector. This is not element zero for
>> +    /// anything smaller than a doubleword on either endianness. This node has
>> +    /// the same semantics as SCALAR_TO_VECTOR except that the value remains in
>> +    /// the aforementioned location in the vector register.
>> +    SCALAR_TO_VECTOR_PERMUTED,
>> +
>>      // FIXME: Remove these once the ANDI glue bug is fixed:
>>      /// i1 = ANDI_rec_1_[EQ|GT]_BIT(i32 or i64 x) - Represents the result of the
>>      /// eq or gt bit of CR0 after executing andi. x, 1. This is used to
>> @@ -1215,6 +1223,8 @@ namespace llvm {
>>      SDValue combineSetCC(SDNode *N, DAGCombinerInfo &DCI) const;
>>      SDValue combineABS(SDNode *N, DAGCombinerInfo &DCI) const;
>>      SDValue combineVSelect(SDNode *N, DAGCombinerInfo &DCI) const;
>> +    SDValue combineVectorShuffle(ShuffleVectorSDNode *SVN,
>> +                                 SelectionDAG &DAG) const;
>>      SDValue combineVReverseMemOP(ShuffleVectorSDNode *SVN, LSBaseSDNode *LSBase,
>>                                   DAGCombinerInfo &DCI) const;
>>
>>
>> diff --git a/llvm/lib/Target/PowerPC/PPCInstrVSX.td b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
>> index e7ec1808ec3b..c43b2716cb37 100644
>> --- a/llvm/lib/Target/PowerPC/PPCInstrVSX.td
>> +++ b/llvm/lib/Target/PowerPC/PPCInstrVSX.td
>> @@ -138,6 +138,8 @@ def PPCldvsxlh : SDNode<"PPCISD::LD_VSX_LH", SDT_PPCldvsxlh,
>>                          [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
>>  def PPCldsplat : SDNode<"PPCISD::LD_SPLAT", SDT_PPCldsplat,
>>                          [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
>> +def PPCSToV : SDNode<"PPCISD::SCALAR_TO_VECTOR_PERMUTED",
>> +                     SDTypeProfile<1, 1, []>, []>;
>>
>>  //-------------------------- Predicate definitions ---------------------------//
>>  def HasVSX : Predicate<"PPCSubTarget->hasVSX()">;
>> @@ -288,6 +290,11 @@ class X_XS6_RA5_RB5<bits<6> opcode, bits<10> xo, string opc,
>>  } // Predicates = HasP9Vector
>>  } // AddedComplexity = 400, hasSideEffects = 0
>>
>> +multiclass ScalToVecWPermute<ValueType Ty, dag In, dag NonPermOut, dag PermOut> {
>> +  def : Pat<(Ty (scalar_to_vector In)), (Ty NonPermOut)>;
>> +  def : Pat<(Ty (PPCSToV In)), (Ty PermOut)>;
>> +}
>> +
>>  //-------------------------- Instruction definitions -------------------------//
>>  // VSX instructions require the VSX feature, they are to be selected over
>>  // equivalent Altivec patterns (as they address a larger register set) and
>> @@ -2710,12 +2717,14 @@ def : Pat<(v2i64 (build_vector DblToLong.A, DblToLong.A)),
>>  def : Pat<(v2i64 (build_vector DblToULong.A, DblToULong.A)),
>>            (v2i64 (XXPERMDI (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC),
>>                             (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC), 0))>;
>> -def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
>> -          (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> -                            (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC), 1))>;
>> -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
>> -          (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> -                            (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC), 1))>;
>> +defm : ScalToVecWPermute<
>> +  v4i32, FltToIntLoad.A,
>> +  (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC), 1),
>> +  (COPY_TO_REGCLASS (XSCVDPSXWSs (XFLOADf32 xoaddr:$A)), VSRC)>;
>> +defm : ScalToVecWPermute<
>> +  v4i32, FltToUIntLoad.A,
>> +  (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC), 1),
>> +  (COPY_TO_REGCLASS (XSCVDPUXWSs (XFLOADf32 xoaddr:$A)), VSRC)>;
>>  def : Pat<(v4f32 (build_vector f32:$A, f32:$A, f32:$A, f32:$A)),
>>            (v4f32 (XXSPLTW (v4f32 (XSCVDPSPN $A)), 0))>;
>>  def : Pat<(v2f64 (PPCldsplat xoaddr:$A)),
>> @@ -2730,10 +2739,12 @@ def : Pat<(v2i64 (build_vector FltToLong.A, FltToLong.A)),
>>  def : Pat<(v2i64 (build_vector FltToULong.A, FltToULong.A)),
>>            (v2i64 (XXPERMDIs
>>                     (COPY_TO_REGCLASS (XSCVDPUXDSs $A), VSFRC), 0))>;
>> -def : Pat<(v2i64 (scalar_to_vector DblToLongLoad.A)),
>> -          (v2i64 (XVCVDPSXDS (LXVDSX xoaddr:$A)))>;
>> -def : Pat<(v2i64 (scalar_to_vector DblToULongLoad.A)),
>> -          (v2i64 (XVCVDPUXDS (LXVDSX xoaddr:$A)))>;
>> +defm : ScalToVecWPermute<
>> +  v2i64, DblToLongLoad.A,
>> +  (XVCVDPSXDS (LXVDSX xoaddr:$A)), (XVCVDPSXDS (LXVDSX xoaddr:$A))>;
>> +defm : ScalToVecWPermute<
>> +  v2i64, DblToULongLoad.A,
>> +  (XVCVDPUXDS (LXVDSX xoaddr:$A)), (XVCVDPUXDS (LXVDSX xoaddr:$A))>;
>>  } // HasVSX
>>
>>  // Any big endian VSX subtarget.
>> @@ -2831,9 +2842,10 @@ def : Pat<WToDPExtractConv.BV13U,
>>
>>  // Any little endian VSX subtarget.
>>  let Predicates = [HasVSX, IsLittleEndian] in {
>> -def : Pat<(v2f64 (scalar_to_vector f64:$A)),
>> -          (v2f64 (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64),
>> -                           (SUBREG_TO_REG (i64 1), $A, sub_64), 0))>;
>> +defm : ScalToVecWPermute<v2f64, (f64 f64:$A),
>> +                         (XXPERMDI (SUBREG_TO_REG (i64 1), $A, sub_64),
>> +                                   (SUBREG_TO_REG (i64 1), $A, sub_64), 0),
>> +                         (SUBREG_TO_REG (i64 1), $A, sub_64)>;
>>
>>  def : Pat<(f64 (extractelt v2f64:$S, 0)),
>>            (f64 (EXTRACT_SUBREG (XXPERMDI $S, $S, 2), sub_64))>;
>> @@ -2943,18 +2955,24 @@ def : Pat<(PPCstore_scal_int_from_vsr
>>            (STXSDX (XSCVDPUXDS f64:$src), xoaddr:$dst)>;
>>
>>  // Load-and-splat with fp-to-int conversion (using X-Form VSX/FP loads).
>> -def : Pat<(v4i32 (scalar_to_vector DblToIntLoad.A)),
>> -          (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> -                            (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC), 1))>;
>> -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoad.A)),
>> -          (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> -                            (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector FltToLongLoad.A)),
>> -          (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
>> -                                          (XFLOADf32 xoaddr:$A), VSFRC)), 0))>;
>> -def : Pat<(v2i64 (scalar_to_vector FltToULongLoad.A)),
>> -          (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
>> -                                          (XFLOADf32 xoaddr:$A), VSFRC)), 0))>;
>> +defm : ScalToVecWPermute<
>> +  v4i32, DblToIntLoad.A,
>> +  (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC), 1),
>> +  (COPY_TO_REGCLASS (XSCVDPSXWS (XFLOADf64 xoaddr:$A)), VSRC)>;
>> +defm : ScalToVecWPermute<
>> +  v4i32, DblToUIntLoad.A,
>> +  (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC), 1),
>> +  (COPY_TO_REGCLASS (XSCVDPUXWS (XFLOADf64 xoaddr:$A)), VSRC)>;
>> +defm : ScalToVecWPermute<
>> +  v2i64, FltToLongLoad.A,
>> +  (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A), VSFRC)), 0),
>> +  (SUBREG_TO_REG (i64 1), (XSCVDPSXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A),
>> +                                                        VSFRC)), sub_64)>;
>> +defm : ScalToVecWPermute<
>> +  v2i64, FltToULongLoad.A,
>> +  (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A), VSFRC)), 0),
>> +  (SUBREG_TO_REG (i64 1), (XSCVDPUXDS (COPY_TO_REGCLASS (XFLOADf32 xoaddr:$A),
>> +                                                        VSFRC)), sub_64)>;
>>  } // HasVSX, NoP9Vector
>>
>>  // Any VSX subtarget that only has loads and stores that load in big endian
>> @@ -3156,8 +3174,12 @@ def : Pat<DWToSPExtractConv.El1US1,
>>                              (f64 (COPY_TO_REGCLASS $S1, VSRC)), VSFRC)))>;
>>
>>  // v4f32 scalar <-> vector conversions (LE)
>> -def : Pat<(v4f32 (scalar_to_vector f32:$A)),
>> -          (v4f32 (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1))>;
>> +  // The permuted version is no better than the version that puts the value
>> +  // into the right element because XSCVDPSPN is different from all the other
>> +  // instructions used for PPCSToV.
>> +  defm : ScalToVecWPermute<v4f32, (f32 f32:$A),
>> +                           (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1),
>> +                           (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 3)>;
>>  def : Pat<(f32 (vector_extract v4f32:$S, 0)),
>>            (f32 (XSCVSPDPN (XXSLDWI $S, $S, 3)))>;
>>  def : Pat<(f32 (vector_extract v4f32:$S, 1)),
>> @@ -3189,18 +3211,25 @@ def : Pat<(f64 (PPCfcfid (f64 (PPCmtvsra (i32 (extractelt v4i32:$A, 3)))))),
>>  // LIWAX - This instruction is used for sign extending i32 -> i64.
>>  // LIWZX - This instruction will be emitted for i32, f32, and when
>>  //         zero-extending i32 to i64 (zext i32 -> i64).
>> -def : Pat<(v2i64 (scalar_to_vector (i64 (sextloadi32 xoaddr:$src)))),
>> -          (v2i64 (XXPERMDIs
>> -          (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2))>;
>> -def : Pat<(v2i64 (scalar_to_vector (i64 (zextloadi32 xoaddr:$src)))),
>> -          (v2i64 (XXPERMDIs
>> -          (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>;
>> -def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))),
>> -          (v4i32 (XXPERMDIs
>> -          (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>;
>> -def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))),
>> -          (v4f32 (XXPERMDIs
>> -          (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2))>;
>> +defm : ScalToVecWPermute<
>> +  v2i64, (i64 (sextloadi32 xoaddr:$src)),
>> +  (XXPERMDIs (COPY_TO_REGCLASS (LIWAX xoaddr:$src), VSFRC), 2),
>> +  (SUBREG_TO_REG (i64 1), (LIWAX xoaddr:$src), sub_64)>;
>> +
>> +defm : ScalToVecWPermute<
>> +  v2i64, (i64 (zextloadi32 xoaddr:$src)),
>> +  (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2),
>> +  (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>;
>> +
>> +defm : ScalToVecWPermute<
>> +  v4i32, (i32 (load xoaddr:$src)),
>> +  (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2),
>> +  (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>;
>> +
>> +defm : ScalToVecWPermute<
>> +  v4f32, (f32 (load xoaddr:$src)),
>> +  (XXPERMDIs (COPY_TO_REGCLASS (LIWZX xoaddr:$src), VSFRC), 2),
>> +  (SUBREG_TO_REG (i64 1), (LIWZX xoaddr:$src), sub_64)>;
>>
>>  def : Pat<DWToSPExtractConv.BVU,
>>            (v4f32 (VPKUDUM (XXSLDWI (XVCVUXDSP $S2), (XVCVUXDSP $S2), 3),
>> @@ -3336,14 +3365,17 @@ def : Pat<(i64 (vector_extract v2i64:$S, i64:$Idx)),
>>  // Little endian VSX subtarget with direct moves.
>>  let Predicates = [HasVSX, HasDirectMove, IsLittleEndian] in {
>>    // v16i8 scalar <-> vector conversions (LE)
>> -  def : Pat<(v16i8 (scalar_to_vector i32:$A)),
>> -            (v16i8 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>;
>> -  def : Pat<(v8i16 (scalar_to_vector i32:$A)),
>> -            (v8i16 (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC))>;
>> -  def : Pat<(v4i32 (scalar_to_vector i32:$A)),
>> -            (v4i32 MovesToVSR.LE_WORD_0)>;
>> -  def : Pat<(v2i64 (scalar_to_vector i64:$A)),
>> -            (v2i64 MovesToVSR.LE_DWORD_0)>;
>> +  defm : ScalToVecWPermute<v16i8, (i32 i32:$A),
>> +                           (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC),
>> +                           (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1, VSRC)>;
>> +  defm : ScalToVecWPermute<v8i16, (i32 i32:$A),
>> +                           (COPY_TO_REGCLASS MovesToVSR.LE_WORD_0, VSRC),
>> +                           (COPY_TO_REGCLASS MovesToVSR.LE_WORD_1, VSRC)>;
>> +  defm : ScalToVecWPermute<v4i32, (i32 i32:$A), MovesToVSR.LE_WORD_0,
>> +                           (SUBREG_TO_REG (i64 1), (MTVSRWZ $A), sub_64)>;
>> +  defm : ScalToVecWPermute<v2i64, (i64 i64:$A), MovesToVSR.LE_DWORD_0,
>> +                           MovesToVSR.LE_DWORD_1>;
>> +
>>    // v2i64 scalar <-> vector conversions (LE)
>>    def : Pat<(i64 (vector_extract v2i64:$S, 0)),
>>              (i64 VectorExtractions.LE_DWORD_0)>;
>> @@ -3641,30 +3673,41 @@ def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
>>            (STXVX $rS, xoaddr:$dst)>;
>>
>>  // Build vectors from i8 loads
>> -def : Pat<(v16i8 (scalar_to_vector ScalarLoads.Li8)),
>> -          (v16i8 (VSPLTBs 7, (LXSIBZX xoaddr:$src)))>;
>> -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.ZELi8)),
>> -          (v8i16 (VSPLTHs 3, (LXSIBZX xoaddr:$src)))>;
>> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi8)),
>> -         (v4i32 (XXSPLTWs (LXSIBZX xoaddr:$src), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi8i64)),
>> -          (v2i64 (XXPERMDIs (LXSIBZX xoaddr:$src), 0))>;
>> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi8)),
>> -          (v4i32 (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi8i64)),
>> -          (v2i64 (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0))>;
>> +defm : ScalToVecWPermute<v16i8, ScalarLoads.Li8,
>> +                         (VSPLTBs 7, (LXSIBZX xoaddr:$src)),
>> +                         (VSPLTBs 7, (LXSIBZX xoaddr:$src))>;
>> +defm : ScalToVecWPermute<v8i16, ScalarLoads.ZELi8,
>> +                         (VSPLTHs 3, (LXSIBZX xoaddr:$src)),
>> +                         (VSPLTHs 3, (LXSIBZX xoaddr:$src))>;
>> +defm : ScalToVecWPermute<v4i32, ScalarLoads.ZELi8,
>> +                         (XXSPLTWs (LXSIBZX xoaddr:$src), 1),
>> +                         (XXSPLTWs (LXSIBZX xoaddr:$src), 1)>;
>> +defm : ScalToVecWPermute<v2i64, ScalarLoads.ZELi8i64,
>> +                         (XXPERMDIs (LXSIBZX xoaddr:$src), 0),
>> +                         (XXPERMDIs (LXSIBZX xoaddr:$src), 0)>;
>> +defm : ScalToVecWPermute<v4i32, ScalarLoads.SELi8,
>> +                         (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1),
>> +                         (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1)>;
>> +defm : ScalToVecWPermute<v2i64, ScalarLoads.SELi8i64,
>> +                         (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0),
>> +                         (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0)>;
>>
>>  // Build vectors from i16 loads
>> -def : Pat<(v8i16 (scalar_to_vector ScalarLoads.Li16)),
>> -          (v8i16 (VSPLTHs 3, (LXSIHZX xoaddr:$src)))>;
>> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi16)),
>> -          (v4i32 (XXSPLTWs (LXSIHZX xoaddr:$src), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
>> -         (v2i64 (XXPERMDIs (LXSIHZX xoaddr:$src), 0))>;
>> -def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi16)),
>> -          (v4i32 (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi16i64)),
>> -          (v2i64 (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0))>;
>> +defm : ScalToVecWPermute<v8i16, ScalarLoads.Li16,
>> +                         (VSPLTHs 3, (LXSIHZX xoaddr:$src)),
>> +                         (VSPLTHs 3, (LXSIHZX xoaddr:$src))>;
>> +defm : ScalToVecWPermute<v4i32, ScalarLoads.ZELi16,
>> +                         (XXSPLTWs (LXSIHZX xoaddr:$src), 1),
>> +                         (XXSPLTWs (LXSIHZX xoaddr:$src), 1)>;
>> +defm : ScalToVecWPermute<v2i64, ScalarLoads.ZELi16i64,
>> +                         (XXPERMDIs (LXSIHZX xoaddr:$src), 0),
>> +                         (XXPERMDIs (LXSIHZX xoaddr:$src), 0)>;
>> +defm : ScalToVecWPermute<v4i32, ScalarLoads.SELi16,
>> +                         (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1),
>> +                         (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1)>;
>> +defm : ScalToVecWPermute<v2i64, ScalarLoads.SELi16i64,
>> +                         (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0),
>> +                         (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0)>;
>>
>>  // Load/convert and convert/store patterns for f16.
>>  def : Pat<(f64 (extloadf16 xoaddr:$src)),
>> @@ -3806,8 +3849,7 @@ def : Pat<(f32 (PPCxsminc f32:$XA, f32:$XB)),
>>                                   VSSRC))>;
>>
>>  // Endianness-neutral patterns for const splats with ISA 3.0 instructions.
>> -def : Pat<(v4i32 (scalar_to_vector i32:$A)),
>> -          (v4i32 (MTVSRWS $A))>;
>> +defm : ScalToVecWPermute<v4i32, (i32 i32:$A), (MTVSRWS $A), (MTVSRWS $A)>;
>>  def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),
>>            (v4i32 (MTVSRWS $A))>;
>>  def : Pat<(v16i8 (build_vector immNonAllOneAnyExt8:$A, immNonAllOneAnyExt8:$A,
>> @@ -3819,24 +3861,32 @@ def : Pat<(v16i8 (build_vector immNonAllOneAnyExt8:$A, immNonAllOneAnyExt8:$A,
>>                                 immNonAllOneAnyExt8:$A, immNonAllOneAnyExt8:$A,
>>                                 immNonAllOneAnyExt8:$A, immNonAllOneAnyExt8:$A)),
>>            (v16i8 (COPY_TO_REGCLASS (XXSPLTIB imm:$A), VSRC))>;
>> -def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
>> -          (v4i32 (XVCVSPSXWS (LXVWSX xoaddr:$A)))>;
>> -def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
>> -          (v4i32 (XVCVSPUXWS (LXVWSX xoaddr:$A)))>;
>> -def : Pat<(v4i32 (scalar_to_vector DblToIntLoadP9.A)),
>> -          (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> -                            (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC), 1))>;
>> -def : Pat<(v4i32 (scalar_to_vector DblToUIntLoadP9.A)),
>> -          (v4i32 (XXSPLTW (COPY_TO_REGCLASS
>> -                            (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC), 1))>;
>> -def : Pat<(v2i64 (scalar_to_vector FltToLongLoadP9.A)),
>> -          (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
>> -                                          (DFLOADf32 iaddrX4:$A),
>> -                                          VSFRC)), 0))>;
>> -def : Pat<(v2i64 (scalar_to_vector FltToULongLoadP9.A)),
>> -          (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
>> -                                          (DFLOADf32 iaddrX4:$A),
>> -                                          VSFRC)), 0))>;
>> +defm : ScalToVecWPermute<v4i32, FltToIntLoad.A,
>> +                         (XVCVSPSXWS (LXVWSX xoaddr:$A)),
>> +                         (XVCVSPSXWS (LXVWSX xoaddr:$A))>;
>> +defm : ScalToVecWPermute<v4i32, FltToUIntLoad.A,
>> +                         (XVCVSPUXWS (LXVWSX xoaddr:$A)),
>> +                         (XVCVSPUXWS (LXVWSX xoaddr:$A))>;
>> +defm : ScalToVecWPermute<
>> +  v4i32, DblToIntLoadP9.A,
>> +  (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), VSRC), 1),
>> +  (SUBREG_TO_REG (i64 1), (XSCVDPSXWS (DFLOADf64 iaddrX4:$A)), sub_64)>;
>> +defm : ScalToVecWPermute<
>> +  v4i32, DblToUIntLoadP9.A,
>> +  (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), VSRC), 1),
>> +  (SUBREG_TO_REG (i64 1), (XSCVDPUXWS (DFLOADf64 iaddrX4:$A)), sub_64)>;
>> +defm : ScalToVecWPermute<
>> +  v2i64, FltToLongLoadP9.A,
>> +  (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), 0),
>> +  (SUBREG_TO_REG
>> +     (i64 1),
>> +     (XSCVDPSXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), sub_64)>;
>> +defm : ScalToVecWPermute<
>> +  v2i64, FltToULongLoadP9.A,
>> +  (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), 0),
>> +  (SUBREG_TO_REG
>> +     (i64 1),
>> +     (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 iaddrX4:$A), VSFRC)), sub_64)>;
>>  def : Pat<(v4f32 (PPCldsplat xoaddr:$A)),
>>            (v4f32 (LXVWSX xoaddr:$A))>;
>>  def : Pat<(v4i32 (PPCldsplat xoaddr:$A)),
>> @@ -4116,19 +4166,23 @@ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 6)), xoaddr:$dst),
>>  def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 7)), xoaddr:$dst),
>>            (STXSIHXv (COPY_TO_REGCLASS (v16i8 (VSLDOI $S, $S, 10)), VSRC), xoaddr:$dst)>;
>>
>> -def : Pat<(v2i64 (scalar_to_vector (i64 (load iaddrX4:$src)))),
>> -          (v2i64 (XXPERMDIs
>> -          (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>;
>> -def : Pat<(v2i64 (scalar_to_vector (i64 (load xaddrX4:$src)))),
>> -          (v2i64 (XXPERMDIs
>> -          (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>;
>> +defm : ScalToVecWPermute<
>> +  v2i64, (i64 (load iaddrX4:$src)),
>> +  (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2),
>> +  (SUBREG_TO_REG (i64 1), (DFLOADf64 iaddrX4:$src), sub_64)>;
>> +defm : ScalToVecWPermute<
>> +  v2i64, (i64 (load xaddrX4:$src)),
>> +  (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2),
>> +  (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>;
>> +defm : ScalToVecWPermute<
>> +  v2f64, (f64 (load iaddrX4:$src)),
>> +  (XXPERMDIs (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2),
>> +  (SUBREG_TO_REG (i64 1), (DFLOADf64 iaddrX4:$src), sub_64)>;
>> +defm : ScalToVecWPermute<
>> +  v2f64, (f64 (load xaddrX4:$src)),
>> +  (XXPERMDIs (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2),
>> +  (SUBREG_TO_REG (i64 1), (XFLOADf64 xaddrX4:$src), sub_64)>;
>>
>> -def : Pat<(v2f64 (scalar_to_vector (f64 (load iaddrX4:$src)))),
>> -          (v2f64 (XXPERMDIs
>> -          (COPY_TO_REGCLASS (DFLOADf64 iaddrX4:$src), VSFRC), 2))>;
>> -def : Pat<(v2f64 (scalar_to_vector (f64 (load xaddrX4:$src)))),
>> -          (v2f64 (XXPERMDIs
>> -          (COPY_TO_REGCLASS (XFLOADf64 xaddrX4:$src), VSFRC), 2))>;
>>  def : Pat<(store (i64 (extractelt v2i64:$A, 0)), xaddrX4:$src),
>>            (XFSTOREf64 (EXTRACT_SUBREG (XXPERMDI $A, $A, 2),
>>                         sub_64), xaddrX4:$src)>;
>>
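Many of the `ScalToVecWPermute` instances above pass the same dag as both the non-permuted and the permuted output. Each of those dags yields a full splat: `VSPLTBs`, `VSPLTHs`, `XXSPLTWs`, `XXPERMDIs ..., 0`, `MTVSRWS`, or a conversion of an `LXVWSX` load-and-splat. A splat is unchanged by any lane permutation, which is a plausible reason one output can serve both. The Python sketch below checks that property on a toy model of a VSR as four words in ISA (big-endian) order. The helper names are hypothetical, and the model tracks lane placement only.

```python
# Toy model: a 128-bit VSR as four 32-bit words, index 0 = ISA word 0.
def splat_word(v, i):
    """xxspltw-style splat: replicate word i into all four words."""
    return [v[i]] * 4


def swap_doublewords(v):
    """xxswapd-style exchange of the two 64-bit halves."""
    return v[2:] + v[:2]


def rotate_words(v, sh):
    """xxsldwi v, v, sh: rotate the words left by sh positions."""
    return v[sh:] + v[:sh]


src = ["w0", "w1", "w2", "w3"]
splat = splat_word(src, 1)

# A splat looks the same whether or not a lane permutation is applied.
assert swap_doublewords(splat) == splat
assert all(rotate_words(splat, k) == splat for k in range(4))

# A non-splat value, by contrast, is changed by the swap.
assert swap_doublewords(src) != src
```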
>> diff  --git a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
>> index 8c9ffa815467..4d06571d0ec7 100644
>> --- a/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
>> +++ b/llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
>> @@ -13,8 +13,7 @@ define void @testExpandPostRAPseudo(i32* nocapture readonly %ptr) {
>>  ; CHECK-P8:  # %bb.0: # %entry
>>  ; CHECK-P8:    lfiwzx f0, 0, r3
>>  ; CHECK-P8:    ld r4, .LC0@toc@l(r4)
>> -; CHECK-P8:    xxswapd vs0, f0
>> -; CHECK-P8:    xxspltw v2, vs0, 3
>> +; CHECK-P8:    xxspltw v2, vs0, 1
>>  ; CHECK-P8:    stvx v2, 0, r4
>>  ; CHECK-P8:    lis r4, 1024
>>  ; CHECK-P8:    lfiwax f0, 0, r3
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
>> index ee0cc41ea6bd..1cb7d7b62055 100644
>> --- a/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
>> +++ b/llvm/test/CodeGen/PowerPC/build-vector-tests.ll
>> @@ -1282,8 +1282,7 @@ define <4 x i32> @spltMemVali(i32* nocapture readonly %ptr) {
>>  ; P8LE-LABEL: spltMemVali:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfiwzx f0, 0, r3
>> -; P8LE-NEXT:    xxswapd vs0, f0
>> -; P8LE-NEXT:    xxspltw v2, vs0, 3
>> +; P8LE-NEXT:    xxspltw v2, vs0, 1
>>  ; P8LE-NEXT:    blr
>>  entry:
>>    %0 = load i32, i32* %ptr, align 4
>> @@ -2801,8 +2800,7 @@ define <4 x i32> @spltMemValui(i32* nocapture readonly %ptr) {
>>  ; P8LE-LABEL: spltMemValui:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfiwzx f0, 0, r3
>> -; P8LE-NEXT:    xxswapd vs0, f0
>> -; P8LE-NEXT:    xxspltw v2, vs0, 3
>> +; P8LE-NEXT:    xxspltw v2, vs0, 1
>>  ; P8LE-NEXT:    blr
>>  entry:
>>    %0 = load i32, i32* %ptr, align 4
>> @@ -4573,7 +4571,7 @@ define <2 x i64> @spltMemValConvftoll(float* nocapture readonly %ptr) {
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfs f0, 0(r3)
>>  ; P9LE-NEXT:    xscvdpsxds f0, f0
>> -; P9LE-NEXT:    xxspltd v2, f0, 0
>> +; P9LE-NEXT:    xxspltd v2, vs0, 0
>>  ; P9LE-NEXT:    blr
>>  ;
>>  ; P8BE-LABEL: spltMemValConvftoll:
>> @@ -4587,7 +4585,7 @@ define <2 x i64> @spltMemValConvftoll(float* nocapture readonly %ptr) {
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfsx f0, 0, r3
>>  ; P8LE-NEXT:    xscvdpsxds f0, f0
>> -; P8LE-NEXT:    xxspltd v2, f0, 0
>> +; P8LE-NEXT:    xxspltd v2, vs0, 0
>>  ; P8LE-NEXT:    blr
>>  entry:
>>    %0 = load float, float* %ptr, align 4
>> @@ -5761,7 +5759,7 @@ define <2 x i64> @spltMemValConvftoull(float* nocapture readonly %ptr) {
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfs f0, 0(r3)
>>  ; P9LE-NEXT:    xscvdpuxds f0, f0
>> -; P9LE-NEXT:    xxspltd v2, f0, 0
>> +; P9LE-NEXT:    xxspltd v2, vs0, 0
>>  ; P9LE-NEXT:    blr
>>  ;
>>  ; P8BE-LABEL: spltMemValConvftoull:
>> @@ -5775,7 +5773,7 @@ define <2 x i64> @spltMemValConvftoull(float* nocapture readonly %ptr) {
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfsx f0, 0, r3
>>  ; P8LE-NEXT:    xscvdpuxds f0, f0
>> -; P8LE-NEXT:    xxspltd v2, f0, 0
>> +; P8LE-NEXT:    xxspltd v2, vs0, 0
>>  ; P8LE-NEXT:    blr
>>  entry:
>>    %0 = load float, float* %ptr, align 4
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
>> index 2ffe98e1f694..7fac0511e3c5 100644
>> --- a/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
>> +++ b/llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
>> @@ -23,18 +23,12 @@ entry:
>>  define dso_local <16 x i8> @testmrghb2(<16 x i8> %a, <16 x i8> %b) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: testmrghb2:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    addis r3, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT:    addi r3, r3, .LCPI1_0@toc@l
>> -; CHECK-P8-NEXT:    lvx v4, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: testmrghb2:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    addis r3, r2, .LCPI1_0@toc@ha
>> -; CHECK-P9-NEXT:    addi r3, r3, .LCPI1_0@toc@l
>> -; CHECK-P9-NEXT:    lxvx v4, 0, r3
>> -; CHECK-P9-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P9-NEXT:    blr
>>  entry:
>>    %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32 24, i32 8, i32 25, i32 9, i32 26, i32 10, i32 27, i32 11, i32 28, i32 12, i32 29, i32 13, i32 30, i32 14, i32 31, i32 15>
>> @@ -57,18 +51,12 @@ entry:
>>  define dso_local <16 x i8> @testmrghh2(<16 x i8> %a, <16 x i8> %b) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: testmrghh2:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    addis r3, r2, .LCPI3_0@toc@ha
>> -; CHECK-P8-NEXT:    addi r3, r3, .LCPI3_0@toc@l
>> -; CHECK-P8-NEXT:    lvx v4, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT:    vmrghh v2, v2, v3
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: testmrghh2:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    addis r3, r2, .LCPI3_0@toc@ha
>> -; CHECK-P9-NEXT:    addi r3, r3, .LCPI3_0@toc@l
>> -; CHECK-P9-NEXT:    lxvx v4, 0, r3
>> -; CHECK-P9-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT:    vmrghh v2, v2, v3
>>  ; CHECK-P9-NEXT:    blr
>>  entry:
>>    %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32 24, i32 25, i32 8, i32 9, i32 26, i32 27, i32 10, i32 11, i32 28, i32 29, i32 12, i32 13, i32 30, i32 31, i32 14, i32 15>
>> @@ -91,18 +79,12 @@ entry:
>>  define dso_local <16 x i8> @testmrglb2(<16 x i8> %a, <16 x i8> %b) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: testmrglb2:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    addis r3, r2, .LCPI5_0@toc@ha
>> -; CHECK-P8-NEXT:    addi r3, r3, .LCPI5_0@toc@l
>> -; CHECK-P8-NEXT:    lvx v4, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT:    vmrglb v2, v2, v3
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: testmrglb2:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    addis r3, r2, .LCPI5_0@toc@ha
>> -; CHECK-P9-NEXT:    addi r3, r3, .LCPI5_0@toc@l
>> -; CHECK-P9-NEXT:    lxvx v4, 0, r3
>> -; CHECK-P9-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT:    vmrglb v2, v2, v3
>>  ; CHECK-P9-NEXT:    blr
>>  entry:
>>    %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32 16, i32 0, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3, i32 20, i32 4, i32 21, i32 5, i32 22, i32 6, i32 23, i32 7>
>> @@ -125,18 +107,12 @@ entry:
>>  define dso_local <16 x i8> @testmrglh2(<16 x i8> %a, <16 x i8> %b) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: testmrglh2:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    addis r3, r2, .LCPI7_0@toc@ha
>> -; CHECK-P8-NEXT:    addi r3, r3, .LCPI7_0@toc@l
>> -; CHECK-P8-NEXT:    lvx v4, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT:    vmrglh v2, v2, v3
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: testmrglh2:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    addis r3, r2, .LCPI7_0@toc@ha
>> -; CHECK-P9-NEXT:    addi r3, r3, .LCPI7_0@toc@l
>> -; CHECK-P9-NEXT:    lxvx v4, 0, r3
>> -; CHECK-P9-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>>  ; CHECK-P9-NEXT:    blr
>>  entry:
>>    %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32 16, i32 17, i32 0, i32 1, i32 18, i32 19, i32 2, i32 3, i32 20, i32 21, i32 4, i32 5, i32 22, i32 23, i32 6, i32 7>
>> @@ -159,18 +135,12 @@ entry:
>>  define dso_local <16 x i8> @testmrghw2(<16 x i8> %a, <16 x i8> %b) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: testmrghw2:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    addis r3, r2, .LCPI9_0@toc@ha
>> -; CHECK-P8-NEXT:    addi r3, r3, .LCPI9_0@toc@l
>> -; CHECK-P8-NEXT:    lvx v4, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: testmrghw2:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    addis r3, r2, .LCPI9_0@toc@ha
>> -; CHECK-P9-NEXT:    addi r3, r3, .LCPI9_0@toc@l
>> -; CHECK-P9-NEXT:    lxvx v4, 0, r3
>> -; CHECK-P9-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P9-NEXT:    blr
>>  entry:
>>    %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32 24, i32 25, i32 26, i32 27, i32 8, i32 9, i32 10, i32 11, i32 28, i32 29, i32 30, i32 31, i32 12, i32 13, i32 14, i32 15>
>> @@ -193,18 +163,12 @@ entry:
>>  define dso_local <16 x i8> @testmrglw2(<16 x i8> %a, <16 x i8> %b) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: testmrglw2:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    addis r3, r2, .LCPI11_0@toc@ha
>> -; CHECK-P8-NEXT:    addi r3, r3, .LCPI11_0@toc@l
>> -; CHECK-P8-NEXT:    lvx v4, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P8-NEXT:    vmrglw v2, v2, v3
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: testmrglw2:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    addis r3, r2, .LCPI11_0@toc@ha
>> -; CHECK-P9-NEXT:    addi r3, r3, .LCPI11_0@toc@l
>> -; CHECK-P9-NEXT:    lxvx v4, 0, r3
>> -; CHECK-P9-NEXT:    vperm v2, v3, v2, v4
>> +; CHECK-P9-NEXT:    vmrglw v2, v2, v3
>>  ; CHECK-P9-NEXT:    blr
>>  entry:
>>    %shuffle = shufflevector <16 x i8> %a, <16 x i8> %b, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 22, i32 23, i32 4, i32 5, i32 6, i32 7>
>> @@ -215,24 +179,16 @@ define dso_local <8 x i16> @testmrglb3(<8 x i8>* nocapture readonly %a) local_un
>>  ; CHECK-P8-LABEL: testmrglb3:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    ld r3, 0(r3)
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI12_0@toc@ha
>> -; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    addi r3, r4, .LCPI12_0@toc@l
>> -; CHECK-P8-NEXT:    lvx v3, 0, r3
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    vperm v2, v2, v4, v3
>> +; CHECK-P8-NEXT:    xxlxor v2, v2, v2
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>> +; CHECK-P8-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: testmrglb3:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    lfd f0, 0(r3)
>> -; CHECK-P9-NEXT:    addis r3, r2, .LCPI12_0@toc@ha
>> -; CHECK-P9-NEXT:    addi r3, r3, .LCPI12_0@toc@l
>> -; CHECK-P9-NEXT:    lxvx v3, 0, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, f0
>> -; CHECK-P9-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P9-NEXT:    vperm v2, v2, v4, v3
>> +; CHECK-P9-NEXT:    lxsd v2, 0(r3)
>> +; CHECK-P9-NEXT:    xxlxor v3, v3, v3
>> +; CHECK-P9-NEXT:    vmrghb v2, v3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  entry:
>>    %0 = load <8 x i8>, <8 x i8>* %a, align 8
>>
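The commit log says shuffles are canonicalized so that the first element comes from the first vector. The Python sketch below shows the index arithmetic behind swapping the two operands of a shuffle, applied to the `@testmrghb2` mask above. The function names are hypothetical, this is not the LLVM implementation, and choosing the first defined lane when lane 0 is undef is an assumption of the sketch.

```python
N = 16  # elements per input vector for <16 x i8>


def commute_mask(mask, n=N):
    """Mask for shufflevector(b, a) equivalent to the given mask for (a, b).

    Indices below n select from the first operand and indices n..2n-1 from
    the second, so swapping the operands maps i -> i + n or i - n.
    Undef lanes (-1) are left alone.
    """
    return [m if m < 0 else (m + n) % (2 * n) for m in mask]


def canonicalize(mask, n=N):
    """Return (mask, swapped) so the first defined lane reads operand 0."""
    first = next((m for m in mask if m >= 0), 0)
    if first >= n:
        return commute_mask(mask, n), True
    return mask, False


# Shuffle mask of @testmrghb2 from canonical-merge-shuffles.ll.
testmrghb2 = [24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15]
mask, swapped = canonicalize(testmrghb2)
print(swapped, mask)
# True [8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31]
```

The same arithmetic turns the `@testmrglb2` mask `<16, 0, 17, 1, ...>` into `<0, 16, 1, 17, ...>`.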
>> diff  --git a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
>> index a23db59635a4..3a43b3584caf 100644
>> --- a/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
>> +++ b/llvm/test/CodeGen/PowerPC/fp-strict-round.ll
>> @@ -331,12 +331,12 @@ define <2 x float> @fptrunc_v2f32_v2f64(<2 x double> %vf1) {
>>  ; P9:       # %bb.0:
>>  ; P9-NEXT:    xsrsp f0, v2
>>  ; P9-NEXT:    xscvdpspn vs0, f0
>> -; P9-NEXT:    xxsldwi v3, vs0, vs0, 1
>> +; P9-NEXT:    xxsldwi v3, vs0, vs0, 3
>>  ; P9-NEXT:    xxswapd vs0, v2
>>  ; P9-NEXT:    xsrsp f0, f0
>>  ; P9-NEXT:    xscvdpspn vs0, f0
>> -; P9-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; P9-NEXT:    vmrglw v2, v3, v2
>> +; P9-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; P9-NEXT:    vmrghw v2, v3, v2
>>  ; P9-NEXT:    blr
>>    %res = call <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(
>>                          <2 x double> %vf1,
>>
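In `fp-strict-round.ll` the `xxsldwi` rotate amount changes from 1 to 3, and the final `vmrglw` becomes `vmrghw`. The two sequences place the same words in the result. The Python sketch below checks this on a word-level model of the ISA semantics of `xxsldwi`, `vmrghw` and `vmrglw`, using big-endian word numbering as the ISA does. It tracks lane placement only, and the helper names are hypothetical.

```python
def xxsldwi(a, b, sh):
    """xxsldwi XT, XA, XB, SH: words SH..SH+3 of the concatenation XA||XB."""
    return (a + b)[sh:sh + 4]


def vmrghw(a, b):
    """Merge high words: interleave words 0 and 1 of each operand."""
    return [a[0], b[0], a[1], b[1]]


def vmrglw(a, b):
    """Merge low words: interleave words 2 and 3 of each operand."""
    return [a[2], b[2], a[3], b[3]]


s = ["s0", "s1", "s2", "s3"]  # vs0 after the first xscvdpspn
t = ["t0", "t1", "t2", "t3"]  # vs0 after the second xscvdpspn

old = vmrglw(xxsldwi(s, s, 1), xxsldwi(t, t, 1))  # removed lines
new = vmrghw(xxsldwi(s, s, 3), xxsldwi(t, t, 3))  # added lines
assert old == new == ["s3", "t3", "s0", "t0"]
```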
>> diff  --git a/llvm/test/CodeGen/PowerPC/load-and-splat.ll b/llvm/test/CodeGen/PowerPC/load-and-splat.ll
>> index f411712ba3fa..26da1fdaefef 100644
>> --- a/llvm/test/CodeGen/PowerPC/load-and-splat.ll
>> +++ b/llvm/test/CodeGen/PowerPC/load-and-splat.ll
>> @@ -40,8 +40,7 @@ define dso_local void @test2(<4 x float>* nocapture %c, float* nocapture readonl
>>  ; P8:       # %bb.0: # %entry
>>  ; P8-NEXT:    addi r4, r4, 12
>>  ; P8-NEXT:    lfiwzx f0, 0, r4
>> -; P8-NEXT:    xxswapd vs0, f0
>> -; P8-NEXT:    xxspltw v2, vs0, 3
>> +; P8-NEXT:    xxspltw v2, vs0, 1
>>  ; P8-NEXT:    stvx v2, 0, r3
>>  ; P8-NEXT:    blr
>>  entry:
>> @@ -65,8 +64,7 @@ define dso_local void @test3(<4 x i32>* nocapture %c, i32* nocapture readonly %a
>>  ; P8:       # %bb.0: # %entry
>>  ; P8-NEXT:    addi r4, r4, 12
>>  ; P8-NEXT:    lfiwzx f0, 0, r4
>> -; P8-NEXT:    xxswapd vs0, f0
>> -; P8-NEXT:    xxspltw v2, vs0, 3
>> +; P8-NEXT:    xxspltw v2, vs0, 1
>>  ; P8-NEXT:    stvx v2, 0, r3
>>  ; P8-NEXT:    blr
>>  entry:
>> @@ -110,8 +108,7 @@ define <16 x i8> @unadjusted_lxvwsx(i32* %s, i32* %t) {
>>  ; P8-LABEL: unadjusted_lxvwsx:
>>  ; P8:       # %bb.0: # %entry
>>  ; P8-NEXT:    lfiwzx f0, 0, r3
>> -; P8-NEXT:    xxswapd vs0, f0
>> -; P8-NEXT:    xxspltw v2, vs0, 3
>> +; P8-NEXT:    xxspltw v2, vs0, 1
>>  ; P8-NEXT:    blr
>>    entry:
>>      %0 = bitcast i32* %s to <4 x i8>*
>> @@ -131,8 +128,7 @@ define <16 x i8> @adjusted_lxvwsx(i64* %s, i64* %t) {
>>  ; P8:       # %bb.0: # %entry
>>  ; P8-NEXT:    ld r3, 0(r3)
>>  ; P8-NEXT:    mtfprd f0, r3
>> -; P8-NEXT:    xxswapd v2, vs0
>> -; P8-NEXT:    xxspltw v2, v2, 2
>> +; P8-NEXT:    xxspltw v2, vs0, 0
>>  ; P8-NEXT:    blr
>>    entry:
>>      %0 = bitcast i64* %s to <8 x i8>*
>>
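Across these tests, an `xxswapd` followed by `xxspltw ..., 3` (or `..., 2`) collapses into a single `xxspltw ..., 1` (or `..., 0`). `xxswapd` only exchanges the two doublewords, so word k of the swapped register is word (k + 2) mod 4 of the original, and the swap can be folded into the splat index. The Python check below tests that identity on a word-level model with hypothetical helper names, using ISA big-endian word numbering.

```python
def xxswapd(v):
    """Swap the two doublewords: words [0, 1, 2, 3] -> [2, 3, 0, 1]."""
    return v[2:] + v[:2]


def xxspltw(v, uim):
    """Replicate word UIM (ISA big-endian numbering) into all four words."""
    return [v[uim]] * 4


x = ["w0", "w1", "w2", "w3"]
# Removed pair in test2, test3 and unadjusted_lxvwsx: xxswapd, xxspltw ..., 3
assert xxspltw(xxswapd(x), 3) == xxspltw(x, 1)
# Removed pair in adjusted_lxvwsx: xxswapd, xxspltw ..., 2
assert xxspltw(xxswapd(x), 2) == xxspltw(x, 0)
```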
>> diff  --git a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
>> index 409978549c36..a03ab5f9519e 100644
>> --- a/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
>> +++ b/llvm/test/CodeGen/PowerPC/load-v4i8-improved.ll
>> @@ -9,8 +9,7 @@ define <16 x i8> @test(i32* %s, i32* %t) {
>>  ; CHECK-LE-LABEL: test:
>>  ; CHECK-LE:       # %bb.0: # %entry
>>  ; CHECK-LE-NEXT:    lfiwzx f0, 0, r3
>> -; CHECK-LE-NEXT:    xxswapd vs0, f0
>> -; CHECK-LE-NEXT:    xxspltw v2, vs0, 3
>> +; CHECK-LE-NEXT:    xxspltw v2, vs0, 1
>>  ; CHECK-LE-NEXT:    blr
>>
>>  ; CHECK-LABEL: test:
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
>> index e1f0e827b9f6..dffa0fb98fc0 100644
>> --- a/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
>> +++ b/llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
>> @@ -21,8 +21,8 @@ entry:
>>  ; CHECK: sldi r3, r3, 56
>>  ; CHECK: mtvsrd v2, r3
>>  ; CHECK-LE-LABEL: buildc
>> -; CHECK-LE: mtfprd f0, r3
>> -; CHECK-LE: xxswapd v2, vs0
>> +; CHECK-LE: mtvsrd v2, r3
>> +; CHECK-LE: vspltb v2, v2, 7
>>  }
>>
>>  ; Function Attrs: norecurse nounwind readnone
>> @@ -35,8 +35,8 @@ entry:
>>  ; CHECK: sldi r3, r3, 48
>>  ; CHECK: mtvsrd v2, r3
>>  ; CHECK-LE-LABEL: builds
>> -; CHECK-LE: mtfprd f0, r3
>> -; CHECK-LE: xxswapd v2, vs0
>> +; CHECK-LE: mtvsrd v2, r3
>> +; CHECK-LE: vsplth v2, v2, 3
>>  }
>>
>>  ; Function Attrs: norecurse nounwind readnone
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/pr25080.ll b/llvm/test/CodeGen/PowerPC/pr25080.ll
>> index 7a2fb76fd453..f87cb5b940ca 100644
>> --- a/llvm/test/CodeGen/PowerPC/pr25080.ll
>> +++ b/llvm/test/CodeGen/PowerPC/pr25080.ll
>> @@ -17,41 +17,33 @@ define <8 x i16> @pr25080(<8 x i32> %a) {
>>  ; LE-NEXT:    mfvsrwz 3, 34
>>  ; LE-NEXT:    xxsldwi 1, 34, 34, 1
>>  ; LE-NEXT:    mfvsrwz 4, 35
>> -; LE-NEXT:    xxsldwi 4, 34, 34, 3
>> -; LE-NEXT:    mtfprd 2, 3
>> +; LE-NEXT:    xxsldwi 2, 34, 34, 3
>> +; LE-NEXT:    mtvsrd 36, 3
>>  ; LE-NEXT:    mffprwz 3, 0
>>  ; LE-NEXT:    xxswapd 0, 35
>> -; LE-NEXT:    mtfprd 3, 4
>> -; LE-NEXT:    xxsldwi 5, 35, 35, 1
>> +; LE-NEXT:    mtvsrd 37, 4
>>  ; LE-NEXT:    mffprwz 4, 1
>> -; LE-NEXT:    xxsldwi 7, 35, 35, 3
>> -; LE-NEXT:    mtfprd 1, 3
>> -; LE-NEXT:    xxswapd 33, 3
>> -; LE-NEXT:    mffprwz 3, 4
>> -; LE-NEXT:    mtfprd 4, 4
>> -; LE-NEXT:    xxswapd 34, 1
>> +; LE-NEXT:    xxsldwi 1, 35, 35, 1
>> +; LE-NEXT:    mtvsrd 34, 3
>> +; LE-NEXT:    mffprwz 3, 2
>> +; LE-NEXT:    mtvsrd 32, 4
>>  ; LE-NEXT:    mffprwz 4, 0
>> -; LE-NEXT:    mtfprd 0, 3
>> -; LE-NEXT:    xxswapd 35, 4
>> -; LE-NEXT:    mffprwz 3, 5
>> -; LE-NEXT:    mtfprd 6, 4
>> -; LE-NEXT:    xxswapd 36, 0
>> -; LE-NEXT:    mtfprd 1, 3
>> -; LE-NEXT:    mffprwz 3, 7
>> -; LE-NEXT:    xxswapd 37, 6
>> -; LE-NEXT:    vmrglh 2, 3, 2
>> -; LE-NEXT:    xxswapd 35, 2
>> -; LE-NEXT:    mtfprd 2, 3
>> -; LE-NEXT:    xxswapd 32, 1
>> +; LE-NEXT:    xxsldwi 0, 35, 35, 3
>> +; LE-NEXT:    mtvsrd 33, 3
>> +; LE-NEXT:    mffprwz 3, 1
>> +; LE-NEXT:    mtvsrd 38, 4
>> +; LE-NEXT:    mtvsrd 35, 3
>> +; LE-NEXT:    mffprwz 3, 0
>> +; LE-NEXT:    vmrghh 2, 0, 2
>> +; LE-NEXT:    mtvsrd 32, 3
>>  ; LE-NEXT:    addis 3, 2, .LCPI0_1@toc@ha
>> +; LE-NEXT:    vmrghh 4, 1, 4
>>  ; LE-NEXT:    addi 3, 3, .LCPI0_1@toc@l
>> -; LE-NEXT:    xxswapd 38, 2
>> -; LE-NEXT:    vmrglh 3, 4, 3
>> -; LE-NEXT:    vmrglh 4, 0, 5
>> -; LE-NEXT:    vmrglh 5, 6, 1
>> -; LE-NEXT:    vmrglw 2, 3, 2
>> -; LE-NEXT:    vmrglw 3, 5, 4
>> +; LE-NEXT:    vmrghh 3, 3, 6
>> +; LE-NEXT:    vmrghh 5, 0, 5
>> +; LE-NEXT:    vmrglw 2, 4, 2
>>  ; LE-NEXT:    vspltish 4, 15
>> +; LE-NEXT:    vmrglw 3, 5, 3
>>  ; LE-NEXT:    xxmrgld 34, 35, 34
>>  ; LE-NEXT:    lvx 3, 0, 3
>>  ; LE-NEXT:    xxlor 34, 34, 35
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
>> index 4c10c3813fb5..d3bfb910fc9f 100644
>> --- a/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
>> +++ b/llvm/test/CodeGen/PowerPC/pr25157-peephole.ll
>> @@ -58,12 +58,11 @@ L.LB38_2452:
>>
>>  ; CHECK-LABEL: @aercalc_
>>  ; CHECK: lfs
>> -; CHECK: xxspltd
>> +; CHECK: xxswapd
>>  ; CHECK: stxvd2x
>>  ; CHECK-NOT: xxswapd
>>
>>  ; CHECK-P9-LABEL: @aercalc_
>>  ; CHECK-P9: lfs
>> -; CHECK-P9: xxspltd
>>  ; CHECK-P9: stxv
>>  ; CHECK-P9-NOT: xxswapd
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/pr38087.ll b/llvm/test/CodeGen/PowerPC/pr38087.ll
>> index e05a3d2b97aa..49b3d39bc18c 100644
>> --- a/llvm/test/CodeGen/PowerPC/pr38087.ll
>> +++ b/llvm/test/CodeGen/PowerPC/pr38087.ll
>> @@ -11,9 +11,8 @@ declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, i32) #0
>>  define void @draw_llvm_vs_variant0(<4 x float> %x) {
>>  ; CHECK-LABEL: draw_llvm_vs_variant0:
>>  ; CHECK:       # %bb.0: # %entry
>> -; CHECK-NEXT:    lfd f0, 0(r3)
>> -; CHECK-NEXT:    xxswapd v3, f0
>> -; CHECK-NEXT:    vmrglh v3, v3, v3
>> +; CHECK-NEXT:    lxsd v3, 0(r3)
>> +; CHECK-NEXT:    vmrghh v3, v3, v3
>>  ; CHECK-NEXT:    vextsh2w v3, v3
>>  ; CHECK-NEXT:    xvcvsxwsp vs0, v3
>>  ; CHECK-NEXT:    xxspltw vs0, vs0, 2
>>
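The change in `pr38087.ll`, and several lines in `pr25080.ll`, follow one pattern. An `xxswapd` of each input followed by `vmrgl[bhw]` becomes a `vmrgh[bhw]` on the unswapped inputs. The swap exchanges the doubleword halves, so merging the low halves of the swapped registers reads exactly the lanes that merging the high halves of the originals reads. The Python sketch below checks this for byte, halfword and word lanes. The lane model and helper names are hypothetical, and the sketch ignores register allocation and scheduling differences.

```python
def xxswapd(v):
    """Exchange the two 64-bit halves of a vector given as a list of lanes."""
    half = len(v) // 2
    return v[half:] + v[:half]


def merge_high(a, b):
    """vmrgh[bhw] model: interleave the lanes of the first halves of a and b."""
    half = len(a) // 2
    return [lane for pair in zip(a[:half], b[:half]) for lane in pair]


def merge_low(a, b):
    """vmrgl[bhw] model: interleave the lanes of the second halves of a and b."""
    half = len(a) // 2
    return [lane for pair in zip(a[half:], b[half:]) for lane in pair]


# 16 byte, 8 halfword and 4 word lanes, in ISA big-endian lane order.
for lanes in (16, 8, 4):
    a = [f"a{i}" for i in range(lanes)]
    b = [f"b{i}" for i in range(lanes)]
    # Old sequence: swap both inputs, then merge low.
    # New sequence: merge high on the unswapped inputs.
    assert merge_low(xxswapd(a), xxswapd(b)) == merge_high(a, b)
```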
>> diff  --git a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
>> index 4c9137d86124..6584cb74bdb5 100644
>> --- a/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
>> +++ b/llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
>> @@ -11,34 +11,31 @@
>>  define signext i32 @test_pre_inc_disable_1(i8* nocapture readonly %pix1, i32 signext %i_stride_pix1, i8* nocapture readonly %pix2) {
>>  ; CHECK-LABEL: test_pre_inc_disable_1:
>>  ; CHECK:       # %bb.0: # %entry
>> -; CHECK-NEXT:    lfd f0, 0(r5)
>> +; CHECK-NEXT:    lxsd v5, 0(r5)
>>  ; CHECK-NEXT:    addis r5, r2, .LCPI0_0@toc@ha
>>  ; CHECK-NEXT:    addi r5, r5, .LCPI0_0@toc@l
>>  ; CHECK-NEXT:    lxvx v2, 0, r5
>>  ; CHECK-NEXT:    addis r5, r2, .LCPI0_1@toc@ha
>>  ; CHECK-NEXT:    addi r5, r5, .LCPI0_1@toc@l
>>  ; CHECK-NEXT:    lxvx v4, 0, r5
>> -; CHECK-NEXT:    xxswapd v5, f0
>> -; CHECK-NEXT:    xxlxor v3, v3, v3
>>  ; CHECK-NEXT:    li r5, 4
>> +; CHECK-NEXT:    xxlxor v3, v3, v3
>>  ; CHECK-NEXT:    vperm v0, v3, v5, v2
>>  ; CHECK-NEXT:    mtctr r5
>>  ; CHECK-NEXT:    li r5, 0
>> -; CHECK-NEXT:    vperm v1, v5, v3, v4
>> +; CHECK-NEXT:    vperm v1, v3, v5, v4
>>  ; CHECK-NEXT:    li r6, 0
>>  ; CHECK-NEXT:    xvnegsp v5, v0
>>  ; CHECK-NEXT:    xvnegsp v0, v1
>>  ; CHECK-NEXT:    .p2align 4
>>  ; CHECK-NEXT:  .LBB0_1: # %for.cond1.preheader
>>  ; CHECK-NEXT:    #
>> -; CHECK-NEXT:    lfd f0, 0(r3)
>> -; CHECK-NEXT:    xxswapd v1, f0
>> -; CHECK-NEXT:    lfdx f0, r3, r4
>> -; CHECK-NEXT:    vperm v6, v1, v3, v4
>> +; CHECK-NEXT:    lxsd v1, 0(r3)
>> +; CHECK-NEXT:    vperm v6, v3, v1, v4
>>  ; CHECK-NEXT:    vperm v1, v3, v1, v2
>>  ; CHECK-NEXT:    xvnegsp v1, v1
>> -; CHECK-NEXT:    add r7, r3, r4
>>  ; CHECK-NEXT:    xvnegsp v6, v6
>> +; CHECK-NEXT:    add r7, r3, r4
>>  ; CHECK-NEXT:    vabsduw v1, v1, v5
>>  ; CHECK-NEXT:    vabsduw v6, v6, v0
>>  ; CHECK-NEXT:    vadduwm v1, v6, v1
>> @@ -46,15 +43,14 @@ define signext i32 @test_pre_inc_disable_1(i8* nocapture readonly %pix1, i32 sig
>>  ; CHECK-NEXT:    vadduwm v1, v1, v6
>>  ; CHECK-NEXT:    xxspltw v6, v1, 2
>>  ; CHECK-NEXT:    vadduwm v1, v1, v6
>> -; CHECK-NEXT:    xxswapd v6, f0
>> +; CHECK-NEXT:    lxsdx v6, r3, r4
>>  ; CHECK-NEXT:    vextuwrx r3, r5, v1
>> -; CHECK-NEXT:    vperm v7, v6, v3, v4
>> +; CHECK-NEXT:    vperm v7, v3, v6, v4
>>  ; CHECK-NEXT:    vperm v6, v3, v6, v2
>> -; CHECK-NEXT:    add r6, r3, r6
>> -; CHECK-NEXT:    add r3, r7, r4
>>  ; CHECK-NEXT:    xvnegsp v6, v6
>>  ; CHECK-NEXT:    xvnegsp v1, v7
>>  ; CHECK-NEXT:    vabsduw v6, v6, v5
>> +; CHECK-NEXT:    add r6, r3, r6
>>  ; CHECK-NEXT:    vabsduw v1, v1, v0
>>  ; CHECK-NEXT:    vadduwm v1, v1, v6
>>  ; CHECK-NEXT:    xxswapd v6, v1
>> @@ -62,6 +58,7 @@ define signext i32 @test_pre_inc_disable_1(i8* nocapture readonly %pix1, i32 sig
>>  ; CHECK-NEXT:    xxspltw v6, v1, 2
>>  ; CHECK-NEXT:    vadduwm v1, v1, v6
>>  ; CHECK-NEXT:    vextuwrx r8, r5, v1
>> +; CHECK-NEXT:    add r3, r7, r4
>>  ; CHECK-NEXT:    add r6, r8, r6
>>  ; CHECK-NEXT:    bdnz .LBB0_1
>>  ; CHECK-NEXT:  # %bb.2: # %for.cond.cleanup
>> @@ -181,29 +178,27 @@ for.cond.cleanup:                                 ; preds = %for.cond1.preheader
>>  define signext i32 @test_pre_inc_disable_2(i8* nocapture readonly %pix1, i8* nocapture readonly %pix2) {
>>  ; CHECK-LABEL: test_pre_inc_disable_2:
>>  ; CHECK:       # %bb.0: # %entry
>> -; CHECK-NEXT:    lfd f0, 0(r3)
>> +; CHECK-NEXT:    lxsd v2, 0(r3)
>>  ; CHECK-NEXT:    addis r3, r2, .LCPI1_0@toc@ha
>>  ; CHECK-NEXT:    addi r3, r3, .LCPI1_0@toc@l
>>  ; CHECK-NEXT:    lxvx v4, 0, r3
>>  ; CHECK-NEXT:    addis r3, r2, .LCPI1_1@toc@ha
>> -; CHECK-NEXT:    xxswapd v2, f0
>> -; CHECK-NEXT:    lfd f0, 0(r4)
>>  ; CHECK-NEXT:    addi r3, r3, .LCPI1_1@toc@l
>> -; CHECK-NEXT:    xxlxor v3, v3, v3
>>  ; CHECK-NEXT:    lxvx v0, 0, r3
>> -; CHECK-NEXT:    xxswapd v1, f0
>> -; CHECK-NEXT:    vperm v5, v2, v3, v4
>> +; CHECK-NEXT:    lxsd v1, 0(r4)
>> +; CHECK-NEXT:    xxlxor v3, v3, v3
>> +; CHECK-NEXT:    vperm v5, v3, v2, v4
>>  ; CHECK-NEXT:    vperm v2, v3, v2, v0
>>  ; CHECK-NEXT:    vperm v0, v3, v1, v0
>> -; CHECK-NEXT:    vperm v3, v1, v3, v4
>> +; CHECK-NEXT:    vperm v3, v3, v1, v4
>>  ; CHECK-NEXT:    vabsduw v2, v2, v0
>>  ; CHECK-NEXT:    vabsduw v3, v5, v3
>>  ; CHECK-NEXT:    vadduwm v2, v3, v2
>>  ; CHECK-NEXT:    xxswapd v3, v2
>> -; CHECK-NEXT:    li r3, 0
>>  ; CHECK-NEXT:    vadduwm v2, v2, v3
>>  ; CHECK-NEXT:    xxspltw v3, v2, 2
>>  ; CHECK-NEXT:    vadduwm v2, v2, v3
>> +; CHECK-NEXT:    li r3, 0
>>  ; CHECK-NEXT:    vextuwrx r3, r3, v2
>>  ; CHECK-NEXT:    extsw r3, r3
>>  ; CHECK-NEXT:    blr
>> @@ -286,16 +281,14 @@ define void @test32(i8* nocapture readonly %pix2, i32 signext %i_pix2) {
>>  ; CHECK-LABEL: test32:
>>  ; CHECK:       # %bb.0: # %entry
>>  ; CHECK-NEXT:    add r5, r3, r4
>> -; CHECK-NEXT:    lfiwzx f0, r3, r4
>> +; CHECK-NEXT:    lxsiwzx v2, r3, r4
>>  ; CHECK-NEXT:    addis r3, r2, .LCPI2_0@toc@ha
>>  ; CHECK-NEXT:    addi r3, r3, .LCPI2_0@toc@l
>>  ; CHECK-NEXT:    lxvx v4, 0, r3
>>  ; CHECK-NEXT:    li r3, 4
>> -; CHECK-NEXT:    xxswapd v2, f0
>> -; CHECK-NEXT:    lfiwzx f0, r5, r3
>> +; CHECK-NEXT:    lxsiwzx v5, r5, r3
>>  ; CHECK-NEXT:    xxlxor v3, v3, v3
>>  ; CHECK-NEXT:    vperm v2, v2, v3, v4
>> -; CHECK-NEXT:    xxswapd v5, f0
>>  ; CHECK-NEXT:    vperm v3, v5, v3, v4
>>  ; CHECK-NEXT:    vspltisw v4, 8
>>  ; CHECK-NEXT:    vnegw v3, v3
>> @@ -361,16 +354,15 @@ define void @test16(i16* nocapture readonly %sums, i32 signext %delta, i32 signe
>>  ; CHECK-NEXT:    lxsihzx v2, r6, r7
>>  ; CHECK-NEXT:    lxsihzx v4, r3, r4
>>  ; CHECK-NEXT:    li r6, 0
>> -; CHECK-NEXT:    mtfprd f0, r6
>> +; CHECK-NEXT:    mtvsrd v3, r6
>>  ; CHECK-NEXT:    vsplth v4, v4, 3
>> -; CHECK-NEXT:    xxswapd v3, vs0
>>  ; CHECK-NEXT:    vsplth v2, v2, 3
>>  ; CHECK-NEXT:    addis r3, r2, .LCPI3_0@toc@ha
>>  ; CHECK-NEXT:    addi r3, r3, .LCPI3_0@toc@l
>> -; CHECK-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-NEXT:    vmrglw v3, v3, v4
>> +; CHECK-NEXT:    vmrghh v4, v3, v4
>> +; CHECK-NEXT:    vmrghh v2, v3, v2
>> +; CHECK-NEXT:    vsplth v3, v3, 3
>> +; CHECK-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-NEXT:    lxvx v4, 0, r3
>>  ; CHECK-NEXT:    li r3, 0
>>  ; CHECK-NEXT:    vperm v2, v2, v3, v4
>> @@ -446,18 +438,17 @@ define void @test8(i8* nocapture readonly %sums, i32 signext %delta, i32 signext
>>  ; CHECK-NEXT:    add r6, r3, r4
>>  ; CHECK-NEXT:    lxsibzx v2, r3, r4
>>  ; CHECK-NEXT:    li r3, 0
>> -; CHECK-NEXT:    mtfprd f0, r3
>> +; CHECK-NEXT:    mtvsrd v3, r3
>>  ; CHECK-NEXT:    li r3, 8
>>  ; CHECK-NEXT:    lxsibzx v5, r6, r3
>> -; CHECK-NEXT:    xxswapd v3, vs0
>> -; CHECK-NEXT:    vspltb v4, v3, 15
>> -; CHECK-NEXT:    vspltb v2, v2, 7
>> -; CHECK-NEXT:    vmrglb v2, v3, v2
>>  ; CHECK-NEXT:    addis r3, r2, .LCPI4_0@toc@ha
>>  ; CHECK-NEXT:    addi r3, r3, .LCPI4_0@toc@l
>> +; CHECK-NEXT:    vspltb v2, v2, 7
>> +; CHECK-NEXT:    vmrghb v2, v3, v2
>> +; CHECK-NEXT:    vspltb v4, v3, 7
>>  ; CHECK-NEXT:    vspltb v5, v5, 7
>>  ; CHECK-NEXT:    vmrglh v2, v2, v4
>> -; CHECK-NEXT:    vmrglb v3, v3, v5
>> +; CHECK-NEXT:    vmrghb v3, v3, v5
>>  ; CHECK-NEXT:    vmrglw v2, v2, v4
>>  ; CHECK-NEXT:    vmrglh v3, v3, v4
>>  ; CHECK-NEXT:    vmrglw v3, v4, v3
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
>> index 099611a7b5e3..50b864980d98 100644
>> --- a/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
>> +++ b/llvm/test/CodeGen/PowerPC/qpx-load-splat.ll
>> @@ -53,8 +53,7 @@ define <4 x float> @foof(float* nocapture readonly %a) #0 {
>>  ; CHECK-LABEL: foof:
>>  ; CHECK:       # %bb.0: # %entry
>>  ; CHECK-NEXT:    lfiwzx f0, 0, r3
>> -; CHECK-NEXT:    xxswapd vs0, f0
>> -; CHECK-NEXT:    xxspltw v2, vs0, 3
>> +; CHECK-NEXT:    xxspltw v2, vs0, 1
>>  ; CHECK-NEXT:    blr
>>  entry:
>>    %0 = load float, float* %a, align 4
>> @@ -68,8 +67,7 @@ define <4 x float> @foofx(float* nocapture readonly %a, i64 %idx) #0 {
>>  ; CHECK:       # %bb.0: # %entry
>>  ; CHECK-NEXT:    sldi r4, r4, 2
>>  ; CHECK-NEXT:    lfiwzx f0, r3, r4
>> -; CHECK-NEXT:    xxswapd vs0, f0
>> -; CHECK-NEXT:    xxspltw v2, vs0, 3
>> +; CHECK-NEXT:    xxspltw v2, vs0, 1
>>  ; CHECK-NEXT:    blr
>>  entry:
>>    %p = getelementptr float, float* %a, i64 %idx
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
>> index b43e2c8b97af..c12f7f9a9f05 100644
>> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
>> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_1.ll
>> @@ -13,8 +13,7 @@ define <2 x i64> @s2v_test1(i64* nocapture readonly %int64, <2 x i64> %vec) {
>>  ; P9LE-LABEL: s2v_test1:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfd f0, 0(r3)
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test1:
>> @@ -33,8 +32,7 @@ define <2 x i64> @s2v_test2(i64* nocapture readonly %int64, <2 x i64> %vec)  {
>>  ; P9LE-LABEL: s2v_test2:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfd f0, 8(r3)
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test2:
>> @@ -55,8 +53,7 @@ define <2 x i64> @s2v_test3(i64* nocapture readonly %int64, <2 x i64> %vec, i32
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    sldi r4, r7, 3
>>  ; P9LE-NEXT:    lfdx f0, r3, r4
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test3
>> @@ -78,8 +75,7 @@ define <2 x i64> @s2v_test4(i64* nocapture readonly %int64, <2 x i64> %vec)  {
>>  ; P9LE-LABEL: s2v_test4:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfd f0, 8(r3)
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test4:
>> @@ -99,8 +95,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i64* nocapture readonly %ptr1)  {
>>  ; P9LE-LABEL: s2v_test5:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfd f0, 0(r5)
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test5:
>> @@ -119,8 +114,7 @@ define <2 x double> @s2v_test_f1(double* nocapture readonly %f64, <2 x double> %
>>  ; P9LE-LABEL: s2v_test_f1:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfd f0, 0(r3)
>> -; P9LE-NEXT:    xxswapd vs0, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test_f1:
>> @@ -132,8 +126,7 @@ define <2 x double> @s2v_test_f1(double* nocapture readonly %f64, <2 x double> %
>>  ; P8LE-LABEL: s2v_test_f1:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfdx f0, 0, r3
>> -; P8LE-NEXT:    xxspltd vs0, vs0, 0
>> -; P8LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f1:
>> @@ -152,8 +145,7 @@ define <2 x double> @s2v_test_f2(double* nocapture readonly %f64, <2 x double> %
>>  ; P9LE-LABEL: s2v_test_f2:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfd f0, 8(r3)
>> -; P9LE-NEXT:    xxswapd vs0, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test_f2:
>> @@ -165,8 +157,7 @@ define <2 x double> @s2v_test_f2(double* nocapture readonly %f64, <2 x double> %
>>  ; P8LE-LABEL: s2v_test_f2:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfd f0, 8(r3)
>> -; P8LE-NEXT:    xxspltd vs0, vs0, 0
>> -; P8LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f2:
>> @@ -187,8 +178,7 @@ define <2 x double> @s2v_test_f3(double* nocapture readonly %f64, <2 x double> %
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    sldi r4, r7, 3
>>  ; P9LE-NEXT:    lfdx f0, r3, r4
>> -; P9LE-NEXT:    xxswapd vs0, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test_f3:
>> @@ -202,8 +192,7 @@ define <2 x double> @s2v_test_f3(double* nocapture readonly %f64, <2 x double> %
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    sldi r4, r7, 3
>>  ; P8LE-NEXT:    lfdx f0, r3, r4
>> -; P8LE-NEXT:    xxspltd vs0, vs0, 0
>> -; P8LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f3:
>> @@ -225,8 +214,7 @@ define <2 x double> @s2v_test_f4(double* nocapture readonly %f64, <2 x double> %
>>  ; P9LE-LABEL: s2v_test_f4:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfd f0, 8(r3)
>> -; P9LE-NEXT:    xxswapd vs0, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test_f4:
>> @@ -238,8 +226,7 @@ define <2 x double> @s2v_test_f4(double* nocapture readonly %f64, <2 x double> %
>>  ; P8LE-LABEL: s2v_test_f4:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfd f0, 8(r3)
>> -; P8LE-NEXT:    xxspltd vs0, vs0, 0
>> -; P8LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f4:
>> @@ -259,8 +246,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec, double* nocapture readonly %
>>  ; P9LE-LABEL: s2v_test_f5:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfd f0, 0(r5)
>> -; P9LE-NEXT:    xxswapd vs0, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test_f5:
>> @@ -272,8 +258,7 @@ define <2 x double> @s2v_test_f5(<2 x double> %vec, double* nocapture readonly %
>>  ; P8LE-LABEL: s2v_test_f5:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfdx f0, 0, r5
>> -; P8LE-NEXT:    xxspltd vs0, vs0, 0
>> -; P8LE-NEXT:    xxpermdi v2, v2, vs0, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f5:
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
>> index 83691b52575d..f4572c359942 100644
>> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
>> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
>> @@ -12,8 +12,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly %int32, <2 x i64> %vec)  {
>>  ; P9LE-LABEL: s2v_test1:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfiwax f0, 0, r3
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test1:
>> @@ -25,8 +24,7 @@ define <2 x i64> @s2v_test1(i32* nocapture readonly %int32, <2 x i64> %vec)  {
>>  ; P8LE-LABEL: s2v_test1:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfiwax f0, 0, r3
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test1:
>> @@ -47,8 +45,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly %int32, <2 x i64> %vec)  {
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    addi r3, r3, 4
>>  ; P9LE-NEXT:    lfiwax f0, 0, r3
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test2:
>> @@ -62,8 +59,7 @@ define <2 x i64> @s2v_test2(i32* nocapture readonly %int32, <2 x i64> %vec)  {
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    addi r3, r3, 4
>>  ; P8LE-NEXT:    lfiwax f0, 0, r3
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test2:
>> @@ -86,8 +82,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly %int32, <2 x i64> %vec, i32
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    sldi r4, r7, 2
>>  ; P9LE-NEXT:    lfiwax f0, r3, r4
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test3:
>> @@ -101,8 +96,7 @@ define <2 x i64> @s2v_test3(i32* nocapture readonly %int32, <2 x i64> %vec, i32
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    sldi r4, r7, 2
>>  ; P8LE-NEXT:    lfiwax f0, r3, r4
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test3:
>> @@ -126,8 +120,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly %int32, <2 x i64> %vec)  {
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    addi r3, r3, 4
>>  ; P9LE-NEXT:    lfiwax f0, 0, r3
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test4:
>> @@ -141,8 +134,7 @@ define <2 x i64> @s2v_test4(i32* nocapture readonly %int32, <2 x i64> %vec)  {
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    addi r3, r3, 4
>>  ; P8LE-NEXT:    lfiwax f0, 0, r3
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test4:
>> @@ -164,8 +156,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i32* nocapture readonly %ptr1)  {
>>  ; P9LE-LABEL: s2v_test5:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfiwax f0, 0, r5
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P9LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test5:
>> @@ -177,8 +168,7 @@ define <2 x i64> @s2v_test5(<2 x i64> %vec, i32* nocapture readonly %ptr1)  {
>>  ; P8LE-LABEL: s2v_test5:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfiwax f0, 0, r5
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    xxpermdi v2, v2, v3, 1
>> +; P8LE-NEXT:    xxmrghd v2, v2, vs0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test5:
>> @@ -198,8 +188,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly %ptr)  {
>>  ; P9LE-LABEL: s2v_test6:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfiwax f0, 0, r3
>> -; P9LE-NEXT:    xxswapd v2, f0
>> -; P9LE-NEXT:    xxspltd v2, v2, 1
>> +; P9LE-NEXT:    xxspltd v2, vs0, 0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test6:
>> @@ -211,8 +200,7 @@ define <2 x i64> @s2v_test6(i32* nocapture readonly %ptr)  {
>>  ; P8LE-LABEL: s2v_test6:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfiwax f0, 0, r3
>> -; P8LE-NEXT:    xxswapd v2, f0
>> -; P8LE-NEXT:    xxspltd v2, v2, 1
>> +; P8LE-NEXT:    xxspltd v2, vs0, 0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test6:
>> @@ -233,8 +221,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly %ptr)  {
>>  ; P9LE-LABEL: s2v_test7:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    lfiwax f0, 0, r3
>> -; P9LE-NEXT:    xxswapd v2, f0
>> -; P9LE-NEXT:    xxspltd v2, v2, 1
>> +; P9LE-NEXT:    xxspltd v2, vs0, 0
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test7:
>> @@ -246,8 +233,7 @@ define <2 x i64> @s2v_test7(i32* nocapture readonly %ptr)  {
>>  ; P8LE-LABEL: s2v_test7:
>>  ; P8LE:       # %bb.0: # %entry
>>  ; P8LE-NEXT:    lfiwax f0, 0, r3
>> -; P8LE-NEXT:    xxswapd v2, f0
>> -; P8LE-NEXT:    xxspltd v2, v2, 1
>> +; P8LE-NEXT:    xxspltd v2, vs0, 0
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test7:
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
>> index 2261d75c6619..3dc34533420c 100644
>> --- a/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
>> +++ b/llvm/test/CodeGen/PowerPC/scalar_vector_test_4.ll
>> @@ -11,12 +11,11 @@
>>  define <4 x i32> @s2v_test1(i32* nocapture readonly %int32, <4 x i32> %vec)  {
>>  ; P8LE-LABEL: s2v_test1:
>>  ; P8LE:       # %bb.0: # %entry
>> -; P8LE-NEXT:    lfiwzx f0, 0, r3
>>  ; P8LE-NEXT:    addis r4, r2, .LCPI0_0@toc@ha
>> -; P8LE-NEXT:    addi r3, r4, .LCPI0_0@toc@l
>> -; P8LE-NEXT:    lvx v3, 0, r3
>> -; P8LE-NEXT:    xxswapd v4, f0
>> -; P8LE-NEXT:    vperm v2, v4, v2, v3
>> +; P8LE-NEXT:    lxsiwzx v4, 0, r3
>> +; P8LE-NEXT:    addi r4, r4, .LCPI0_0@toc@l
>> +; P8LE-NEXT:    lvx v3, 0, r4
>> +; P8LE-NEXT:    vperm v2, v2, v4, v3
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test1:
>> @@ -36,13 +35,12 @@ entry:
>>  define <4 x i32> @s2v_test2(i32* nocapture readonly %int32, <4 x i32> %vec)  {
>>  ; P8LE-LABEL: s2v_test2:
>>  ; P8LE:       # %bb.0: # %entry
>> -; P8LE-NEXT:    addi r3, r3, 4
>>  ; P8LE-NEXT:    addis r4, r2, .LCPI1_0@toc@ha
>> -; P8LE-NEXT:    lfiwzx f0, 0, r3
>> -; P8LE-NEXT:    addi r3, r4, .LCPI1_0@toc@l
>> -; P8LE-NEXT:    lvx v3, 0, r3
>> -; P8LE-NEXT:    xxswapd v4, f0
>> -; P8LE-NEXT:    vperm v2, v4, v2, v3
>> +; P8LE-NEXT:    addi r3, r3, 4
>> +; P8LE-NEXT:    addi r4, r4, .LCPI1_0@toc@l
>> +; P8LE-NEXT:    lxsiwzx v4, 0, r3
>> +; P8LE-NEXT:    lvx v3, 0, r4
>> +; P8LE-NEXT:    vperm v2, v2, v4, v3
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test2:
>> @@ -64,13 +62,12 @@ entry:
>>  define <4 x i32> @s2v_test3(i32* nocapture readonly %int32, <4 x i32> %vec, i32 signext %Idx)  {
>>  ; P8LE-LABEL: s2v_test3:
>>  ; P8LE:       # %bb.0: # %entry
>> -; P8LE-NEXT:    sldi r5, r7, 2
>>  ; P8LE-NEXT:    addis r4, r2, .LCPI2_0@toc@ha
>> -; P8LE-NEXT:    lfiwzx f0, r3, r5
>> -; P8LE-NEXT:    addi r3, r4, .LCPI2_0@toc@l
>> -; P8LE-NEXT:    lvx v4, 0, r3
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    vperm v2, v3, v2, v4
>> +; P8LE-NEXT:    sldi r5, r7, 2
>> +; P8LE-NEXT:    addi r4, r4, .LCPI2_0@toc@l
>> +; P8LE-NEXT:    lxsiwzx v3, r3, r5
>> +; P8LE-NEXT:    lvx v4, 0, r4
>> +; P8LE-NEXT:    vperm v2, v2, v3, v4
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test3:
>> @@ -93,13 +90,12 @@ entry:
>>  define <4 x i32> @s2v_test4(i32* nocapture readonly %int32, <4 x i32> %vec)  {
>>  ; P8LE-LABEL: s2v_test4:
>>  ; P8LE:       # %bb.0: # %entry
>> -; P8LE-NEXT:    addi r3, r3, 4
>>  ; P8LE-NEXT:    addis r4, r2, .LCPI3_0@toc@ha
>> -; P8LE-NEXT:    lfiwzx f0, 0, r3
>> -; P8LE-NEXT:    addi r3, r4, .LCPI3_0@toc@l
>> -; P8LE-NEXT:    lvx v3, 0, r3
>> -; P8LE-NEXT:    xxswapd v4, f0
>> -; P8LE-NEXT:    vperm v2, v4, v2, v3
>> +; P8LE-NEXT:    addi r3, r3, 4
>> +; P8LE-NEXT:    addi r4, r4, .LCPI3_0@toc@l
>> +; P8LE-NEXT:    lxsiwzx v4, 0, r3
>> +; P8LE-NEXT:    lvx v3, 0, r4
>> +; P8LE-NEXT:    vperm v2, v2, v4, v3
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test4:
>> @@ -121,12 +117,11 @@ entry:
>>  define <4 x i32> @s2v_test5(<4 x i32> %vec, i32* nocapture readonly %ptr1)  {
>>  ; P8LE-LABEL: s2v_test5:
>>  ; P8LE:       # %bb.0: # %entry
>> -; P8LE-NEXT:    lfiwzx f0, 0, r5
>>  ; P8LE-NEXT:    addis r3, r2, .LCPI4_0@toc@ha
>> +; P8LE-NEXT:    lxsiwzx v4, 0, r5
>>  ; P8LE-NEXT:    addi r3, r3, .LCPI4_0@toc@l
>>  ; P8LE-NEXT:    lvx v3, 0, r3
>> -; P8LE-NEXT:    xxswapd v4, f0
>> -; P8LE-NEXT:    vperm v2, v4, v2, v3
>> +; P8LE-NEXT:    vperm v2, v2, v4, v3
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test5:
>> @@ -146,12 +141,11 @@ entry:
>>  define <4 x float> @s2v_test_f1(float* nocapture readonly %f64, <4 x float> %vec)  {
>>  ; P8LE-LABEL: s2v_test_f1:
>>  ; P8LE:       # %bb.0: # %entry
>> -; P8LE-NEXT:    lfiwzx f0, 0, r3
>>  ; P8LE-NEXT:    addis r4, r2, .LCPI5_0@toc@ha
>> -; P8LE-NEXT:    addi r3, r4, .LCPI5_0@toc@l
>> -; P8LE-NEXT:    lvx v3, 0, r3
>> -; P8LE-NEXT:    xxswapd v4, f0
>> -; P8LE-NEXT:    vperm v2, v4, v2, v3
>> +; P8LE-NEXT:    lxsiwzx v4, 0, r3
>> +; P8LE-NEXT:    addi r4, r4, .LCPI5_0@toc@l
>> +; P8LE-NEXT:    lvx v3, 0, r4
>> +; P8LE-NEXT:    vperm v2, v2, v4, v3
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f1:
>> @@ -172,10 +166,9 @@ define <2 x float> @s2v_test_f2(float* nocapture readonly %f64, <2 x float> %vec
>>  ; P9LE-LABEL: s2v_test_f2:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    addi r3, r3, 4
>> -; P9LE-DAG:     xxspltw v2, v2, 2
>> -; P9LE-DAG:     lfiwzx f0, 0, r3
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    vmrglw v2, v2, v3
>> +; P9LE-NEXT:    lxsiwzx v3, 0, r3
>> +; P9LE-NEXT:    vmrglw v2, v2, v2
>> +; P9LE-NEXT:    vmrghw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test_f2:
>> @@ -189,11 +182,10 @@ define <2 x float> @s2v_test_f2(float* nocapture readonly %f64, <2 x float> %vec
>>
>>  ; P8LE-LABEL: s2v_test_f2:
>>  ; P8LE:       # %bb.0: # %entry
>> +; P8LE-NEXT:    vmrglw v2, v2, v2
>>  ; P8LE-NEXT:    addi r3, r3, 4
>> -; P8LE-NEXT:    xxspltw v2, v2, 2
>> -; P8LE-NEXT:    lfiwzx f0, 0, r3
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    vmrglw v2, v2, v3
>> +; P8LE-NEXT:    lxsiwzx v3, 0, r3
>> +; P8LE-NEXT:    vmrghw v2, v2, v3
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f2:
>> @@ -216,10 +208,9 @@ define <2 x float> @s2v_test_f3(float* nocapture readonly %f64, <2 x float> %vec
>>  ; P9LE-LABEL: s2v_test_f3:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    sldi r4, r7, 2
>> -; P9LE-NEXT:    lfiwzx f0, r3, r4
>> -; P9LE-DAG:     xxspltw v2, v2, 2
>> -; P9LE-DAG:     xxswapd v3, f0
>> -; P9LE-NEXT:    vmrglw v2, v2, v3
>> +; P9LE-NEXT:    lxsiwzx v3, r3, r4
>> +; P9LE-NEXT:    vmrglw v2, v2, v2
>> +; P9LE-NEXT:    vmrghw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test_f3:
>> @@ -233,11 +224,10 @@ define <2 x float> @s2v_test_f3(float* nocapture readonly %f64, <2 x float> %vec
>>
>>  ; P8LE-LABEL: s2v_test_f3:
>>  ; P8LE:       # %bb.0: # %entry
>> +; P8LE-NEXT:    vmrglw v2, v2, v2
>>  ; P8LE-NEXT:    sldi r4, r7, 2
>> -; P8LE-NEXT:    xxspltw v2, v2, 2
>> -; P8LE-NEXT:    lfiwzx f0, r3, r4
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    vmrglw v2, v2, v3
>> +; P8LE-NEXT:    lxsiwzx v3, r3, r4
>> +; P8LE-NEXT:    vmrghw v2, v2, v3
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f3:
>> @@ -261,10 +251,9 @@ define <2 x float> @s2v_test_f4(float* nocapture readonly %f64, <2 x float> %vec
>>  ; P9LE-LABEL: s2v_test_f4:
>>  ; P9LE:       # %bb.0: # %entry
>>  ; P9LE-NEXT:    addi r3, r3, 4
>> -; P9LE-NEXT:    lfiwzx f0, 0, r3
>> -; P9LE-DAG:     xxspltw v2, v2, 2
>> -; P9LE-DAG:     xxswapd v3, f0
>> -; P9LE-NEXT:    vmrglw v2, v2, v3
>> +; P9LE-NEXT:    lxsiwzx v3, 0, r3
>> +; P9LE-NEXT:    vmrglw v2, v2, v2
>> +; P9LE-NEXT:    vmrghw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test_f4:
>> @@ -278,11 +267,10 @@ define <2 x float> @s2v_test_f4(float* nocapture readonly %f64, <2 x float> %vec
>>
>>  ; P8LE-LABEL: s2v_test_f4:
>>  ; P8LE:       # %bb.0: # %entry
>> +; P8LE-NEXT:    vmrglw v2, v2, v2
>>  ; P8LE-NEXT:    addi r3, r3, 4
>> -; P8LE-NEXT:    xxspltw v2, v2, 2
>> -; P8LE-NEXT:    lfiwzx f0, 0, r3
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    vmrglw v2, v2, v3
>> +; P8LE-NEXT:    lxsiwzx v3, 0, r3
>> +; P8LE-NEXT:    vmrghw v2, v2, v3
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f4:
>> @@ -304,10 +292,9 @@ entry:
>>  define <2 x float> @s2v_test_f5(<2 x float> %vec, float* nocapture readonly %ptr1)  {
>>  ; P9LE-LABEL: s2v_test_f5:
>>  ; P9LE:       # %bb.0: # %entry
>> -; P9LE-NEXT:    lfiwzx f0, 0, r5
>> -; P9LE-NEXT:    xxspltw v2, v2, 2
>> -; P9LE-NEXT:    xxswapd v3, f0
>> -; P9LE-NEXT:    vmrglw v2, v2, v3
>> +; P9LE-NEXT:    lxsiwzx v3, 0, r5
>> +; P9LE-NEXT:    vmrglw v2, v2, v2
>> +; P9LE-NEXT:    vmrghw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>
>>  ; P9BE-LABEL: s2v_test_f5:
>> @@ -320,10 +307,9 @@ define <2 x float> @s2v_test_f5(<2 x float> %vec, float* nocapture readonly %ptr
>>
>>  ; P8LE-LABEL: s2v_test_f5:
>>  ; P8LE:       # %bb.0: # %entry
>> -; P8LE-NEXT:    lfiwzx f0, 0, r5
>> -; P8LE-NEXT:    xxspltw v2, v2, 2
>> -; P8LE-NEXT:    xxswapd v3, f0
>> -; P8LE-NEXT:    vmrglw v2, v2, v3
>> +; P8LE-NEXT:    vmrglw v2, v2, v2
>> +; P8LE-NEXT:    lxsiwzx v3, 0, r5
>> +; P8LE-NEXT:    vmrghw v2, v2, v3
>>  ; P8LE-NEXT:    blr
>>
>>  ; P8BE-LABEL: s2v_test_f5:
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
>> index 935630745f47..097ba07a5b1e 100644
>> --- a/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
>> +++ b/llvm/test/CodeGen/PowerPC/srem-vector-lkk.ll
>> @@ -13,60 +13,56 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 0
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, -21386
>> -; P9LE-NEXT:    ori r5, r5, 37253
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r5, r4, r5
>> -; P9LE-NEXT:    add r4, r5, r4
>> +; P9LE-NEXT:    lis r4, -21386
>> +; P9LE-NEXT:    ori r4, r4, 37253
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>> +; P9LE-NEXT:    add r4, r4, r3
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 6
>>  ; P9LE-NEXT:    add r4, r4, r5
>> -; P9LE-NEXT:    lis r5, 31710
>>  ; P9LE-NEXT:    mulli r4, r4, 95
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r4, 31710
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    ori r5, r5, 63421
>> -; P9LE-NEXT:    mulhw r5, r4, r5
>> -; P9LE-NEXT:    sub r4, r5, r4
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    ori r4, r4, 63421
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>> +; P9LE-NEXT:    sub r4, r4, r3
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 6
>>  ; P9LE-NEXT:    add r4, r4, r5
>> -; P9LE-NEXT:    lis r5, 21399
>>  ; P9LE-NEXT:    mulli r4, r4, -124
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r4, 21399
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    ori r5, r5, 33437
>> -; P9LE-NEXT:    mulhw r4, r4, r5
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    ori r4, r4, 33437
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 5
>>  ; P9LE-NEXT:    add r4, r4, r5
>> -; P9LE-NEXT:    lis r5, -16728
>>  ; P9LE-NEXT:    mulli r4, r4, 98
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    ori r5, r5, 63249
>> -; P9LE-NEXT:    mulhw r4, r4, r5
>> +; P9LE-NEXT:    lis r4, -16728
>> +; P9LE-NEXT:    ori r4, r4, 63249
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 8
>>  ; P9LE-NEXT:    add r4, r4, r5
>>  ; P9LE-NEXT:    mulli r4, r4, -1003
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    vmrglh v2, v2, v4
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    vmrghh v2, v2, v4
>>  ; P9LE-NEXT:    vmrglw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>  ;
>> @@ -135,58 +131,54 @@ define <4 x i16> @fold_srem_vec_1(<4 x i16> %x) {
>>  ; P8LE:       # %bb.0:
>>  ; P8LE-NEXT:    xxswapd vs0, v2
>>  ; P8LE-NEXT:    lis r3, 21399
>> -; P8LE-NEXT:    lis r9, -21386
>> -; P8LE-NEXT:    lis r11, 31710
>>  ; P8LE-NEXT:    lis r8, -16728
>> +; P8LE-NEXT:    lis r9, -21386
>> +; P8LE-NEXT:    lis r10, 31710
>>  ; P8LE-NEXT:    ori r3, r3, 33437
>> -; P8LE-NEXT:    ori r9, r9, 37253
>>  ; P8LE-NEXT:    ori r8, r8, 63249
>> +; P8LE-NEXT:    ori r9, r9, 37253
>> +; P8LE-NEXT:    ori r10, r10, 63421
>>  ; P8LE-NEXT:    mffprd r4, f0
>>  ; P8LE-NEXT:    rldicl r5, r4, 32, 48
>> -; P8LE-NEXT:    clrldi r7, r4, 48
>>  ; P8LE-NEXT:    rldicl r6, r4, 16, 48
>> +; P8LE-NEXT:    clrldi r7, r4, 48
>> +; P8LE-NEXT:    extsh r5, r5
>> +; P8LE-NEXT:    extsh r6, r6
>>  ; P8LE-NEXT:    rldicl r4, r4, 48, 48
>> -; P8LE-NEXT:    extsh r10, r5
>> -; P8LE-NEXT:    extsh r0, r7
>> -; P8LE-NEXT:    mulhw r3, r10, r3
>> -; P8LE-NEXT:    ori r10, r11, 63421
>> -; P8LE-NEXT:    extsh r11, r4
>> -; P8LE-NEXT:    extsh r12, r6
>> -; P8LE-NEXT:    mulhw r9, r0, r9
>> -; P8LE-NEXT:    mulhw r10, r11, r10
>> -; P8LE-NEXT:    mulhw r8, r12, r8
>> -; P8LE-NEXT:    srwi r12, r3, 31
>> +; P8LE-NEXT:    extsh r7, r7
>> +; P8LE-NEXT:    mulhw r3, r5, r3
>> +; P8LE-NEXT:    extsh r4, r4
>> +; P8LE-NEXT:    mulhw r8, r6, r8
>> +; P8LE-NEXT:    mulhw r9, r7, r9
>> +; P8LE-NEXT:    mulhw r10, r4, r10
>> +; P8LE-NEXT:    srwi r11, r3, 31
>>  ; P8LE-NEXT:    srawi r3, r3, 5
>> -; P8LE-NEXT:    add r9, r9, r0
>> -; P8LE-NEXT:    sub r10, r10, r11
>> -; P8LE-NEXT:    add r3, r3, r12
>> +; P8LE-NEXT:    add r3, r3, r11
>> +; P8LE-NEXT:    srwi r11, r8, 31
>> +; P8LE-NEXT:    add r9, r9, r7
>> +; P8LE-NEXT:    srawi r8, r8, 8
>> +; P8LE-NEXT:    sub r10, r10, r4
>> +; P8LE-NEXT:    add r8, r8, r11
>>  ; P8LE-NEXT:    srwi r11, r9, 31
>>  ; P8LE-NEXT:    srawi r9, r9, 6
>> -; P8LE-NEXT:    srwi r12, r8, 31
>> -; P8LE-NEXT:    srawi r8, r8, 8
>> +; P8LE-NEXT:    mulli r3, r3, 98
>>  ; P8LE-NEXT:    add r9, r9, r11
>>  ; P8LE-NEXT:    srwi r11, r10, 31
>>  ; P8LE-NEXT:    srawi r10, r10, 6
>> -; P8LE-NEXT:    add r8, r8, r12
>> -; P8LE-NEXT:    mulli r3, r3, 98
>> -; P8LE-NEXT:    add r10, r10, r11
>>  ; P8LE-NEXT:    mulli r8, r8, -1003
>> +; P8LE-NEXT:    add r10, r10, r11
>>  ; P8LE-NEXT:    mulli r9, r9, 95
>>  ; P8LE-NEXT:    mulli r10, r10, -124
>>  ; P8LE-NEXT:    sub r3, r5, r3
>> +; P8LE-NEXT:    mtvsrd v2, r3
>>  ; P8LE-NEXT:    sub r5, r6, r8
>> -; P8LE-NEXT:    mtfprd f0, r3
>>  ; P8LE-NEXT:    sub r3, r7, r9
>> +; P8LE-NEXT:    mtvsrd v3, r5
>>  ; P8LE-NEXT:    sub r4, r4, r10
>> -; P8LE-NEXT:    mtfprd f1, r5
>> -; P8LE-NEXT:    mtfprd f2, r3
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    mtfprd f3, r4
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> -; P8LE-NEXT:    xxswapd v4, vs2
>> -; P8LE-NEXT:    xxswapd v5, vs3
>> -; P8LE-NEXT:    vmrglh v2, v3, v2
>> -; P8LE-NEXT:    vmrglh v3, v5, v4
>> +; P8LE-NEXT:    mtvsrd v4, r3
>> +; P8LE-NEXT:    mtvsrd v5, r4
>> +; P8LE-NEXT:    vmrghh v2, v3, v2
>> +; P8LE-NEXT:    vmrghh v3, v5, v4
>>  ; P8LE-NEXT:    vmrglw v2, v2, v3
>>  ; P8LE-NEXT:    blr
>>  ;
>> @@ -256,56 +248,52 @@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 0
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, -21386
>> -; P9LE-NEXT:    ori r5, r5, 37253
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r6, r4, r5
>> -; P9LE-NEXT:    add r4, r6, r4
>> -; P9LE-NEXT:    srwi r6, r4, 31
>> -; P9LE-NEXT:    srawi r4, r4, 6
>> -; P9LE-NEXT:    add r4, r4, r6
>> -; P9LE-NEXT:    mulli r4, r4, 95
>> -; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r4, -21386
>> +; P9LE-NEXT:    ori r4, r4, 37253
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r5, r3, r4
>> +; P9LE-NEXT:    add r5, r5, r3
>> +; P9LE-NEXT:    srwi r6, r5, 31
>> +; P9LE-NEXT:    srawi r5, r5, 6
>> +; P9LE-NEXT:    add r5, r5, r6
>> +; P9LE-NEXT:    mulli r5, r5, 95
>> +; P9LE-NEXT:    sub r3, r3, r5
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r6, r4, r5
>> -; P9LE-NEXT:    add r4, r6, r4
>> -; P9LE-NEXT:    srwi r6, r4, 31
>> -; P9LE-NEXT:    srawi r4, r4, 6
>> -; P9LE-NEXT:    add r4, r4, r6
>> -; P9LE-NEXT:    mulli r4, r4, 95
>> -; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r5, r3, r4
>> +; P9LE-NEXT:    add r5, r5, r3
>> +; P9LE-NEXT:    srwi r6, r5, 31
>> +; P9LE-NEXT:    srawi r5, r5, 6
>> +; P9LE-NEXT:    add r5, r5, r6
>> +; P9LE-NEXT:    mulli r5, r5, 95
>> +; P9LE-NEXT:    sub r3, r3, r5
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r6, r4, r5
>> -; P9LE-NEXT:    add r4, r6, r4
>> -; P9LE-NEXT:    srwi r6, r4, 31
>> -; P9LE-NEXT:    srawi r4, r4, 6
>> -; P9LE-NEXT:    add r4, r4, r6
>> -; P9LE-NEXT:    mulli r4, r4, 95
>> -; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r5, r3, r4
>> +; P9LE-NEXT:    add r5, r5, r3
>> +; P9LE-NEXT:    srwi r6, r5, 31
>> +; P9LE-NEXT:    srawi r5, r5, 6
>> +; P9LE-NEXT:    add r5, r5, r6
>> +; P9LE-NEXT:    mulli r5, r5, 95
>> +; P9LE-NEXT:    sub r3, r3, r5
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r5, r4, r5
>> -; P9LE-NEXT:    add r4, r5, r4
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>> +; P9LE-NEXT:    add r4, r4, r3
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 6
>>  ; P9LE-NEXT:    add r4, r4, r5
>>  ; P9LE-NEXT:    mulli r4, r4, 95
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    vmrglh v2, v2, v4
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    vmrghh v2, v2, v4
>>  ; P9LE-NEXT:    vmrglw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>  ;
>> @@ -370,56 +358,50 @@ define <4 x i16> @fold_srem_vec_2(<4 x i16> %x) {
>>  ; P8LE:       # %bb.0:
>>  ; P8LE-NEXT:    xxswapd vs0, v2
>>  ; P8LE-NEXT:    lis r3, -21386
>> -; P8LE-NEXT:    std r30, -16(r1) # 8-byte Folded Spill
>>  ; P8LE-NEXT:    ori r3, r3, 37253
>>  ; P8LE-NEXT:    mffprd r4, f0
>>  ; P8LE-NEXT:    clrldi r5, r4, 48
>>  ; P8LE-NEXT:    rldicl r6, r4, 48, 48
>> -; P8LE-NEXT:    extsh r8, r5
>> +; P8LE-NEXT:    extsh r5, r5
>>  ; P8LE-NEXT:    rldicl r7, r4, 32, 48
>> -; P8LE-NEXT:    extsh r9, r6
>> -; P8LE-NEXT:    mulhw r10, r8, r3
>> +; P8LE-NEXT:    extsh r6, r6
>> +; P8LE-NEXT:    mulhw r8, r5, r3
>>  ; P8LE-NEXT:    rldicl r4, r4, 16, 48
>> -; P8LE-NEXT:    extsh r11, r7
>> -; P8LE-NEXT:    mulhw r12, r9, r3
>> -; P8LE-NEXT:    extsh r0, r4
>> -; P8LE-NEXT:    mulhw r30, r11, r3
>> -; P8LE-NEXT:    mulhw r3, r0, r3
>> -; P8LE-NEXT:    add r8, r10, r8
>> -; P8LE-NEXT:    add r9, r12, r9
>> -; P8LE-NEXT:    srwi r10, r8, 31
>> +; P8LE-NEXT:    extsh r7, r7
>> +; P8LE-NEXT:    mulhw r9, r6, r3
>> +; P8LE-NEXT:    extsh r4, r4
>> +; P8LE-NEXT:    mulhw r10, r7, r3
>> +; P8LE-NEXT:    mulhw r3, r4, r3
>> +; P8LE-NEXT:    add r8, r8, r5
>> +; P8LE-NEXT:    add r9, r9, r6
>> +; P8LE-NEXT:    srwi r11, r8, 31
>>  ; P8LE-NEXT:    srawi r8, r8, 6
>> -; P8LE-NEXT:    add r11, r30, r11
>> -; P8LE-NEXT:    add r3, r3, r0
>> -; P8LE-NEXT:    ld r30, -16(r1) # 8-byte Folded Reload
>> -; P8LE-NEXT:    add r8, r8, r10
>> -; P8LE-NEXT:    srwi r10, r9, 31
>> +; P8LE-NEXT:    add r10, r10, r7
>> +; P8LE-NEXT:    add r3, r3, r4
>> +; P8LE-NEXT:    add r8, r8, r11
>> +; P8LE-NEXT:    srwi r11, r9, 31
>>  ; P8LE-NEXT:    srawi r9, r9, 6
>>  ; P8LE-NEXT:    mulli r8, r8, 95
>> -; P8LE-NEXT:    add r9, r9, r10
>> -; P8LE-NEXT:    srwi r10, r11, 31
>> -; P8LE-NEXT:    srawi r11, r11, 6
>> +; P8LE-NEXT:    add r9, r9, r11
>> +; P8LE-NEXT:    srwi r11, r10, 31
>> +; P8LE-NEXT:    srawi r10, r10, 6
>>  ; P8LE-NEXT:    mulli r9, r9, 95
>> -; P8LE-NEXT:    add r10, r11, r10
>> +; P8LE-NEXT:    add r10, r10, r11
>>  ; P8LE-NEXT:    srwi r11, r3, 31
>>  ; P8LE-NEXT:    srawi r3, r3, 6
>>  ; P8LE-NEXT:    mulli r10, r10, 95
>>  ; P8LE-NEXT:    sub r5, r5, r8
>>  ; P8LE-NEXT:    add r3, r3, r11
>> -; P8LE-NEXT:    mtfprd f0, r5
>> +; P8LE-NEXT:    mtvsrd v2, r5
>>  ; P8LE-NEXT:    mulli r3, r3, 95
>>  ; P8LE-NEXT:    sub r6, r6, r9
>> -; P8LE-NEXT:    mtfprd f1, r6
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> +; P8LE-NEXT:    mtvsrd v3, r6
>>  ; P8LE-NEXT:    sub r5, r7, r10
>> -; P8LE-NEXT:    mtfprd f2, r5
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> +; P8LE-NEXT:    mtvsrd v4, r5
>>  ; P8LE-NEXT:    sub r3, r4, r3
>> -; P8LE-NEXT:    mtfprd f3, r3
>> -; P8LE-NEXT:    xxswapd v4, vs2
>> -; P8LE-NEXT:    vmrglh v2, v3, v2
>> -; P8LE-NEXT:    xxswapd v5, vs3
>> -; P8LE-NEXT:    vmrglh v3, v5, v4
>> +; P8LE-NEXT:    vmrghh v2, v3, v2
>> +; P8LE-NEXT:    mtvsrd v5, r3
>> +; P8LE-NEXT:    vmrghh v3, v5, v4
>>  ; P8LE-NEXT:    vmrglw v2, v3, v2
>>  ; P8LE-NEXT:    blr
>>  ;
>> @@ -487,67 +469,59 @@ define <4 x i16> @combine_srem_sdiv(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 0
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, -21386
>> -; P9LE-NEXT:    ori r5, r5, 37253
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r6, r4, r5
>> -; P9LE-NEXT:    add r4, r6, r4
>> -; P9LE-NEXT:    srwi r6, r4, 31
>> -; P9LE-NEXT:    srawi r4, r4, 6
>> -; P9LE-NEXT:    add r4, r4, r6
>> -; P9LE-NEXT:    mulli r6, r4, 95
>> +; P9LE-NEXT:    lis r4, -21386
>> +; P9LE-NEXT:    ori r4, r4, 37253
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r5, r3, r4
>> +; P9LE-NEXT:    add r5, r5, r3
>> +; P9LE-NEXT:    srwi r6, r5, 31
>> +; P9LE-NEXT:    srawi r5, r5, 6
>> +; P9LE-NEXT:    add r5, r5, r6
>> +; P9LE-NEXT:    mulli r6, r5, 95
>>  ; P9LE-NEXT:    sub r3, r3, r6
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>>  ; P9LE-NEXT:    extsh r6, r3
>> -; P9LE-NEXT:    mulhw r7, r6, r5
>> +; P9LE-NEXT:    mulhw r7, r6, r4
>>  ; P9LE-NEXT:    add r6, r7, r6
>>  ; P9LE-NEXT:    srwi r7, r6, 31
>>  ; P9LE-NEXT:    srawi r6, r6, 6
>>  ; P9LE-NEXT:    add r6, r6, r7
>>  ; P9LE-NEXT:    mulli r7, r6, 95
>>  ; P9LE-NEXT:    sub r3, r3, r7
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>>  ; P9LE-NEXT:    extsh r7, r3
>> -; P9LE-NEXT:    mulhw r8, r7, r5
>> +; P9LE-NEXT:    mulhw r8, r7, r4
>>  ; P9LE-NEXT:    add r7, r8, r7
>>  ; P9LE-NEXT:    srwi r8, r7, 31
>>  ; P9LE-NEXT:    srawi r7, r7, 6
>>  ; P9LE-NEXT:    add r7, r7, r8
>>  ; P9LE-NEXT:    mulli r8, r7, 95
>>  ; P9LE-NEXT:    sub r3, r3, r8
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>>  ; P9LE-NEXT:    extsh r8, r3
>> -; P9LE-NEXT:    mulhw r5, r8, r5
>> -; P9LE-NEXT:    add r5, r5, r8
>> -; P9LE-NEXT:    srwi r8, r5, 31
>> -; P9LE-NEXT:    srawi r5, r5, 6
>> -; P9LE-NEXT:    add r5, r5, r8
>> -; P9LE-NEXT:    mulli r8, r5, 95
>> +; P9LE-NEXT:    mulhw r4, r8, r4
>> +; P9LE-NEXT:    add r4, r4, r8
>> +; P9LE-NEXT:    srwi r8, r4, 31
>> +; P9LE-NEXT:    srawi r4, r4, 6
>> +; P9LE-NEXT:    add r4, r4, r8
>> +; P9LE-NEXT:    mulli r8, r4, 95
>>  ; P9LE-NEXT:    sub r3, r3, r8
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    mtfprd f0, r4
>> -; P9LE-NEXT:    vmrglh v2, v2, v4
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    vmrghh v2, v2, v4
>> +; P9LE-NEXT:    mtvsrd v4, r6
>>  ; P9LE-NEXT:    vmrglw v2, v2, v3
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r6
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r7
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r5
>> -; P9LE-NEXT:    xxswapd v5, vs0
>> -; P9LE-NEXT:    vmrglh v4, v5, v4
>> +; P9LE-NEXT:    mtvsrd v3, r5
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    mtvsrd v4, r7
>> +; P9LE-NEXT:    mtvsrd v5, r4
>> +; P9LE-NEXT:    vmrghh v4, v5, v4
>>  ; P9LE-NEXT:    vmrglw v3, v4, v3
>>  ; P9LE-NEXT:    vadduhm v2, v2, v3
>>  ; P9LE-NEXT:    blr
>> @@ -624,69 +598,59 @@ define <4 x i16> @combine_srem_sdiv(<4 x i16> %x) {
>>  ; P8LE-LABEL: combine_srem_sdiv:
>>  ; P8LE:       # %bb.0:
>>  ; P8LE-NEXT:    xxswapd vs0, v2
>> -; P8LE-NEXT:    lis r4, -21386
>> -; P8LE-NEXT:    std r30, -16(r1) # 8-byte Folded Spill
>> -; P8LE-NEXT:    ori r4, r4, 37253
>> -; P8LE-NEXT:    mffprd r5, f0
>> -; P8LE-NEXT:    clrldi r3, r5, 48
>> -; P8LE-NEXT:    rldicl r6, r5, 48, 48
>> -; P8LE-NEXT:    rldicl r7, r5, 32, 48
>> -; P8LE-NEXT:    extsh r8, r3
>> -; P8LE-NEXT:    extsh r9, r6
>> -; P8LE-NEXT:    extsh r10, r7
>> -; P8LE-NEXT:    mulhw r11, r8, r4
>> -; P8LE-NEXT:    rldicl r5, r5, 16, 48
>> -; P8LE-NEXT:    mulhw r12, r9, r4
>> -; P8LE-NEXT:    mulhw r0, r10, r4
>> -; P8LE-NEXT:    extsh r30, r5
>> -; P8LE-NEXT:    mulhw r4, r30, r4
>> +; P8LE-NEXT:    lis r3, -21386
>> +; P8LE-NEXT:    ori r3, r3, 37253
>> +; P8LE-NEXT:    mffprd r4, f0
>> +; P8LE-NEXT:    clrldi r5, r4, 48
>> +; P8LE-NEXT:    rldicl r6, r4, 48, 48
>> +; P8LE-NEXT:    rldicl r7, r4, 32, 48
>> +; P8LE-NEXT:    extsh r5, r5
>> +; P8LE-NEXT:    extsh r8, r6
>> +; P8LE-NEXT:    extsh r9, r7
>> +; P8LE-NEXT:    mulhw r10, r5, r3
>> +; P8LE-NEXT:    mulhw r11, r8, r3
>> +; P8LE-NEXT:    rldicl r4, r4, 16, 48
>> +; P8LE-NEXT:    mulhw r12, r9, r3
>> +; P8LE-NEXT:    extsh r0, r4
>> +; P8LE-NEXT:    mulhw r3, r0, r3
>> +; P8LE-NEXT:    add r10, r10, r5
>>  ; P8LE-NEXT:    add r8, r11, r8
>> +; P8LE-NEXT:    srwi r11, r10, 31
>>  ; P8LE-NEXT:    add r9, r12, r9
>> -; P8LE-NEXT:    srwi r11, r8, 31
>> -; P8LE-NEXT:    add r10, r0, r10
>> -; P8LE-NEXT:    srawi r8, r8, 6
>> -; P8LE-NEXT:    srawi r12, r9, 6
>> +; P8LE-NEXT:    srawi r10, r10, 6
>> +; P8LE-NEXT:    srawi r12, r8, 6
>> +; P8LE-NEXT:    srwi r8, r8, 31
>> +; P8LE-NEXT:    add r10, r10, r11
>> +; P8LE-NEXT:    add r3, r3, r0
>> +; P8LE-NEXT:    srawi r11, r9, 6
>>  ; P8LE-NEXT:    srwi r9, r9, 31
>> -; P8LE-NEXT:    add r8, r8, r11
>> -; P8LE-NEXT:    add r4, r4, r30
>> -; P8LE-NEXT:    ld r30, -16(r1) # 8-byte Folded Reload
>> -; P8LE-NEXT:    srawi r11, r10, 6
>> -; P8LE-NEXT:    srwi r10, r10, 31
>> -; P8LE-NEXT:    add r9, r12, r9
>> -; P8LE-NEXT:    mtfprd f0, r8
>> -; P8LE-NEXT:    mulli r12, r8, 95
>> -; P8LE-NEXT:    add r10, r11, r10
>> -; P8LE-NEXT:    srwi r8, r4, 31
>> -; P8LE-NEXT:    mtfprd f1, r9
>> -; P8LE-NEXT:    srawi r4, r4, 6
>> -; P8LE-NEXT:    mulli r11, r9, 95
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    mtfprd f2, r10
>> -; P8LE-NEXT:    mulli r9, r10, 95
>> -; P8LE-NEXT:    add r4, r4, r8
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> -; P8LE-NEXT:    mtfprd f3, r4
>> -; P8LE-NEXT:    mulli r4, r4, 95
>> -; P8LE-NEXT:    xxswapd v1, vs2
>> -; P8LE-NEXT:    sub r3, r3, r12
>> -; P8LE-NEXT:    mtfprd f0, r3
>> -; P8LE-NEXT:    sub r6, r6, r11
>> -; P8LE-NEXT:    xxswapd v6, vs3
>> -; P8LE-NEXT:    sub r3, r7, r9
>> -; P8LE-NEXT:    mtfprd f1, r6
>> -; P8LE-NEXT:    mtfprd f4, r3
>> -; P8LE-NEXT:    sub r3, r5, r4
>> -; P8LE-NEXT:    mtfprd f5, r3
>> -; P8LE-NEXT:    xxswapd v4, vs1
>> -; P8LE-NEXT:    vmrglh v2, v3, v2
>> -; P8LE-NEXT:    xxswapd v3, vs0
>> -; P8LE-NEXT:    xxswapd v5, vs4
>> -; P8LE-NEXT:    xxswapd v0, vs5
>> -; P8LE-NEXT:    vmrglh v3, v4, v3
>> -; P8LE-NEXT:    vmrglh v4, v0, v5
>> -; P8LE-NEXT:    vmrglh v5, v6, v1
>> -; P8LE-NEXT:    vmrglw v3, v4, v3
>> -; P8LE-NEXT:    vmrglw v2, v5, v2
>> +; P8LE-NEXT:    add r8, r12, r8
>> +; P8LE-NEXT:    mtvsrd v2, r10
>> +; P8LE-NEXT:    mulli r12, r10, 95
>> +; P8LE-NEXT:    add r9, r11, r9
>> +; P8LE-NEXT:    srwi r11, r3, 31
>> +; P8LE-NEXT:    mtvsrd v3, r8
>> +; P8LE-NEXT:    srawi r3, r3, 6
>> +; P8LE-NEXT:    mulli r10, r8, 95
>> +; P8LE-NEXT:    mtvsrd v4, r9
>> +; P8LE-NEXT:    add r3, r3, r11
>> +; P8LE-NEXT:    mulli r8, r9, 95
>> +; P8LE-NEXT:    vmrghh v2, v3, v2
>> +; P8LE-NEXT:    mulli r9, r3, 95
>> +; P8LE-NEXT:    sub r5, r5, r12
>> +; P8LE-NEXT:    sub r6, r6, r10
>> +; P8LE-NEXT:    mtvsrd v3, r5
>> +; P8LE-NEXT:    mtvsrd v5, r6
>> +; P8LE-NEXT:    sub r5, r7, r8
>> +; P8LE-NEXT:    sub r4, r4, r9
>> +; P8LE-NEXT:    mtvsrd v0, r5
>> +; P8LE-NEXT:    mtvsrd v1, r4
>> +; P8LE-NEXT:    vmrghh v3, v5, v3
>> +; P8LE-NEXT:    mtvsrd v5, r3
>> +; P8LE-NEXT:    vmrghh v0, v1, v0
>> +; P8LE-NEXT:    vmrghh v4, v5, v4
>> +; P8LE-NEXT:    vmrglw v3, v0, v3
>> +; P8LE-NEXT:    vmrglw v2, v4, v2
>>  ; P8LE-NEXT:    vadduhm v2, v3, v2
>>  ; P8LE-NEXT:    blr
>>  ;
>> @@ -767,47 +731,43 @@ define <4 x i16> @dont_fold_srem_power_of_two(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 0
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    srawi r4, r4, 6
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    srawi r4, r3, 6
>>  ; P9LE-NEXT:    addze r4, r4
>>  ; P9LE-NEXT:    slwi r4, r4, 6
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    srawi r4, r4, 5
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    srawi r4, r3, 5
>>  ; P9LE-NEXT:    addze r4, r4
>>  ; P9LE-NEXT:    slwi r4, r4, 5
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r4, -21386
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, -21386
>> -; P9LE-NEXT:    ori r5, r5, 37253
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r5, r4, r5
>> -; P9LE-NEXT:    add r4, r5, r4
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    ori r4, r4, 37253
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>> +; P9LE-NEXT:    add r4, r4, r3
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 6
>>  ; P9LE-NEXT:    add r4, r4, r5
>>  ; P9LE-NEXT:    mulli r4, r4, 95
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    srawi r4, r4, 3
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    srawi r4, r3, 3
>>  ; P9LE-NEXT:    addze r4, r4
>>  ; P9LE-NEXT:    slwi r4, r4, 3
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    vmrglh v2, v4, v2
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    vmrghh v2, v4, v2
>>  ; P9LE-NEXT:    vmrglw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>  ;
>> @@ -866,42 +826,38 @@ define <4 x i16> @dont_fold_srem_power_of_two(<4 x i16> %x) {
>>  ; P8LE-NEXT:    ori r3, r3, 37253
>>  ; P8LE-NEXT:    mffprd r4, f0
>>  ; P8LE-NEXT:    rldicl r5, r4, 16, 48
>> -; P8LE-NEXT:    clrldi r7, r4, 48
>> -; P8LE-NEXT:    extsh r6, r5
>> -; P8LE-NEXT:    extsh r8, r7
>> -; P8LE-NEXT:    mulhw r3, r6, r3
>> -; P8LE-NEXT:    rldicl r9, r4, 48, 48
>> -; P8LE-NEXT:    srawi r8, r8, 6
>> -; P8LE-NEXT:    extsh r10, r9
>> +; P8LE-NEXT:    clrldi r6, r4, 48
>> +; P8LE-NEXT:    extsh r5, r5
>> +; P8LE-NEXT:    extsh r6, r6
>> +; P8LE-NEXT:    mulhw r3, r5, r3
>> +; P8LE-NEXT:    rldicl r7, r4, 48, 48
>> +; P8LE-NEXT:    srawi r8, r6, 6
>> +; P8LE-NEXT:    extsh r7, r7
>>  ; P8LE-NEXT:    addze r8, r8
>>  ; P8LE-NEXT:    rldicl r4, r4, 32, 48
>> -; P8LE-NEXT:    srawi r10, r10, 5
>> +; P8LE-NEXT:    srawi r9, r7, 5
>> +; P8LE-NEXT:    extsh r4, r4
>>  ; P8LE-NEXT:    slwi r8, r8, 6
>> -; P8LE-NEXT:    add r3, r3, r6
>> -; P8LE-NEXT:    addze r6, r10
>> -; P8LE-NEXT:    sub r7, r7, r8
>> +; P8LE-NEXT:    add r3, r3, r5
>> +; P8LE-NEXT:    addze r9, r9
>> +; P8LE-NEXT:    sub r6, r6, r8
>>  ; P8LE-NEXT:    srwi r10, r3, 31
>>  ; P8LE-NEXT:    srawi r3, r3, 6
>> -; P8LE-NEXT:    mtfprd f0, r7
>> -; P8LE-NEXT:    slwi r6, r6, 5
>> +; P8LE-NEXT:    slwi r8, r9, 5
>> +; P8LE-NEXT:    mtvsrd v2, r6
>>  ; P8LE-NEXT:    add r3, r3, r10
>> -; P8LE-NEXT:    extsh r10, r4
>> -; P8LE-NEXT:    sub r6, r9, r6
>> +; P8LE-NEXT:    srawi r9, r4, 3
>> +; P8LE-NEXT:    sub r6, r7, r8
>>  ; P8LE-NEXT:    mulli r3, r3, 95
>> -; P8LE-NEXT:    srawi r8, r10, 3
>> -; P8LE-NEXT:    mtfprd f1, r6
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    addze r7, r8
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> +; P8LE-NEXT:    addze r7, r9
>> +; P8LE-NEXT:    mtvsrd v3, r6
>> +; P8LE-NEXT:    vmrghh v2, v3, v2
>>  ; P8LE-NEXT:    sub r3, r5, r3
>>  ; P8LE-NEXT:    slwi r5, r7, 3
>>  ; P8LE-NEXT:    sub r4, r4, r5
>> -; P8LE-NEXT:    mtfprd f2, r3
>> -; P8LE-NEXT:    mtfprd f3, r4
>> -; P8LE-NEXT:    xxswapd v4, vs2
>> -; P8LE-NEXT:    vmrglh v2, v3, v2
>> -; P8LE-NEXT:    xxswapd v5, vs3
>> -; P8LE-NEXT:    vmrglh v3, v4, v5
>> +; P8LE-NEXT:    mtvsrd v4, r3
>> +; P8LE-NEXT:    mtvsrd v5, r4
>> +; P8LE-NEXT:    vmrghh v3, v4, v5
>>  ; P8LE-NEXT:    vmrglw v2, v3, v2
>>  ; P8LE-NEXT:    blr
>>  ;
>> @@ -959,48 +915,46 @@ define <4 x i16> @dont_fold_srem_one(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, -14230
>> -; P9LE-NEXT:    ori r5, r5, 30865
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r5, r4, r5
>> -; P9LE-NEXT:    add r4, r5, r4
>> +; P9LE-NEXT:    lis r4, -14230
>> +; P9LE-NEXT:    ori r4, r4, 30865
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>> +; P9LE-NEXT:    add r4, r4, r3
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 9
>>  ; P9LE-NEXT:    add r4, r4, r5
>> -; P9LE-NEXT:    lis r5, -19946
>>  ; P9LE-NEXT:    mulli r4, r4, 654
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r4, -19946
>> +; P9LE-NEXT:    mtvsrd v3, r3
>> +; P9LE-NEXT:    li r3, 0
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 4
>> -; P9LE-NEXT:    ori r5, r5, 17097
>> -; P9LE-NEXT:    xxlxor v3, v3, v3
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r5, r4, r5
>> -; P9LE-NEXT:    add r4, r5, r4
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    ori r4, r4, 17097
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>> +; P9LE-NEXT:    add r4, r4, r3
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 4
>>  ; P9LE-NEXT:    add r4, r4, r5
>> -; P9LE-NEXT:    lis r5, 24749
>>  ; P9LE-NEXT:    mulli r4, r4, 23
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    vmrghh v3, v3, v4
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    ori r5, r5, 47143
>> -; P9LE-NEXT:    mulhw r4, r4, r5
>> +; P9LE-NEXT:    lis r4, 24749
>> +; P9LE-NEXT:    ori r4, r4, 47143
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 11
>>  ; P9LE-NEXT:    add r4, r4, r5
>>  ; P9LE-NEXT:    mulli r4, r4, 5423
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    vmrglh v2, v2, v4
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    vmrghh v2, v2, v4
>>  ; P9LE-NEXT:    vmrglw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>  ;
>> @@ -1058,49 +1012,47 @@ define <4 x i16> @dont_fold_srem_one(<4 x i16> %x) {
>>  ; P8LE-LABEL: dont_fold_srem_one:
>>  ; P8LE:       # %bb.0:
>>  ; P8LE-NEXT:    xxswapd vs0, v2
>> -; P8LE-NEXT:    lis r3, 24749
>> -; P8LE-NEXT:    lis r7, -19946
>> -; P8LE-NEXT:    lis r9, -14230
>> -; P8LE-NEXT:    xxlxor v5, v5, v5
>> -; P8LE-NEXT:    ori r3, r3, 47143
>> -; P8LE-NEXT:    ori r7, r7, 17097
>> -; P8LE-NEXT:    mffprd r4, f0
>> -; P8LE-NEXT:    rldicl r5, r4, 16, 48
>> -; P8LE-NEXT:    rldicl r6, r4, 32, 48
>> -; P8LE-NEXT:    rldicl r4, r4, 48, 48
>> -; P8LE-NEXT:    extsh r8, r5
>> -; P8LE-NEXT:    extsh r10, r6
>> -; P8LE-NEXT:    mulhw r3, r8, r3
>> -; P8LE-NEXT:    ori r8, r9, 30865
>> -; P8LE-NEXT:    extsh r9, r4
>> -; P8LE-NEXT:    mulhw r7, r10, r7
>> -; P8LE-NEXT:    mulhw r8, r9, r8
>> -; P8LE-NEXT:    add r7, r7, r10
>> -; P8LE-NEXT:    srwi r10, r3, 31
>> -; P8LE-NEXT:    add r8, r8, r9
>> -; P8LE-NEXT:    srawi r3, r3, 11
>> -; P8LE-NEXT:    srwi r9, r7, 31
>> -; P8LE-NEXT:    srawi r7, r7, 4
>> -; P8LE-NEXT:    add r3, r3, r10
>> -; P8LE-NEXT:    add r7, r7, r9
>> +; P8LE-NEXT:    lis r5, 24749
>> +; P8LE-NEXT:    lis r6, -19946
>> +; P8LE-NEXT:    lis r8, -14230
>> +; P8LE-NEXT:    ori r5, r5, 47143
>> +; P8LE-NEXT:    ori r6, r6, 17097
>> +; P8LE-NEXT:    ori r8, r8, 30865
>> +; P8LE-NEXT:    mffprd r3, f0
>> +; P8LE-NEXT:    rldicl r4, r3, 16, 48
>> +; P8LE-NEXT:    rldicl r7, r3, 32, 48
>> +; P8LE-NEXT:    rldicl r3, r3, 48, 48
>> +; P8LE-NEXT:    extsh r4, r4
>> +; P8LE-NEXT:    extsh r7, r7
>> +; P8LE-NEXT:    extsh r3, r3
>> +; P8LE-NEXT:    mulhw r5, r4, r5
>> +; P8LE-NEXT:    mulhw r6, r7, r6
>> +; P8LE-NEXT:    mulhw r8, r3, r8
>> +; P8LE-NEXT:    srwi r9, r5, 31
>> +; P8LE-NEXT:    srawi r5, r5, 11
>> +; P8LE-NEXT:    add r6, r6, r7
>> +; P8LE-NEXT:    add r8, r8, r3
>> +; P8LE-NEXT:    add r5, r5, r9
>> +; P8LE-NEXT:    srwi r9, r6, 31
>> +; P8LE-NEXT:    srawi r6, r6, 4
>> +; P8LE-NEXT:    add r6, r6, r9
>>  ; P8LE-NEXT:    srwi r9, r8, 31
>>  ; P8LE-NEXT:    srawi r8, r8, 9
>> -; P8LE-NEXT:    mulli r3, r3, 5423
>> +; P8LE-NEXT:    mulli r5, r5, 5423
>>  ; P8LE-NEXT:    add r8, r8, r9
>> -; P8LE-NEXT:    mulli r7, r7, 23
>> +; P8LE-NEXT:    mulli r6, r6, 23
>> +; P8LE-NEXT:    li r9, 0
>>  ; P8LE-NEXT:    mulli r8, r8, 654
>> -; P8LE-NEXT:    sub r3, r5, r3
>> -; P8LE-NEXT:    mtfprd f0, r3
>> -; P8LE-NEXT:    sub r3, r6, r7
>> -; P8LE-NEXT:    sub r4, r4, r8
>> -; P8LE-NEXT:    mtfprd f1, r3
>> -; P8LE-NEXT:    mtfprd f2, r4
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> -; P8LE-NEXT:    xxswapd v4, vs2
>> -; P8LE-NEXT:    vmrglh v2, v2, v3
>> -; P8LE-NEXT:    vmrglh v3, v4, v5
>> -; P8LE-NEXT:    vmrglw v2, v2, v3
>> +; P8LE-NEXT:    mtvsrd v2, r9
>> +; P8LE-NEXT:    sub r4, r4, r5
>> +; P8LE-NEXT:    sub r5, r7, r6
>> +; P8LE-NEXT:    mtvsrd v3, r4
>> +; P8LE-NEXT:    sub r3, r3, r8
>> +; P8LE-NEXT:    mtvsrd v4, r5
>> +; P8LE-NEXT:    mtvsrd v5, r3
>> +; P8LE-NEXT:    vmrghh v3, v3, v4
>> +; P8LE-NEXT:    vmrghh v2, v5, v2
>> +; P8LE-NEXT:    vmrglw v2, v3, v2
>>  ; P8LE-NEXT:    blr
>>  ;
>>  ; P8BE-LABEL: dont_fold_srem_one:
>> @@ -1161,43 +1113,41 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, -19946
>> -; P9LE-NEXT:    ori r5, r5, 17097
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    mulhw r5, r4, r5
>> -; P9LE-NEXT:    add r4, r5, r4
>> +; P9LE-NEXT:    lis r4, -19946
>> +; P9LE-NEXT:    ori r4, r4, 17097
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>> +; P9LE-NEXT:    add r4, r4, r3
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 4
>>  ; P9LE-NEXT:    add r4, r4, r5
>> -; P9LE-NEXT:    lis r5, 24749
>>  ; P9LE-NEXT:    mulli r4, r4, 23
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r4, 24749
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    ori r5, r5, 47143
>> -; P9LE-NEXT:    mulhw r4, r4, r5
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    ori r4, r4, 47143
>> +; P9LE-NEXT:    mulhw r4, r3, r4
>>  ; P9LE-NEXT:    srwi r5, r4, 31
>>  ; P9LE-NEXT:    srawi r4, r4, 11
>>  ; P9LE-NEXT:    add r4, r4, r5
>>  ; P9LE-NEXT:    mulli r4, r4, 5423
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    extsh r4, r3
>> -; P9LE-NEXT:    srawi r4, r4, 15
>> +; P9LE-NEXT:    extsh r3, r3
>> +; P9LE-NEXT:    srawi r4, r3, 15
>>  ; P9LE-NEXT:    addze r4, r4
>>  ; P9LE-NEXT:    slwi r4, r4, 15
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxlxor v4, v4, v4
>> -; P9LE-NEXT:    vmrglh v2, v2, v4
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    li r3, 0
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>> +; P9LE-NEXT:    vmrghh v2, v2, v4
>>  ; P9LE-NEXT:    vmrglw v2, v3, v2
>>  ; P9LE-NEXT:    blr
>>  ;
>> @@ -1252,42 +1202,40 @@ define <4 x i16> @dont_fold_urem_i16_smax(<4 x i16> %x) {
>>  ; P8LE-NEXT:    xxswapd vs0, v2
>>  ; P8LE-NEXT:    lis r4, 24749
>>  ; P8LE-NEXT:    lis r5, -19946
>> -; P8LE-NEXT:    xxlxor v5, v5, v5
>>  ; P8LE-NEXT:    ori r4, r4, 47143
>>  ; P8LE-NEXT:    ori r5, r5, 17097
>>  ; P8LE-NEXT:    mffprd r3, f0
>>  ; P8LE-NEXT:    rldicl r6, r3, 16, 48
>>  ; P8LE-NEXT:    rldicl r7, r3, 32, 48
>> -; P8LE-NEXT:    extsh r8, r6
>> -; P8LE-NEXT:    extsh r9, r7
>> -; P8LE-NEXT:    mulhw r4, r8, r4
>> -; P8LE-NEXT:    mulhw r5, r9, r5
>> +; P8LE-NEXT:    extsh r6, r6
>> +; P8LE-NEXT:    extsh r7, r7
>> +; P8LE-NEXT:    mulhw r4, r6, r4
>> +; P8LE-NEXT:    mulhw r5, r7, r5
>>  ; P8LE-NEXT:    rldicl r3, r3, 48, 48
>> +; P8LE-NEXT:    extsh r3, r3
>>  ; P8LE-NEXT:    srwi r8, r4, 31
>>  ; P8LE-NEXT:    srawi r4, r4, 11
>> -; P8LE-NEXT:    add r5, r5, r9
>> +; P8LE-NEXT:    add r5, r5, r7
>>  ; P8LE-NEXT:    add r4, r4, r8
>>  ; P8LE-NEXT:    srwi r8, r5, 31
>>  ; P8LE-NEXT:    srawi r5, r5, 4
>>  ; P8LE-NEXT:    mulli r4, r4, 5423
>>  ; P8LE-NEXT:    add r5, r5, r8
>> -; P8LE-NEXT:    extsh r8, r3
>> +; P8LE-NEXT:    srawi r9, r3, 15
>> +; P8LE-NEXT:    li r8, 0
>>  ; P8LE-NEXT:    mulli r5, r5, 23
>> -; P8LE-NEXT:    srawi r8, r8, 15
>> +; P8LE-NEXT:    mtvsrd v2, r8
>>  ; P8LE-NEXT:    sub r4, r6, r4
>> -; P8LE-NEXT:    addze r6, r8
>> -; P8LE-NEXT:    mtfprd f0, r4
>> -; P8LE-NEXT:    slwi r4, r6, 15
>> +; P8LE-NEXT:    addze r6, r9
>> +; P8LE-NEXT:    slwi r6, r6, 15
>> +; P8LE-NEXT:    mtvsrd v3, r4
>>  ; P8LE-NEXT:    sub r5, r7, r5
>> -; P8LE-NEXT:    sub r3, r3, r4
>> -; P8LE-NEXT:    mtfprd f1, r5
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    mtfprd f2, r3
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> -; P8LE-NEXT:    xxswapd v4, vs2
>> -; P8LE-NEXT:    vmrglh v2, v2, v3
>> -; P8LE-NEXT:    vmrglh v3, v4, v5
>> -; P8LE-NEXT:    vmrglw v2, v2, v3
>> +; P8LE-NEXT:    sub r3, r3, r6
>> +; P8LE-NEXT:    mtvsrd v4, r5
>> +; P8LE-NEXT:    mtvsrd v5, r3
>> +; P8LE-NEXT:    vmrghh v3, v3, v4
>> +; P8LE-NEXT:    vmrghh v2, v5, v2
>> +; P8LE-NEXT:    vmrglw v2, v3, v2
>>  ; P8LE-NEXT:    blr
>>  ;
>>  ; P8BE-LABEL: dont_fold_urem_i16_smax:
>>
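The srem hunks above (fold_srem_vec_2, combine_srem_sdiv, dont_fold_srem_power_of_two, dont_fold_srem_one, dont_fold_urem_i16_smax) repeat one scalar idiom per lane: `lis`/`ori` build a 32-bit magic constant, `mulhw` takes the high half of the signed product (followed by `add` of the dividend when the constant is negative as a signed word), and `srwi 31`/`srawi`/`add` turn that into a quotient truncated toward zero, from which `mulli`/`sub` recover the remainder. Power-of-two divisors use `srawi`/`addze`/`slwi`/`sub` instead. The following Python sketch is an illustrative model of those sequences, assuming each lane is a 16-bit value sign-extended by `extsh`; all function names are hypothetical, not LLVM or PowerPC APIs. It checks every `i16` input against C-style truncating remainder for the constants that appear in the hunks.

```python
# Illustrative model of the scalar remainder idioms in the P8LE/P9LE CHECK
# lines. Function names are hypothetical, not LLVM or PowerPC APIs.

MASK32 = 0xFFFFFFFF


def to_s32(v: int) -> int:
    """Reinterpret the low 32 bits of v as a signed 32-bit word."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def lis_ori(hi: int, lo: int) -> int:
    """Model `lis rD, hi` followed by `ori rD, rD, lo`."""
    return to_s32((hi << 16) | lo)


def mulhw(a: int, b: int) -> int:
    """High 32 bits of the signed 32x32 -> 64-bit product."""
    return to_s32((to_s32(a) * to_s32(b)) >> 32)


def srem_magic(x: int, magic: int, shift: int, d: int) -> int:
    """mulhw [+ add x] / srwi 31 / srawi shift / add / mulli d / sub."""
    t = mulhw(x, magic)
    if magic < 0:                     # constant >= 2**31 unsigned: add dividend back
        t = to_s32(t + x)
    sign = (t & MASK32) >> 31         # srwi t, 31: 1 if t is negative
    q = to_s32((t >> shift) + sign)   # srawi + add: quotient rounded toward zero
    return to_s32(x - q * d)          # mulli + sub: remainder


def srem_pow2(x: int, k: int) -> int:
    """srawi (sets CA for negative x with nonzero low bits) / addze / slwi / sub."""
    ca = 1 if x < 0 and x & ((1 << k) - 1) else 0
    q = (x >> k) + ca                 # truncating x / 2**k
    return x - (q << k)


def c_srem(x: int, d: int) -> int:
    """Reference: C-style remainder (quotient truncated toward zero)."""
    q = abs(x) // abs(d)
    if (x < 0) != (d < 0):
        q = -q
    return x - q * d


# (lis immediate, ori immediate, srawi amount, mulli divisor) taken from the hunks.
MAGIC_CASES = [(-21386, 37253, 6, 95), (-19946, 17097, 4, 23),
               (-14230, 30865, 9, 654), (24749, 47143, 11, 5423)]


def verify_i16() -> bool:
    """Check every sign-extended i16 lane value against c_srem."""
    for x in range(-32768, 32768):
        for hi, lo, shift, d in MAGIC_CASES:
            if srem_magic(x, lis_ori(hi, lo), shift, d) != c_srem(x, d):
                return False
        for k in (3, 5, 6, 15):
            if srem_pow2(x, k) != c_srem(x, 1 << k):
                return False
    return True


print(srem_magic(100, lis_ori(-21386, 37253), 6, 95))  # 5
```

Calling `verify_i16()` exhaustively compares the model against the reference for all 65536 lane values. The model only restates the instruction sequences; it says nothing about the shuffle canonicalization itself, which is what changes the surrounding `mtvsrd`/`vmrghh` lines.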
>> diff  --git a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
>> index 323397202c00..95f0fc25f2dd 100644
>> --- a/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
>> +++ b/llvm/test/CodeGen/PowerPC/swaps-le-5.ll
>> @@ -15,10 +15,10 @@ entry:
>>  }
>>
>>  ; CHECK-LABEL: @bar0
>> +; CHECK-DAG: xxswapd 1, 1
>>  ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
>> -; CHECK-DAG: xxspltd [[REG2:[0-9]+]]
>> -; CHECK: xxpermdi [[REG3:[0-9]+]], [[REG2]], [[REG1]], 1
>> -; CHECK: stxvd2x [[REG3]]
>> +; CHECK: xxmrgld [[REG2:[0-9]+]], 1, [[REG1]]
>> +; CHECK: stxvd2x [[REG2]]
>>  ; CHECK-NOT: xxswapd
>>
>>  define void @bar1(double %y) {
>> @@ -30,10 +30,10 @@ entry:
>>  }
>>
>>  ; CHECK-LABEL: @bar1
>> +; CHECK-DAG: xxswapd 1, 1
>>  ; CHECK-DAG: lxvd2x [[REG1:[0-9]+]]
>> -; CHECK-DAG: xxspltd [[REG2:[0-9]+]]
>> -; CHECK: xxmrghd [[REG3:[0-9]+]], [[REG1]], [[REG2]]
>> -; CHECK: stxvd2x [[REG3]]
>> +; CHECK: xxpermdi [[REG2:[0-9]+]], [[REG1]], 1, 1
>> +; CHECK: stxvd2x [[REG2]]
>>  ; CHECK-NOT: xxswapd
>>
>>  define void @baz0() {
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
>> index 23738eaa95a7..4437e6799269 100644
>> --- a/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
>> +++ b/llvm/test/CodeGen/PowerPC/swaps-le-6.ll
>> @@ -27,7 +27,7 @@ define void @bar0() {
>>  ; CHECK:     ld r3, .LC0@toc@l(r3)
>>  ; CHECK:     addis r3, r2, .LC2@toc@ha
>>  ; CHECK:     ld r3, .LC2@toc@l(r3)
>> -; CHECK:     xxpermdi vs0, vs0, vs1, 1
>> +; CHECK:     xxmrgld vs0, vs0, vs1
>>  ; CHECK:     stxvd2x vs0, 0, r3
>>  ; CHECK:     blr
>>  ;
>> @@ -38,7 +38,7 @@ define void @bar0() {
>>  ; CHECK-P9-NOVECTOR:     addis r3, r2, .LC1@toc@ha
>>  ; CHECK-P9-NOVECTOR:     addis r3, r2, .LC2@toc@ha
>>  ; CHECK-P9-NOVECTOR:     ld r3, .LC2@toc@l(r3)
>> -; CHECK-P9-NOVECTOR:     xxpermdi vs0, vs1, vs0, 1
>> +; CHECK-P9-NOVECTOR:     xxmrgld vs0, vs1, vs0
>>  ; CHECK-P9-NOVECTOR:     stxvd2x vs0, 0, r3
>>  ; CHECK-P9-NOVECTOR:     blr
>>  ;
>> @@ -72,7 +72,7 @@ define void @bar1() {
>>  ; CHECK:     ld r3, .LC0@toc@l(r3)
>>  ; CHECK:     addis r3, r2, .LC2@toc@ha
>>  ; CHECK:     ld r3, .LC2@toc@l(r3)
>> -; CHECK:     xxmrghd vs0, vs1, vs0
>> +; CHECK:     xxpermdi vs0, vs1, vs0, 1
>>  ; CHECK:     stxvd2x vs0, 0, r3
>>  ; CHECK:     blr
>>  ;
>> @@ -83,7 +83,7 @@ define void @bar1() {
>>  ; CHECK-P9-NOVECTOR:     addis r3, r2, .LC1@toc@ha
>>  ; CHECK-P9-NOVECTOR:     addis r3, r2, .LC2@toc@ha
>>  ; CHECK-P9-NOVECTOR:     ld r3, .LC2@toc@l(r3)
>> -; CHECK-P9-NOVECTOR:     xxmrghd vs0, vs0, vs1
>> +; CHECK-P9-NOVECTOR:     xxpermdi vs0, vs0, vs1, 1
>>  ; CHECK-P9-NOVECTOR:     stxvd2x vs0, 0, r3
>>  ; CHECK-P9-NOVECTOR:     blr
>>  ;
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
>> index d853a420dcd8..4bb3730aa043 100644
>> --- a/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
>> +++ b/llvm/test/CodeGen/PowerPC/urem-vector-lkk.ll
>> @@ -13,53 +13,50 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, 21399
>> -; P9LE-NEXT:    ori r5, r5, 33437
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r4, r4, r5
>> -; P9LE-NEXT:    lis r5, 16727
>> -; P9LE-NEXT:    ori r5, r5, 2287
>> +; P9LE-NEXT:    lis r4, 21399
>> +; P9LE-NEXT:    ori r4, r4, 33437
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    mulhwu r4, r3, r4
>>  ; P9LE-NEXT:    srwi r4, r4, 5
>>  ; P9LE-NEXT:    mulli r4, r4, 98
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r4, 16727
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r4, r4, r5
>> -; P9LE-NEXT:    lis r5, 8456
>> -; P9LE-NEXT:    ori r5, r5, 16913
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    ori r4, r4, 2287
>> +; P9LE-NEXT:    mulhwu r4, r3, r4
>>  ; P9LE-NEXT:    srwi r4, r4, 8
>>  ; P9LE-NEXT:    mulli r4, r4, 1003
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    rlwinm r4, r3, 30, 18, 31
>> -; P9LE-NEXT:    mulhwu r4, r4, r5
>> -; P9LE-NEXT:    lis r5, 22765
>> -; P9LE-NEXT:    ori r5, r5, 8969
>> -; P9LE-NEXT:    srwi r4, r4, 2
>> -; P9LE-NEXT:    mulli r4, r4, 124
>> -; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r5, 8456
>> +; P9LE-NEXT:    ori r5, r5, 16913
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    clrlwi r4, r3, 16
>> +; P9LE-NEXT:    rlwinm r3, r3, 30, 18, 31
>> +; P9LE-NEXT:    mulhwu r3, r3, r5
>> +; P9LE-NEXT:    srwi r3, r3, 2
>> +; P9LE-NEXT:    mulli r3, r3, 124
>> +; P9LE-NEXT:    sub r3, r4, r3
>> +; P9LE-NEXT:    lis r4, 22765
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 0
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r5, r4, r5
>> -; P9LE-NEXT:    sub r4, r4, r5
>> -; P9LE-NEXT:    srwi r4, r4, 1
>> -; P9LE-NEXT:    add r4, r4, r5
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    ori r4, r4, 8969
>> +; P9LE-NEXT:    mulhwu r4, r3, r4
>> +; P9LE-NEXT:    sub r5, r3, r4
>> +; P9LE-NEXT:    srwi r5, r5, 1
>> +; P9LE-NEXT:    add r4, r5, r4
>>  ; P9LE-NEXT:    srwi r4, r4, 6
>>  ; P9LE-NEXT:    mulli r4, r4, 95
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    vmrglh v2, v4, v2
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    vmrghh v2, v4, v2
>>  ; P9LE-NEXT:    vmrglw v2, v3, v2
>>  ; P9LE-NEXT:    blr
>>  ;
>> @@ -123,50 +120,47 @@ define <4 x i16> @fold_urem_vec_1(<4 x i16> %x) {
>>  ; P8LE-NEXT:    xxswapd vs0, v2
>>  ; P8LE-NEXT:    lis r3, 22765
>>  ; P8LE-NEXT:    lis r7, 21399
>> -; P8LE-NEXT:    lis r10, 16727
>> +; P8LE-NEXT:    lis r9, 16727
>> +; P8LE-NEXT:    lis r10, 8456
>>  ; P8LE-NEXT:    ori r3, r3, 8969
>>  ; P8LE-NEXT:    ori r7, r7, 33437
>> -; P8LE-NEXT:    ori r10, r10, 2287
>> +; P8LE-NEXT:    ori r9, r9, 2287
>> +; P8LE-NEXT:    ori r10, r10, 16913
>>  ; P8LE-NEXT:    mffprd r4, f0
>>  ; P8LE-NEXT:    clrldi r6, r4, 48
>>  ; P8LE-NEXT:    rldicl r5, r4, 32, 48
>> -; P8LE-NEXT:    clrlwi r9, r6, 16
>> +; P8LE-NEXT:    clrlwi r6, r6, 16
>>  ; P8LE-NEXT:    rldicl r8, r4, 16, 48
>> -; P8LE-NEXT:    clrlwi r11, r5, 16
>> -; P8LE-NEXT:    mulhwu r3, r9, r3
>> -; P8LE-NEXT:    clrlwi r12, r8, 16
>> -; P8LE-NEXT:    mulhwu r7, r11, r7
>> -; P8LE-NEXT:    lis r11, 8456
>> +; P8LE-NEXT:    clrlwi r5, r5, 16
>> +; P8LE-NEXT:    mulhwu r3, r6, r3
>>  ; P8LE-NEXT:    rldicl r4, r4, 48, 48
>> -; P8LE-NEXT:    mulhwu r10, r12, r10
>> -; P8LE-NEXT:    ori r11, r11, 16913
>> -; P8LE-NEXT:    rlwinm r12, r4, 30, 18, 31
>> -; P8LE-NEXT:    mulhwu r11, r12, r11
>> -; P8LE-NEXT:    sub r9, r9, r3
>> -; P8LE-NEXT:    srwi r9, r9, 1
>> +; P8LE-NEXT:    clrlwi r8, r8, 16
>> +; P8LE-NEXT:    rlwinm r11, r4, 30, 18, 31
>> +; P8LE-NEXT:    mulhwu r7, r5, r7
>> +; P8LE-NEXT:    clrlwi r4, r4, 16
>> +; P8LE-NEXT:    mulhwu r9, r8, r9
>> +; P8LE-NEXT:    mulhwu r10, r11, r10
>> +; P8LE-NEXT:    sub r11, r6, r3
>> +; P8LE-NEXT:    srwi r11, r11, 1
>>  ; P8LE-NEXT:    srwi r7, r7, 5
>> -; P8LE-NEXT:    add r3, r9, r3
>> -; P8LE-NEXT:    srwi r9, r10, 8
>> +; P8LE-NEXT:    add r3, r11, r3
>> +; P8LE-NEXT:    srwi r9, r9, 8
>> +; P8LE-NEXT:    srwi r10, r10, 2
>>  ; P8LE-NEXT:    srwi r3, r3, 6
>>  ; P8LE-NEXT:    mulli r7, r7, 98
>> -; P8LE-NEXT:    srwi r10, r11, 2
>>  ; P8LE-NEXT:    mulli r9, r9, 1003
>>  ; P8LE-NEXT:    mulli r3, r3, 95
>>  ; P8LE-NEXT:    mulli r10, r10, 124
>>  ; P8LE-NEXT:    sub r5, r5, r7
>>  ; P8LE-NEXT:    sub r7, r8, r9
>> -; P8LE-NEXT:    mtfprd f0, r5
>>  ; P8LE-NEXT:    sub r3, r6, r3
>> +; P8LE-NEXT:    mtvsrd v2, r5
>>  ; P8LE-NEXT:    sub r4, r4, r10
>> -; P8LE-NEXT:    mtfprd f1, r7
>> -; P8LE-NEXT:    mtfprd f2, r3
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    mtfprd f3, r4
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> -; P8LE-NEXT:    xxswapd v4, vs2
>> -; P8LE-NEXT:    xxswapd v5, vs3
>> -; P8LE-NEXT:    vmrglh v2, v3, v2
>> -; P8LE-NEXT:    vmrglh v3, v5, v4
>> +; P8LE-NEXT:    mtvsrd v3, r7
>> +; P8LE-NEXT:    mtvsrd v4, r3
>> +; P8LE-NEXT:    mtvsrd v5, r4
>> +; P8LE-NEXT:    vmrghh v2, v3, v2
>> +; P8LE-NEXT:    vmrghh v3, v5, v4
>>  ; P8LE-NEXT:    vmrglw v2, v2, v3
>>  ; P8LE-NEXT:    blr
>>  ;
>> @@ -230,56 +224,52 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 0
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, 22765
>> -; P9LE-NEXT:    ori r5, r5, 8969
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r6, r4, r5
>> -; P9LE-NEXT:    sub r4, r4, r6
>> -; P9LE-NEXT:    srwi r4, r4, 1
>> -; P9LE-NEXT:    add r4, r4, r6
>> -; P9LE-NEXT:    srwi r4, r4, 6
>> -; P9LE-NEXT:    mulli r4, r4, 95
>> -; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r4, 22765
>> +; P9LE-NEXT:    ori r4, r4, 8969
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    mulhwu r5, r3, r4
>> +; P9LE-NEXT:    sub r6, r3, r5
>> +; P9LE-NEXT:    srwi r6, r6, 1
>> +; P9LE-NEXT:    add r5, r6, r5
>> +; P9LE-NEXT:    srwi r5, r5, 6
>> +; P9LE-NEXT:    mulli r5, r5, 95
>> +; P9LE-NEXT:    sub r3, r3, r5
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r6, r4, r5
>> -; P9LE-NEXT:    sub r4, r4, r6
>> -; P9LE-NEXT:    srwi r4, r4, 1
>> -; P9LE-NEXT:    add r4, r4, r6
>> -; P9LE-NEXT:    srwi r4, r4, 6
>> -; P9LE-NEXT:    mulli r4, r4, 95
>> -; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    mulhwu r5, r3, r4
>> +; P9LE-NEXT:    sub r6, r3, r5
>> +; P9LE-NEXT:    srwi r6, r6, 1
>> +; P9LE-NEXT:    add r5, r6, r5
>> +; P9LE-NEXT:    srwi r5, r5, 6
>> +; P9LE-NEXT:    mulli r5, r5, 95
>> +; P9LE-NEXT:    sub r3, r3, r5
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r6, r4, r5
>> -; P9LE-NEXT:    sub r4, r4, r6
>> -; P9LE-NEXT:    srwi r4, r4, 1
>> -; P9LE-NEXT:    add r4, r4, r6
>> -; P9LE-NEXT:    srwi r4, r4, 6
>> -; P9LE-NEXT:    mulli r4, r4, 95
>> -; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    mulhwu r5, r3, r4
>> +; P9LE-NEXT:    sub r6, r3, r5
>> +; P9LE-NEXT:    srwi r6, r6, 1
>> +; P9LE-NEXT:    add r5, r6, r5
>> +; P9LE-NEXT:    srwi r5, r5, 6
>> +; P9LE-NEXT:    mulli r5, r5, 95
>> +; P9LE-NEXT:    sub r3, r3, r5
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r5, r4, r5
>> -; P9LE-NEXT:    sub r4, r4, r5
>> -; P9LE-NEXT:    srwi r4, r4, 1
>> -; P9LE-NEXT:    add r4, r4, r5
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    mulhwu r4, r3, r4
>> +; P9LE-NEXT:    sub r5, r3, r4
>> +; P9LE-NEXT:    srwi r5, r5, 1
>> +; P9LE-NEXT:    add r4, r5, r4
>>  ; P9LE-NEXT:    srwi r4, r4, 6
>>  ; P9LE-NEXT:    mulli r4, r4, 95
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    vmrglh v2, v2, v4
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    vmrghh v2, v2, v4
>>  ; P9LE-NEXT:    vmrglw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>  ;
>> @@ -344,36 +334,34 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) {
>>  ; P8LE:       # %bb.0:
>>  ; P8LE-NEXT:    xxswapd vs0, v2
>>  ; P8LE-NEXT:    lis r3, 22765
>> -; P8LE-NEXT:    std r30, -16(r1) # 8-byte Folded Spill
>>  ; P8LE-NEXT:    ori r3, r3, 8969
>>  ; P8LE-NEXT:    mffprd r4, f0
>>  ; P8LE-NEXT:    clrldi r5, r4, 48
>>  ; P8LE-NEXT:    rldicl r6, r4, 48, 48
>> -; P8LE-NEXT:    clrlwi r8, r5, 16
>> +; P8LE-NEXT:    clrlwi r5, r5, 16
>>  ; P8LE-NEXT:    rldicl r7, r4, 32, 48
>> -; P8LE-NEXT:    clrlwi r9, r6, 16
>> +; P8LE-NEXT:    clrlwi r6, r6, 16
>> +; P8LE-NEXT:    mulhwu r8, r5, r3
>>  ; P8LE-NEXT:    rldicl r4, r4, 16, 48
>> -; P8LE-NEXT:    mulhwu r10, r8, r3
>> -; P8LE-NEXT:    clrlwi r11, r7, 16
>> -; P8LE-NEXT:    clrlwi r0, r4, 16
>> -; P8LE-NEXT:    mulhwu r12, r9, r3
>> -; P8LE-NEXT:    mulhwu r30, r11, r3
>> -; P8LE-NEXT:    mulhwu r3, r0, r3
>> -; P8LE-NEXT:    sub r8, r8, r10
>> -; P8LE-NEXT:    srwi r8, r8, 1
>> -; P8LE-NEXT:    sub r9, r9, r12
>> -; P8LE-NEXT:    add r8, r8, r10
>> -; P8LE-NEXT:    sub r10, r11, r30
>> -; P8LE-NEXT:    sub r11, r0, r3
>> -; P8LE-NEXT:    srwi r9, r9, 1
>> -; P8LE-NEXT:    srwi r10, r10, 1
>> +; P8LE-NEXT:    clrlwi r7, r7, 16
>> +; P8LE-NEXT:    mulhwu r9, r6, r3
>> +; P8LE-NEXT:    clrlwi r4, r4, 16
>> +; P8LE-NEXT:    mulhwu r10, r7, r3
>> +; P8LE-NEXT:    mulhwu r3, r4, r3
>> +; P8LE-NEXT:    sub r11, r5, r8
>> +; P8LE-NEXT:    sub r12, r6, r9
>> +; P8LE-NEXT:    srwi r11, r11, 1
>> +; P8LE-NEXT:    add r8, r11, r8
>> +; P8LE-NEXT:    sub r11, r7, r10
>> +; P8LE-NEXT:    srwi r12, r12, 1
>> +; P8LE-NEXT:    add r9, r12, r9
>> +; P8LE-NEXT:    sub r12, r4, r3
>>  ; P8LE-NEXT:    srwi r11, r11, 1
>> -; P8LE-NEXT:    add r9, r9, r12
>>  ; P8LE-NEXT:    srwi r8, r8, 6
>> -; P8LE-NEXT:    add r10, r10, r30
>> -; P8LE-NEXT:    add r3, r11, r3
>> +; P8LE-NEXT:    add r10, r11, r10
>> +; P8LE-NEXT:    srwi r11, r12, 1
>>  ; P8LE-NEXT:    srwi r9, r9, 6
>> -; P8LE-NEXT:    ld r30, -16(r1) # 8-byte Folded Reload
>> +; P8LE-NEXT:    add r3, r11, r3
>>  ; P8LE-NEXT:    mulli r8, r8, 95
>>  ; P8LE-NEXT:    srwi r10, r10, 6
>>  ; P8LE-NEXT:    srwi r3, r3, 6
>> @@ -382,18 +370,14 @@ define <4 x i16> @fold_urem_vec_2(<4 x i16> %x) {
>>  ; P8LE-NEXT:    mulli r3, r3, 95
>>  ; P8LE-NEXT:    sub r5, r5, r8
>>  ; P8LE-NEXT:    sub r6, r6, r9
>> -; P8LE-NEXT:    mtfprd f0, r5
>> +; P8LE-NEXT:    mtvsrd v2, r5
>>  ; P8LE-NEXT:    sub r5, r7, r10
>>  ; P8LE-NEXT:    sub r3, r4, r3
>> -; P8LE-NEXT:    mtfprd f1, r6
>> -; P8LE-NEXT:    mtfprd f2, r5
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    mtfprd f3, r3
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> -; P8LE-NEXT:    xxswapd v4, vs2
>> -; P8LE-NEXT:    xxswapd v5, vs3
>> -; P8LE-NEXT:    vmrglh v2, v3, v2
>> -; P8LE-NEXT:    vmrglh v3, v5, v4
>> +; P8LE-NEXT:    mtvsrd v3, r6
>> +; P8LE-NEXT:    mtvsrd v4, r5
>> +; P8LE-NEXT:    mtvsrd v5, r3
>> +; P8LE-NEXT:    vmrghh v2, v3, v2
>> +; P8LE-NEXT:    vmrghh v3, v5, v4
>>  ; P8LE-NEXT:    vmrglw v2, v3, v2
>>  ; P8LE-NEXT:    blr
>>  ;
>> @@ -461,67 +445,59 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 0
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, 22765
>> -; P9LE-NEXT:    ori r5, r5, 8969
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r6, r4, r5
>> -; P9LE-NEXT:    sub r4, r4, r6
>> -; P9LE-NEXT:    srwi r4, r4, 1
>> -; P9LE-NEXT:    add r4, r4, r6
>> -; P9LE-NEXT:    srwi r4, r4, 6
>> -; P9LE-NEXT:    mulli r6, r4, 95
>> +; P9LE-NEXT:    lis r4, 22765
>> +; P9LE-NEXT:    ori r4, r4, 8969
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    mulhwu r5, r3, r4
>> +; P9LE-NEXT:    sub r6, r3, r5
>> +; P9LE-NEXT:    srwi r6, r6, 1
>> +; P9LE-NEXT:    add r5, r6, r5
>> +; P9LE-NEXT:    srwi r5, r5, 6
>> +; P9LE-NEXT:    mulli r6, r5, 95
>>  ; P9LE-NEXT:    sub r3, r3, r6
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>>  ; P9LE-NEXT:    clrlwi r6, r3, 16
>> -; P9LE-NEXT:    mulhwu r7, r6, r5
>> +; P9LE-NEXT:    mulhwu r7, r6, r4
>>  ; P9LE-NEXT:    sub r6, r6, r7
>>  ; P9LE-NEXT:    srwi r6, r6, 1
>>  ; P9LE-NEXT:    add r6, r6, r7
>>  ; P9LE-NEXT:    srwi r6, r6, 6
>>  ; P9LE-NEXT:    mulli r7, r6, 95
>>  ; P9LE-NEXT:    sub r3, r3, r7
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>>  ; P9LE-NEXT:    clrlwi r7, r3, 16
>> -; P9LE-NEXT:    mulhwu r8, r7, r5
>> +; P9LE-NEXT:    mulhwu r8, r7, r4
>>  ; P9LE-NEXT:    sub r7, r7, r8
>>  ; P9LE-NEXT:    srwi r7, r7, 1
>>  ; P9LE-NEXT:    add r7, r7, r8
>>  ; P9LE-NEXT:    srwi r7, r7, 6
>>  ; P9LE-NEXT:    mulli r8, r7, 95
>>  ; P9LE-NEXT:    sub r3, r3, r8
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>>  ; P9LE-NEXT:    clrlwi r8, r3, 16
>> -; P9LE-NEXT:    mulhwu r5, r8, r5
>> -; P9LE-NEXT:    sub r8, r8, r5
>> +; P9LE-NEXT:    mulhwu r4, r8, r4
>> +; P9LE-NEXT:    sub r8, r8, r4
>>  ; P9LE-NEXT:    srwi r8, r8, 1
>> -; P9LE-NEXT:    add r5, r8, r5
>> -; P9LE-NEXT:    srwi r5, r5, 6
>> -; P9LE-NEXT:    mulli r8, r5, 95
>> +; P9LE-NEXT:    add r4, r8, r4
>> +; P9LE-NEXT:    srwi r4, r4, 6
>> +; P9LE-NEXT:    mulli r8, r4, 95
>>  ; P9LE-NEXT:    sub r3, r3, r8
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    mtfprd f0, r4
>> -; P9LE-NEXT:    vmrglh v2, v2, v4
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    vmrghh v2, v2, v4
>> +; P9LE-NEXT:    mtvsrd v4, r6
>>  ; P9LE-NEXT:    vmrglw v2, v2, v3
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r6
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r7
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r5
>> -; P9LE-NEXT:    xxswapd v5, vs0
>> -; P9LE-NEXT:    vmrglh v4, v5, v4
>> +; P9LE-NEXT:    mtvsrd v3, r5
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    mtvsrd v4, r7
>> +; P9LE-NEXT:    mtvsrd v5, r4
>> +; P9LE-NEXT:    vmrghh v4, v5, v4
>>  ; P9LE-NEXT:    vmrglw v3, v4, v3
>>  ; P9LE-NEXT:    vadduhm v2, v2, v3
>>  ; P9LE-NEXT:    blr
>> @@ -598,69 +574,61 @@ define <4 x i16> @combine_urem_udiv(<4 x i16> %x) {
>>  ; P8LE-LABEL: combine_urem_udiv:
>>  ; P8LE:       # %bb.0:
>>  ; P8LE-NEXT:    xxswapd vs0, v2
>> -; P8LE-NEXT:    lis r4, 22765
>> +; P8LE-NEXT:    lis r3, 22765
>>  ; P8LE-NEXT:    std r30, -16(r1) # 8-byte Folded Spill
>> -; P8LE-NEXT:    ori r4, r4, 8969
>> -; P8LE-NEXT:    mffprd r5, f0
>> -; P8LE-NEXT:    clrldi r3, r5, 48
>> -; P8LE-NEXT:    rldicl r6, r5, 48, 48
>> -; P8LE-NEXT:    clrlwi r8, r3, 16
>> -; P8LE-NEXT:    rldicl r7, r5, 32, 48
>> -; P8LE-NEXT:    clrlwi r9, r6, 16
>> -; P8LE-NEXT:    mulhwu r10, r8, r4
>> -; P8LE-NEXT:    clrlwi r11, r7, 16
>> -; P8LE-NEXT:    rldicl r5, r5, 16, 48
>> -; P8LE-NEXT:    mulhwu r12, r9, r4
>> -; P8LE-NEXT:    mulhwu r0, r11, r4
>> -; P8LE-NEXT:    clrlwi r30, r5, 16
>> -; P8LE-NEXT:    mulhwu r4, r30, r4
>> -; P8LE-NEXT:    sub r8, r8, r10
>> +; P8LE-NEXT:    ori r3, r3, 8969
>> +; P8LE-NEXT:    mffprd r4, f0
>> +; P8LE-NEXT:    clrldi r5, r4, 48
>> +; P8LE-NEXT:    rldicl r6, r4, 48, 48
>> +; P8LE-NEXT:    clrlwi r5, r5, 16
>> +; P8LE-NEXT:    clrlwi r8, r6, 16
>> +; P8LE-NEXT:    rldicl r7, r4, 32, 48
>> +; P8LE-NEXT:    rldicl r4, r4, 16, 48
>> +; P8LE-NEXT:    mulhwu r9, r5, r3
>> +; P8LE-NEXT:    mulhwu r11, r8, r3
>> +; P8LE-NEXT:    clrlwi r10, r7, 16
>> +; P8LE-NEXT:    clrlwi r12, r4, 16
>> +; P8LE-NEXT:    mulhwu r0, r10, r3
>> +; P8LE-NEXT:    mulhwu r3, r12, r3
>> +; P8LE-NEXT:    sub r30, r5, r9
>> +; P8LE-NEXT:    sub r8, r8, r11
>> +; P8LE-NEXT:    srwi r30, r30, 1
>>  ; P8LE-NEXT:    srwi r8, r8, 1
>> -; P8LE-NEXT:    sub r9, r9, r12
>> -; P8LE-NEXT:    add r8, r8, r10
>> -; P8LE-NEXT:    sub r10, r11, r0
>> -; P8LE-NEXT:    srwi r9, r9, 1
>> +; P8LE-NEXT:    sub r10, r10, r0
>> +; P8LE-NEXT:    add r9, r30, r9
>> +; P8LE-NEXT:    add r8, r8, r11
>> +; P8LE-NEXT:    sub r11, r12, r3
>>  ; P8LE-NEXT:    srwi r10, r10, 1
>> -; P8LE-NEXT:    sub r11, r30, r4
>> -; P8LE-NEXT:    add r9, r9, r12
>> -; P8LE-NEXT:    srwi r8, r8, 6
>>  ; P8LE-NEXT:    ld r30, -16(r1) # 8-byte Folded Reload
>> -; P8LE-NEXT:    add r10, r10, r0
>> -; P8LE-NEXT:    srwi r11, r11, 1
>>  ; P8LE-NEXT:    srwi r9, r9, 6
>> -; P8LE-NEXT:    mtfprd f0, r8
>> -; P8LE-NEXT:    mulli r12, r8, 95
>> +; P8LE-NEXT:    srwi r11, r11, 1
>> +; P8LE-NEXT:    srwi r8, r8, 6
>> +; P8LE-NEXT:    add r10, r10, r0
>> +; P8LE-NEXT:    mulli r12, r9, 95
>> +; P8LE-NEXT:    add r3, r11, r3
>> +; P8LE-NEXT:    mtvsrd v2, r9
>>  ; P8LE-NEXT:    srwi r10, r10, 6
>> -; P8LE-NEXT:    add r4, r11, r4
>> -; P8LE-NEXT:    mtfprd f1, r9
>> -; P8LE-NEXT:    mulli r8, r9, 95
>> -; P8LE-NEXT:    mulli r9, r10, 95
>> -; P8LE-NEXT:    srwi r4, r4, 6
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    mtfprd f2, r10
>> -; P8LE-NEXT:    mtfprd f3, r4
>> -; P8LE-NEXT:    mulli r4, r4, 95
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> -; P8LE-NEXT:    xxswapd v1, vs2
>> -; P8LE-NEXT:    sub r3, r3, r12
>> -; P8LE-NEXT:    xxswapd v6, vs3
>> -; P8LE-NEXT:    mtfprd f0, r3
>> -; P8LE-NEXT:    sub r3, r7, r9
>> -; P8LE-NEXT:    sub r6, r6, r8
>> -; P8LE-NEXT:    mtfprd f4, r3
>> -; P8LE-NEXT:    sub r3, r5, r4
>> -; P8LE-NEXT:    mtfprd f1, r6
>> -; P8LE-NEXT:    mtfprd f5, r3
>> -; P8LE-NEXT:    xxswapd v5, vs4
>> -; P8LE-NEXT:    vmrglh v2, v3, v2
>> -; P8LE-NEXT:    xxswapd v3, vs0
>> -; P8LE-NEXT:    xxswapd v4, vs1
>> -; P8LE-NEXT:    xxswapd v0, vs5
>> -; P8LE-NEXT:    vmrglh v3, v4, v3
>> -; P8LE-NEXT:    vmrglh v4, v0, v5
>> -; P8LE-NEXT:    vmrglh v5, v6, v1
>> -; P8LE-NEXT:    vmrglw v3, v4, v3
>> -; P8LE-NEXT:    vmrglw v2, v5, v2
>> +; P8LE-NEXT:    mulli r9, r8, 95
>> +; P8LE-NEXT:    srwi r3, r3, 6
>> +; P8LE-NEXT:    mtvsrd v3, r8
>> +; P8LE-NEXT:    mulli r8, r10, 95
>> +; P8LE-NEXT:    mtvsrd v4, r10
>> +; P8LE-NEXT:    mulli r10, r3, 95
>> +; P8LE-NEXT:    vmrghh v2, v3, v2
>> +; P8LE-NEXT:    sub r5, r5, r12
>> +; P8LE-NEXT:    sub r6, r6, r9
>> +; P8LE-NEXT:    mtvsrd v3, r5
>> +; P8LE-NEXT:    mtvsrd v5, r6
>> +; P8LE-NEXT:    sub r5, r7, r8
>> +; P8LE-NEXT:    sub r4, r4, r10
>> +; P8LE-NEXT:    mtvsrd v0, r5
>> +; P8LE-NEXT:    mtvsrd v1, r4
>> +; P8LE-NEXT:    vmrghh v3, v5, v3
>> +; P8LE-NEXT:    mtvsrd v5, r3
>> +; P8LE-NEXT:    vmrghh v0, v1, v0
>> +; P8LE-NEXT:    vmrghh v4, v5, v4
>> +; P8LE-NEXT:    vmrglw v3, v0, v3
>> +; P8LE-NEXT:    vmrglw v2, v4, v2
>>  ; P8LE-NEXT:    vadduhm v2, v3, v2
>>  ; P8LE-NEXT:    blr
>>  ;
>> @@ -742,34 +710,30 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x i16> %x) {
>>  ; P9LE-NEXT:    li r3, 0
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>>  ; P9LE-NEXT:    clrlwi r3, r3, 26
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>>  ; P9LE-NEXT:    clrlwi r3, r3, 27
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, 22765
>> -; P9LE-NEXT:    ori r5, r5, 8969
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r5, r4, r5
>> -; P9LE-NEXT:    sub r4, r4, r5
>> -; P9LE-NEXT:    srwi r4, r4, 1
>> -; P9LE-NEXT:    add r4, r4, r5
>> +; P9LE-NEXT:    lis r4, 22765
>> +; P9LE-NEXT:    ori r4, r4, 8969
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    mulhwu r4, r3, r4
>> +; P9LE-NEXT:    sub r5, r3, r4
>> +; P9LE-NEXT:    srwi r5, r5, 1
>> +; P9LE-NEXT:    add r4, r5, r4
>>  ; P9LE-NEXT:    srwi r4, r4, 6
>>  ; P9LE-NEXT:    mulli r4, r4, 95
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>>  ; P9LE-NEXT:    clrlwi r3, r3, 29
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    vmrglh v2, v4, v2
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    vmrghh v2, v4, v2
>>  ; P9LE-NEXT:    vmrglw v2, v2, v3
>>  ; P9LE-NEXT:    blr
>>  ;
>> @@ -817,9 +781,9 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x i16> %x) {
>>  ; P8LE-NEXT:    mffprd r4, f0
>>  ; P8LE-NEXT:    rldicl r5, r4, 16, 48
>>  ; P8LE-NEXT:    rldicl r7, r4, 48, 48
>> -; P8LE-NEXT:    clrlwi r6, r5, 16
>> -; P8LE-NEXT:    mulhwu r3, r6, r3
>> -; P8LE-NEXT:    sub r6, r6, r3
>> +; P8LE-NEXT:    clrlwi r5, r5, 16
>> +; P8LE-NEXT:    mulhwu r3, r5, r3
>> +; P8LE-NEXT:    sub r6, r5, r3
>>  ; P8LE-NEXT:    srwi r6, r6, 1
>>  ; P8LE-NEXT:    add r3, r6, r3
>>  ; P8LE-NEXT:    clrldi r6, r4, 48
>> @@ -827,19 +791,15 @@ define <4 x i16> @dont_fold_urem_power_of_two(<4 x i16> %x) {
>>  ; P8LE-NEXT:    clrlwi r6, r6, 26
>>  ; P8LE-NEXT:    mulli r3, r3, 95
>>  ; P8LE-NEXT:    rldicl r4, r4, 32, 48
>> -; P8LE-NEXT:    mtfprd f0, r6
>> +; P8LE-NEXT:    mtvsrd v2, r6
>>  ; P8LE-NEXT:    clrlwi r6, r7, 27
>>  ; P8LE-NEXT:    clrlwi r4, r4, 29
>> -; P8LE-NEXT:    mtfprd f1, r6
>> -; P8LE-NEXT:    mtfprd f3, r4
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> +; P8LE-NEXT:    mtvsrd v3, r6
>> +; P8LE-NEXT:    mtvsrd v5, r4
>> +; P8LE-NEXT:    vmrghh v2, v3, v2
>>  ; P8LE-NEXT:    sub r3, r5, r3
>> -; P8LE-NEXT:    xxswapd v5, vs3
>> -; P8LE-NEXT:    mtfprd f2, r3
>> -; P8LE-NEXT:    vmrglh v2, v3, v2
>> -; P8LE-NEXT:    xxswapd v4, vs2
>> -; P8LE-NEXT:    vmrglh v3, v4, v5
>> +; P8LE-NEXT:    mtvsrd v4, r3
>> +; P8LE-NEXT:    vmrghh v3, v4, v5
>>  ; P8LE-NEXT:    vmrglw v2, v3, v2
>>  ; P8LE-NEXT:    blr
>>  ;
>> @@ -885,40 +845,39 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) {
>>  ; P9LE:       # %bb.0:
>>  ; P9LE-NEXT:    li r3, 4
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    lis r5, -19946
>> -; P9LE-NEXT:    ori r5, r5, 17097
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r4, r4, r5
>> -; P9LE-NEXT:    lis r5, 24749
>> -; P9LE-NEXT:    ori r5, r5, 47143
>> +; P9LE-NEXT:    lis r4, -19946
>> +; P9LE-NEXT:    ori r4, r4, 17097
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    mulhwu r4, r3, r4
>>  ; P9LE-NEXT:    srwi r4, r4, 4
>>  ; P9LE-NEXT:    mulli r4, r4, 23
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    lis r4, 24749
>> +; P9LE-NEXT:    mtvsrd v3, r3
>>  ; P9LE-NEXT:    li r3, 6
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    clrlwi r4, r3, 16
>> -; P9LE-NEXT:    mulhwu r4, r4, r5
>> -; P9LE-NEXT:    lis r5, -14230
>> -; P9LE-NEXT:    ori r5, r5, 30865
>> +; P9LE-NEXT:    clrlwi r3, r3, 16
>> +; P9LE-NEXT:    ori r4, r4, 47143
>> +; P9LE-NEXT:    mulhwu r4, r3, r4
>>  ; P9LE-NEXT:    srwi r4, r4, 11
>>  ; P9LE-NEXT:    mulli r4, r4, 5423
>>  ; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v3, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> +; P9LE-NEXT:    mtvsrd v4, r3
>>  ; P9LE-NEXT:    li r3, 2
>>  ; P9LE-NEXT:    vextuhrx r3, r3, v2
>> -; P9LE-NEXT:    rlwinm r4, r3, 31, 17, 31
>> -; P9LE-NEXT:    mulhwu r4, r4, r5
>> -; P9LE-NEXT:    srwi r4, r4, 8
>> -; P9LE-NEXT:    mulli r4, r4, 654
>> -; P9LE-NEXT:    sub r3, r3, r4
>> -; P9LE-NEXT:    xxswapd v4, vs0
>> -; P9LE-NEXT:    mtfprd f0, r3
>> -; P9LE-NEXT:    xxswapd v2, vs0
>> -; P9LE-NEXT:    vmrglh v3, v4, v3
>> -; P9LE-NEXT:    xxlxor v4, v4, v4
>> -; P9LE-NEXT:    vmrglh v2, v2, v4
>> +; P9LE-NEXT:    lis r5, -14230
>> +; P9LE-NEXT:    ori r5, r5, 30865
>> +; P9LE-NEXT:    vmrghh v3, v4, v3
>> +; P9LE-NEXT:    clrlwi r4, r3, 16
>> +; P9LE-NEXT:    rlwinm r3, r3, 31, 17, 31
>> +; P9LE-NEXT:    mulhwu r3, r3, r5
>> +; P9LE-NEXT:    srwi r3, r3, 8
>> +; P9LE-NEXT:    mulli r3, r3, 654
>> +; P9LE-NEXT:    sub r3, r4, r3
>> +; P9LE-NEXT:    mtvsrd v2, r3
>> +; P9LE-NEXT:    li r3, 0
>> +; P9LE-NEXT:    mtvsrd v4, r3
>> +; P9LE-NEXT:    vmrghh v2, v2, v4
>>  ; P9LE-NEXT:    vmrglw v2, v3, v2
>>  ; P9LE-NEXT:    blr
>>  ;
>> @@ -969,41 +928,40 @@ define <4 x i16> @dont_fold_urem_one(<4 x i16> %x) {
>>  ; P8LE-LABEL: dont_fold_urem_one:
>>  ; P8LE:       # %bb.0:
>>  ; P8LE-NEXT:    xxswapd vs0, v2
>> -; P8LE-NEXT:    lis r3, -19946
>> -; P8LE-NEXT:    lis r7, 24749
>> -; P8LE-NEXT:    lis r9, -14230
>> -; P8LE-NEXT:    xxlxor v5, v5, v5
>> -; P8LE-NEXT:    ori r3, r3, 17097
>> -; P8LE-NEXT:    ori r7, r7, 47143
>> -; P8LE-NEXT:    ori r9, r9, 30865
>> +; P8LE-NEXT:    lis r3, -14230
>> +; P8LE-NEXT:    lis r7, -19946
>> +; P8LE-NEXT:    lis r9, 24749
>> +; P8LE-NEXT:    ori r3, r3, 30865
>> +; P8LE-NEXT:    ori r7, r7, 17097
>>  ; P8LE-NEXT:    mffprd r4, f0
>> -; P8LE-NEXT:    rldicl r5, r4, 32, 48
>> -; P8LE-NEXT:    rldicl r6, r4, 16, 48
>> -; P8LE-NEXT:    clrlwi r8, r5, 16
>> -; P8LE-NEXT:    rldicl r4, r4, 48, 48
>> +; P8LE-NEXT:    rldicl r5, r4, 48, 48
>> +; P8LE-NEXT:    rldicl r6, r4, 32, 48
>> +; P8LE-NEXT:    rldicl r4, r4, 16, 48
>> +; P8LE-NEXT:    rlwinm r8, r5, 31, 17, 31
>> +; P8LE-NEXT:    clrlwi r6, r6, 16
>> +; P8LE-NEXT:    clrlwi r5, r5, 16
>>  ; P8LE-NEXT:    mulhwu r3, r8, r3
>> -; P8LE-NEXT:    clrlwi r8, r6, 16
>> -; P8LE-NEXT:    mulhwu r7, r8, r7
>> -; P8LE-NEXT:    rlwinm r8, r4, 31, 17, 31
>> -; P8LE-NEXT:    mulhwu r8, r8, r9
>> -; P8LE-NEXT:    srwi r3, r3, 4
>> -; P8LE-NEXT:    srwi r7, r7, 11
>> -; P8LE-NEXT:    mulli r3, r3, 23
>> -; P8LE-NEXT:    srwi r8, r8, 8
>> -; P8LE-NEXT:    mulli r7, r7, 5423
>> -; P8LE-NEXT:    mulli r8, r8, 654
>> +; P8LE-NEXT:    ori r8, r9, 47143
>> +; P8LE-NEXT:    clrlwi r4, r4, 16
>> +; P8LE-NEXT:    li r9, 0
>> +; P8LE-NEXT:    mulhwu r7, r6, r7
>> +; P8LE-NEXT:    mulhwu r8, r4, r8
>> +; P8LE-NEXT:    mtvsrd v2, r9
>> +; P8LE-NEXT:    srwi r3, r3, 8
>> +; P8LE-NEXT:    srwi r7, r7, 4
>> +; P8LE-NEXT:    mulli r3, r3, 654
>> +; P8LE-NEXT:    srwi r8, r8, 11
>> +; P8LE-NEXT:    mulli r7, r7, 23
>> +; P8LE-NEXT:    mulli r8, r8, 5423
>>  ; P8LE-NEXT:    sub r3, r5, r3
>>  ; P8LE-NEXT:    sub r5, r6, r7
>> -; P8LE-NEXT:    mtfprd f0, r3
>> +; P8LE-NEXT:    mtvsrd v3, r3
>>  ; P8LE-NEXT:    sub r3, r4, r8
>> -; P8LE-NEXT:    mtfprd f1, r5
>> -; P8LE-NEXT:    mtfprd f2, r3
>> -; P8LE-NEXT:    xxswapd v2, vs0
>> -; P8LE-NEXT:    xxswapd v3, vs1
>> -; P8LE-NEXT:    xxswapd v4, vs2
>> -; P8LE-NEXT:    vmrglh v2, v3, v2
>> -; P8LE-NEXT:    vmrglh v3, v4, v5
>> -; P8LE-NEXT:    vmrglw v2, v2, v3
>> +; P8LE-NEXT:    mtvsrd v4, r5
>> +; P8LE-NEXT:    mtvsrd v5, r3
>> +; P8LE-NEXT:    vmrghh v2, v3, v2
>> +; P8LE-NEXT:    vmrghh v3, v5, v4
>> +; P8LE-NEXT:    vmrglw v2, v3, v2
>>  ; P8LE-NEXT:    blr
>>  ;
>>  ; P8BE-LABEL: dont_fold_urem_one:
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
>> index 239b38e2ec70..48b62f57c1c9 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i16_elts.ll
>> @@ -20,12 +20,10 @@ define i32 @test2elt(i64 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    vmrghh v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -40,13 +38,11 @@ define i32 @test2elt(i64 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs1
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    li r3, 0
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>>  ; CHECK-P9-NEXT:    vextuwrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -90,20 +86,16 @@ define i64 @test4elt(<4 x float> %a) local_unnamed_addr #1 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    mtfprd f3, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs3
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglh v3, v4, v5
>> -; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r3
>> +; CHECK-P8-NEXT:    vmrghh v3, v4, v3
>> +; CHECK-P8-NEXT:    vmrghh v2, v2, v5
>> +; CHECK-P8-NEXT:    vmrglw v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -114,27 +106,23 @@ define i64 @test4elt(<4 x float> %a) local_unnamed_addr #1 {
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, v2, v2, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vmrglh v2, v4, v2
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>> +; CHECK-P9-NEXT:    vmrghh v2, v4, v2
>>  ; CHECK-P9-NEXT:    vmrglw v2, v2, v3
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -180,59 +168,51 @@ define <8 x i16> @test8elt(<8 x float>* nocapture readonly) local_unnamed_addr #
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    lvx v2, 0, r3
>>  ; CHECK-P8-NEXT:    li r4, 16
>> -; CHECK-P8-NEXT:    lvx v5, r3, r4
>> -; CHECK-P8-NEXT:    xxswapd vs1, v2
>> +; CHECK-P8-NEXT:    lvx v3, r3, r4
>>  ; CHECK-P8-NEXT:    xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT:    xxsldwi vs2, v5, v5, 3
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v5
>> -; CHECK-P8-NEXT:    xxswapd vs3, v5
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xxswapd vs1, v2
>> +; CHECK-P8-NEXT:    xscvspdpn f2, v2
>> +; CHECK-P8-NEXT:    xxsldwi vs4, v2, v2, 1
>> +; CHECK-P8-NEXT:    xxsldwi vs5, v3, v3, 3
>> +; CHECK-P8-NEXT:    xscvspdpn f3, v3
>>  ; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xscvspdpn f4, vs4
>>  ; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    mffprwz r6, f1
>> -; CHECK-P8-NEXT:    mffprwz r5, f0
>> -; CHECK-P8-NEXT:    mtfprd f1, r6
>> -; CHECK-P8-NEXT:    mtfprd f0, r5
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    xscvspdpn f0, v2
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v1, vs4
>> -; CHECK-P8-NEXT:    vmrglh v2, v4, v3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    xxswapd v5, vs2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    xxswapd vs0, v3
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xxsldwi vs1, v3, v3, 1
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    mffprwz r3, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f4
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghh v2, v4, v2
>> +; CHECK-P8-NEXT:    mffprwz r4, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v6, vs3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs0
>> -; CHECK-P8-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P8-NEXT:    vmrglh v4, v0, v5
>> -; CHECK-P8-NEXT:    vmrglh v5, v1, v6
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    vmrghh v3, v3, v4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    vmrghh v5, v0, v5
>> +; CHECK-P8-NEXT:    mtvsrd v1, r3
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>> +; CHECK-P8-NEXT:    vmrghh v4, v4, v1
>> +; CHECK-P8-NEXT:    vmrglw v3, v4, v5
>>  ; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>> @@ -244,53 +224,45 @@ define <8 x i16> @test8elt(<8 x float>* nocapture readonly) local_unnamed_addr #
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    lxv vs0, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghh v3, v4, v3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld v2, v3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -363,116 +335,100 @@ define void @test16elt(<16 x i16>* noalias nocapture sret %agg.result, <16 x flo
>>  ; CHECK-P8-LABEL: test16elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    lvx v5, 0, r4
>> -; CHECK-P8-NEXT:    li r6, 32
>>  ; CHECK-P8-NEXT:    li r5, 16
>> -; CHECK-P8-NEXT:    lvx v2, r4, r6
>> +; CHECK-P8-NEXT:    li r6, 32
>>  ; CHECK-P8-NEXT:    lvx v3, r4, r5
>> +; CHECK-P8-NEXT:    lvx v2, r4, r6
>>  ; CHECK-P8-NEXT:    li r6, 48
>> -; CHECK-P8-NEXT:    xscvspdpn f0, v5
>> -; CHECK-P8-NEXT:    xxsldwi vs1, v5, v5, 3
>> +; CHECK-P8-NEXT:    xxsldwi vs0, v5, v5, 3
>> +; CHECK-P8-NEXT:    xscvspdpn f1, v5
>>  ; CHECK-P8-NEXT:    lvx v4, r4, r6
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v2
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs3, v5
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    xxswapd vs8, v3
>> -; CHECK-P8-NEXT:    xscvspdpn f6, v4
>> +; CHECK-P8-NEXT:    xxsldwi vs5, v5, v5, 1
>>  ; CHECK-P8-NEXT:    xxsldwi vs7, v3, v3, 3
>> +; CHECK-P8-NEXT:    xxswapd vs8, v3
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>>  ; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xxsldwi vs10, v2, v2, 3
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT:    xscvspdpn f7, vs7
>> +; CHECK-P8-NEXT:    xscvspdpn f8, vs8
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    xxsldwi vs9, v3, v3, 1
>> +; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT:    xscvspdpn f2, v3
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f5
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    xxsldwi vs0, v3, v3, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f4, v2
>> +; CHECK-P8-NEXT:    xscvdpsxws f5, f7
>> +; CHECK-P8-NEXT:    xxsldwi vs7, v4, v4, 3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    xxsldwi vs3, v2, v2, 3
>> +; CHECK-P8-NEXT:    xscvspdpn f6, v4
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f8
>> +; CHECK-P8-NEXT:    xxswapd vs8, v4
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f5
>> +; CHECK-P8-NEXT:    xxswapd vs5, v2
>>  ; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT:    xxsldwi vs12, v2, v2, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f8, vs8
>> -; CHECK-P8-NEXT:    xxswapd vs11, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT:    xxswapd v2, v4
>> +; CHECK-P8-NEXT:    vmrghh v3, v0, v3
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xscvdpsxws f6, f6
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs5
>> +; CHECK-P8-NEXT:    xxsldwi vs5, v2, v2, 1
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghh v2, v5, v1
>> +; CHECK-P8-NEXT:    vmrghh v5, v6, v0
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f4
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f3
>> +; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f6
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>>  ; CHECK-P8-NEXT:    xscvspdpn f7, vs7
>> -; CHECK-P8-NEXT:    xxsldwi vs13, v4, v4, 3
>> -; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT:    xxsldwi v3, v4, v4, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f10, vs10
>> +; CHECK-P8-NEXT:    mtvsrd v7, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f2
>> +; CHECK-P8-NEXT:    xxsldwi vs2, v4, v4, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f8, vs8
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs2
>> +; CHECK-P8-NEXT:    xscvdpsxws f3, f7
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f8
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xscvspdpn f9, vs9
>> -; CHECK-P8-NEXT:    xscvdpsxws f6, f6
>> -; CHECK-P8-NEXT:    xscvspdpn f12, vs12
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> +; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    vmrghh v0, v0, v7
>> +; CHECK-P8-NEXT:    mtvsrd v7, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    xscvspdpn f11, vs11
>> -; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xscvspdpn v2, v2
>> -; CHECK-P8-NEXT:    xscvdpsxws f8, f8
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT:    mffprwz r6, f2
>> -; CHECK-P8-NEXT:    xscvspdpn f13, vs13
>> -; CHECK-P8-NEXT:    xscvspdpn v3, v3
>> -; CHECK-P8-NEXT:    xscvdpsxws f10, f10
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> +; CHECK-P8-NEXT:    vmrghh v4, v8, v4
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT:    mtfprd f2, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f6
>> -; CHECK-P8-NEXT:    xscvdpsxws f12, f12
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f6, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f3
>> -; CHECK-P8-NEXT:    xscvdpsxws v2, v2
>> -; CHECK-P8-NEXT:    xxswapd v9, vs6
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f8
>> -; CHECK-P8-NEXT:    mtfprd f3, r6
>> -; CHECK-P8-NEXT:    xxswapd v0, vs5
>> -; CHECK-P8-NEXT:    mffprwz r6, f7
>> -; CHECK-P8-NEXT:    xscvdpsxws f13, f13
>> -; CHECK-P8-NEXT:    xxswapd v5, vs3
>> -; CHECK-P8-NEXT:    xscvdpsxws v3, v3
>> -; CHECK-P8-NEXT:    mtfprd f8, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f10
>> -; CHECK-P8-NEXT:    mtfprd f7, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f9
>> -; CHECK-P8-NEXT:    mtfprd f10, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f12
>> -; CHECK-P8-NEXT:    mtfprd f9, r6
>> -; CHECK-P8-NEXT:    xxswapd v6, vs10
>> -; CHECK-P8-NEXT:    mffprwz r6, f11
>> -; CHECK-P8-NEXT:    mtfprd f12, r4
>> -; CHECK-P8-NEXT:    xxswapd v1, vs9
>> -; CHECK-P8-NEXT:    mfvsrwz r4, v2
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    mtfprd f11, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f13
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    xxswapd v7, vs11
>> -; CHECK-P8-NEXT:    mfvsrwz r4, v3
>> -; CHECK-P8-NEXT:    vmrglh v3, v5, v4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs7
>> -; CHECK-P8-NEXT:    vmrglh v2, v2, v0
>> -; CHECK-P8-NEXT:    xxswapd v5, vs8
>> -; CHECK-P8-NEXT:    xxswapd v0, vs2
>> -; CHECK-P8-NEXT:    mtfprd f13, r6
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>> -; CHECK-P8-NEXT:    vmrglh v4, v5, v4
>> -; CHECK-P8-NEXT:    vmrglh v5, v0, v1
>> -; CHECK-P8-NEXT:    xxswapd v1, vs4
>> -; CHECK-P8-NEXT:    vmrglh v0, v7, v6
>> -; CHECK-P8-NEXT:    xxswapd v6, vs12
>> -; CHECK-P8-NEXT:    xxswapd v7, vs13
>> -; CHECK-P8-NEXT:    xxswapd v10, vs1
>> +; CHECK-P8-NEXT:    vmrghh v1, v1, v9
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>> +; CHECK-P8-NEXT:    vmrghh v7, v8, v7
>> +; CHECK-P8-NEXT:    vmrghh v6, v6, v9
>>  ; CHECK-P8-NEXT:    vmrglw v2, v2, v3
>> -; CHECK-P8-NEXT:    vmrglh v1, v1, v6
>> -; CHECK-P8-NEXT:    vmrglh v6, v8, v7
>> -; CHECK-P8-NEXT:    vmrglh v7, v9, v10
>> -; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>> -; CHECK-P8-NEXT:    vmrglw v4, v1, v0
>> -; CHECK-P8-NEXT:    vmrglw v5, v7, v6
>> +; CHECK-P8-NEXT:    vmrglw v3, v0, v5
>> +; CHECK-P8-NEXT:    vmrglw v4, v1, v4
>> +; CHECK-P8-NEXT:    vmrglw v5, v6, v7
>>  ; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>>  ; CHECK-P8-NEXT:    stvx v2, 0, r3
>>  ; CHECK-P8-NEXT:    xxmrgld v3, v5, v4
>> @@ -481,118 +437,102 @@ define void @test16elt(<16 x i16>* noalias nocapture sret %agg.result, <16 x flo
>>  ;
>>  ; CHECK-P9-LABEL: test16elt:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    lxv vs1, 0(r4)
>> -; CHECK-P9-NEXT:    lxv vs3, 16(r4)
>> -; CHECK-P9-NEXT:    xscvspdpn f5, vs1
>> -; CHECK-P9-NEXT:    xxsldwi vs2, vs1, vs1, 3
>> -; CHECK-P9-NEXT:    xscvspdpn f8, vs3
>> -; CHECK-P9-NEXT:    xxswapd vs4, vs1
>> -; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    lxv vs2, 0(r4)
>> +; CHECK-P9-NEXT:    xxsldwi vs3, vs2, vs2, 3
>> +; CHECK-P9-NEXT:    xxswapd vs4, vs2
>> +; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>>  ; CHECK-P9-NEXT:    xscvspdpn f4, vs4
>> -; CHECK-P9-NEXT:    xscvdpsxws f5, f5
>> +; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    xscvspdpn f5, vs2
>> +; CHECK-P9-NEXT:    xxsldwi vs2, vs2, vs2, 1
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT:    xscvdpsxws f8, f8
>> -; CHECK-P9-NEXT:    xxsldwi vs6, vs3, vs3, 3
>> -; CHECK-P9-NEXT:    xxswapd vs7, vs3
>> -; CHECK-P9-NEXT:    xscvspdpn f6, vs6
>> -; CHECK-P9-NEXT:    xxsldwi vs3, vs3, vs3, 1
>> -; CHECK-P9-NEXT:    xscvspdpn f7, vs7
>> -; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    mffprwz r5, f3
>> +; CHECK-P9-NEXT:    lxv vs1, 16(r4)
>> +; CHECK-P9-NEXT:    xxsldwi vs6, vs1, vs1, 3
>> +; CHECK-P9-NEXT:    xxswapd vs3, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v2, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f4
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f5
>> +; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r5
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT:    mffprwz r5, f4
>> +; CHECK-P9-NEXT:    xscvspdpn f4, vs6
>> +; CHECK-P9-NEXT:    mtvsrd v3, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f2
>> +; CHECK-P9-NEXT:    xscvspdpn f2, vs1
>> +; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P9-NEXT:    xscvdpsxws f6, f6
>> -; CHECK-P9-NEXT:    mffprwz r5, f5
>> -; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    xscvdpsxws f7, f7
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT:    mtfprd f5, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f8
>> -; CHECK-P9-NEXT:    mtfprd f8, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f2
>>  ; CHECK-P9-NEXT:    lxv vs0, 32(r4)
>> -; CHECK-P9-NEXT:    xxsldwi vs9, vs0, vs0, 3
>> -; CHECK-P9-NEXT:    xxswapd vs10, vs0
>> -; CHECK-P9-NEXT:    xscvspdpn f9, vs9
>> -; CHECK-P9-NEXT:    xscvspdpn f10, vs10
>> -; CHECK-P9-NEXT:    xscvdpsxws f9, f9
>> -; CHECK-P9-NEXT:    xscvdpsxws f10, f10
>> -; CHECK-P9-NEXT:    mtfprd f2, r5
>> +; CHECK-P9-NEXT:    mtvsrd v4, r5
>> +; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>> +; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r5, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r5
>> +; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f3
>> +; CHECK-P9-NEXT:    xxsldwi vs3, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f2
>> +; CHECK-P9-NEXT:    xscvspdpn f2, vs3
>> +; CHECK-P9-NEXT:    vmrghh v4, v5, v4
>> +; CHECK-P9-NEXT:    mtvsrd v5, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f6
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs4
>> +; CHECK-P9-NEXT:    xxswapd vs1, vs0
>> +; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>> +; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    vmrghh v5, v5, v0
>> +; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrglw v3, v5, v4
>> +; CHECK-P9-NEXT:    mffprwz r5, f2
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    mtfprd f6, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f7
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> +; CHECK-P9-NEXT:    mffprwz r5, f1
>>  ; CHECK-P9-NEXT:    lxv vs1, 48(r4)
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs5
>> -; CHECK-P9-NEXT:    mtfprd f7, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f3
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs6
>> -; CHECK-P9-NEXT:    xxswapd v5, vs7
>> -; CHECK-P9-NEXT:    mtfprd f3, r5
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT:    xxswapd v0, vs3
>> -; CHECK-P9-NEXT:    vmrglh v4, v5, v4
>> -; CHECK-P9-NEXT:    xxswapd v5, vs8
>> -; CHECK-P9-NEXT:    vmrglh v5, v5, v0
>> +; CHECK-P9-NEXT:    mtvsrd v1, r5
>> +; CHECK-P9-NEXT:    vmrghh v0, v1, v0
>>  ; CHECK-P9-NEXT:    mffprwz r4, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r4
>> -; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    vmrglw v3, v5, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>>  ; CHECK-P9-NEXT:    xxmrgld vs2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>> +; CHECK-P9-NEXT:    mffprwz r4, f0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs1, vs1, 3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P9-NEXT:    vmrghh v2, v4, v2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrglw v2, v2, v0
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    vmrglh v2, v4, v2
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT:    mffprwz r5, f9
>> -; CHECK-P9-NEXT:    mtfprd f9, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f10
>> -; CHECK-P9-NEXT:    mtfprd f10, r5
>> -; CHECK-P9-NEXT:    xxswapd v0, vs9
>> -; CHECK-P9-NEXT:    xxswapd v1, vs10
>> -; CHECK-P9-NEXT:    vmrglh v0, v1, v0
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v0
>> -; CHECK-P9-NEXT:    stxv vs2, 0(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r4
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld vs0, v3, v2
>>  ; CHECK-P9-NEXT:    stxv vs0, 16(r3)
>> +; CHECK-P9-NEXT:    stxv vs2, 0(r3)
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>>  ; CHECK-BE-LABEL: test16elt:
>> @@ -728,12 +668,10 @@ define i32 @test2elt_signed(i64 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    vmrghh v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -748,13 +686,11 @@ define i32 @test2elt_signed(i64 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs1
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    li r3, 0
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>>  ; CHECK-P9-NEXT:    vextuwrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -798,20 +734,16 @@ define i64 @test4elt_signed(<4 x float> %a) local_unnamed_addr #1 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    mtfprd f3, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs3
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglh v3, v4, v5
>> -; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r3
>> +; CHECK-P8-NEXT:    vmrghh v3, v4, v3
>> +; CHECK-P8-NEXT:    vmrghh v2, v2, v5
>> +; CHECK-P8-NEXT:    vmrglw v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -822,27 +754,23 @@ define i64 @test4elt_signed(<4 x float> %a) local_unnamed_addr #1 {
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, v2, v2, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vmrglh v2, v4, v2
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>> +; CHECK-P9-NEXT:    vmrghh v2, v4, v2
>>  ; CHECK-P9-NEXT:    vmrglw v2, v2, v3
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -888,59 +816,51 @@ define <8 x i16> @test8elt_signed(<8 x float>* nocapture readonly) local_unnamed
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    lvx v2, 0, r3
>>  ; CHECK-P8-NEXT:    li r4, 16
>> -; CHECK-P8-NEXT:    lvx v5, r3, r4
>> -; CHECK-P8-NEXT:    xxswapd vs1, v2
>> +; CHECK-P8-NEXT:    lvx v3, r3, r4
>>  ; CHECK-P8-NEXT:    xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT:    xxsldwi vs2, v5, v5, 3
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v5
>> -; CHECK-P8-NEXT:    xxswapd vs3, v5
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xxswapd vs1, v2
>> +; CHECK-P8-NEXT:    xscvspdpn f2, v2
>> +; CHECK-P8-NEXT:    xxsldwi vs4, v2, v2, 1
>> +; CHECK-P8-NEXT:    xxsldwi vs5, v3, v3, 3
>> +; CHECK-P8-NEXT:    xscvspdpn f3, v3
>>  ; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xscvspdpn f4, vs4
>>  ; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    mffprwz r6, f1
>> -; CHECK-P8-NEXT:    mffprwz r5, f0
>> -; CHECK-P8-NEXT:    mtfprd f1, r6
>> -; CHECK-P8-NEXT:    mtfprd f0, r5
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    xscvspdpn f0, v2
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v1, vs4
>> -; CHECK-P8-NEXT:    vmrglh v2, v4, v3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    xxswapd v5, vs2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    xxswapd vs0, v3
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xxsldwi vs1, v3, v3, 1
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    mffprwz r3, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f4
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghh v2, v4, v2
>> +; CHECK-P8-NEXT:    mffprwz r4, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v6, vs3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs0
>> -; CHECK-P8-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P8-NEXT:    vmrglh v4, v0, v5
>> -; CHECK-P8-NEXT:    vmrglh v5, v1, v6
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    vmrghh v3, v3, v4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    vmrghh v5, v0, v5
>> +; CHECK-P8-NEXT:    mtvsrd v1, r3
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>> +; CHECK-P8-NEXT:    vmrghh v4, v4, v1
>> +; CHECK-P8-NEXT:    vmrglw v3, v4, v5
>>  ; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>> @@ -952,53 +872,45 @@ define <8 x i16> @test8elt_signed(<8 x float>* nocapture readonly) local_unnamed
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    lxv vs0, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghh v3, v4, v3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld v2, v3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -1071,116 +983,100 @@ define void @test16elt_signed(<16 x i16>* noalias nocapture sret %agg.result, <1
>>  ; CHECK-P8-LABEL: test16elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    lvx v5, 0, r4
>> -; CHECK-P8-NEXT:    li r6, 32
>>  ; CHECK-P8-NEXT:    li r5, 16
>> -; CHECK-P8-NEXT:    lvx v2, r4, r6
>> +; CHECK-P8-NEXT:    li r6, 32
>>  ; CHECK-P8-NEXT:    lvx v3, r4, r5
>> +; CHECK-P8-NEXT:    lvx v2, r4, r6
>>  ; CHECK-P8-NEXT:    li r6, 48
>> -; CHECK-P8-NEXT:    xscvspdpn f0, v5
>> -; CHECK-P8-NEXT:    xxsldwi vs1, v5, v5, 3
>> +; CHECK-P8-NEXT:    xxsldwi vs0, v5, v5, 3
>> +; CHECK-P8-NEXT:    xscvspdpn f1, v5
>>  ; CHECK-P8-NEXT:    lvx v4, r4, r6
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v2
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs3, v5
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    xxswapd vs8, v3
>> -; CHECK-P8-NEXT:    xscvspdpn f6, v4
>> +; CHECK-P8-NEXT:    xxsldwi vs5, v5, v5, 1
>>  ; CHECK-P8-NEXT:    xxsldwi vs7, v3, v3, 3
>> +; CHECK-P8-NEXT:    xxswapd vs8, v3
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>>  ; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xxsldwi vs10, v2, v2, 3
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT:    xscvspdpn f7, vs7
>> +; CHECK-P8-NEXT:    xscvspdpn f8, vs8
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    xxsldwi vs9, v3, v3, 1
>> +; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT:    xscvspdpn f2, v3
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f5
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    xxsldwi vs0, v3, v3, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f4, v2
>> +; CHECK-P8-NEXT:    xscvdpsxws f5, f7
>> +; CHECK-P8-NEXT:    xxsldwi vs7, v4, v4, 3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    xxsldwi vs3, v2, v2, 3
>> +; CHECK-P8-NEXT:    xscvspdpn f6, v4
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f8
>> +; CHECK-P8-NEXT:    xxswapd vs8, v4
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f5
>> +; CHECK-P8-NEXT:    xxswapd vs5, v2
>>  ; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT:    xxsldwi vs12, v2, v2, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f8, vs8
>> -; CHECK-P8-NEXT:    xxswapd vs11, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT:    xxswapd v2, v4
>> +; CHECK-P8-NEXT:    vmrghh v3, v0, v3
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xscvdpsxws f6, f6
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs5
>> +; CHECK-P8-NEXT:    xxsldwi vs5, v2, v2, 1
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghh v2, v5, v1
>> +; CHECK-P8-NEXT:    vmrghh v5, v6, v0
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f4
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f3
>> +; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f6
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>>  ; CHECK-P8-NEXT:    xscvspdpn f7, vs7
>> -; CHECK-P8-NEXT:    xxsldwi vs13, v4, v4, 3
>> -; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT:    xxsldwi v3, v4, v4, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f10, vs10
>> +; CHECK-P8-NEXT:    mtvsrd v7, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f2
>> +; CHECK-P8-NEXT:    xxsldwi vs2, v4, v4, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f8, vs8
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs2
>> +; CHECK-P8-NEXT:    xscvdpsxws f3, f7
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f8
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xscvspdpn f9, vs9
>> -; CHECK-P8-NEXT:    xscvdpsxws f6, f6
>> -; CHECK-P8-NEXT:    xscvspdpn f12, vs12
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> +; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    vmrghh v0, v0, v7
>> +; CHECK-P8-NEXT:    mtvsrd v7, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    xscvspdpn f11, vs11
>> -; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xscvspdpn v2, v2
>> -; CHECK-P8-NEXT:    xscvdpsxws f8, f8
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT:    mffprwz r6, f2
>> -; CHECK-P8-NEXT:    xscvspdpn f13, vs13
>> -; CHECK-P8-NEXT:    xscvspdpn v3, v3
>> -; CHECK-P8-NEXT:    xscvdpsxws f10, f10
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> +; CHECK-P8-NEXT:    vmrghh v4, v8, v4
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT:    mtfprd f2, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f6
>> -; CHECK-P8-NEXT:    xscvdpsxws f12, f12
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f6, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f3
>> -; CHECK-P8-NEXT:    xscvdpsxws v2, v2
>> -; CHECK-P8-NEXT:    xxswapd v9, vs6
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f8
>> -; CHECK-P8-NEXT:    mtfprd f3, r6
>> -; CHECK-P8-NEXT:    xxswapd v0, vs5
>> -; CHECK-P8-NEXT:    mffprwz r6, f7
>> -; CHECK-P8-NEXT:    xscvdpsxws f13, f13
>> -; CHECK-P8-NEXT:    xxswapd v5, vs3
>> -; CHECK-P8-NEXT:    xscvdpsxws v3, v3
>> -; CHECK-P8-NEXT:    mtfprd f8, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f10
>> -; CHECK-P8-NEXT:    mtfprd f7, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f9
>> -; CHECK-P8-NEXT:    mtfprd f10, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f12
>> -; CHECK-P8-NEXT:    mtfprd f9, r6
>> -; CHECK-P8-NEXT:    xxswapd v6, vs10
>> -; CHECK-P8-NEXT:    mffprwz r6, f11
>> -; CHECK-P8-NEXT:    mtfprd f12, r4
>> -; CHECK-P8-NEXT:    xxswapd v1, vs9
>> -; CHECK-P8-NEXT:    mfvsrwz r4, v2
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    mtfprd f11, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f13
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    xxswapd v7, vs11
>> -; CHECK-P8-NEXT:    mfvsrwz r4, v3
>> -; CHECK-P8-NEXT:    vmrglh v3, v5, v4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs7
>> -; CHECK-P8-NEXT:    vmrglh v2, v2, v0
>> -; CHECK-P8-NEXT:    xxswapd v5, vs8
>> -; CHECK-P8-NEXT:    xxswapd v0, vs2
>> -; CHECK-P8-NEXT:    mtfprd f13, r6
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>> -; CHECK-P8-NEXT:    vmrglh v4, v5, v4
>> -; CHECK-P8-NEXT:    vmrglh v5, v0, v1
>> -; CHECK-P8-NEXT:    xxswapd v1, vs4
>> -; CHECK-P8-NEXT:    vmrglh v0, v7, v6
>> -; CHECK-P8-NEXT:    xxswapd v6, vs12
>> -; CHECK-P8-NEXT:    xxswapd v7, vs13
>> -; CHECK-P8-NEXT:    xxswapd v10, vs1
>> +; CHECK-P8-NEXT:    vmrghh v1, v1, v9
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>> +; CHECK-P8-NEXT:    vmrghh v7, v8, v7
>> +; CHECK-P8-NEXT:    vmrghh v6, v6, v9
>>  ; CHECK-P8-NEXT:    vmrglw v2, v2, v3
>> -; CHECK-P8-NEXT:    vmrglh v1, v1, v6
>> -; CHECK-P8-NEXT:    vmrglh v6, v8, v7
>> -; CHECK-P8-NEXT:    vmrglh v7, v9, v10
>> -; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>> -; CHECK-P8-NEXT:    vmrglw v4, v1, v0
>> -; CHECK-P8-NEXT:    vmrglw v5, v7, v6
>> +; CHECK-P8-NEXT:    vmrglw v3, v0, v5
>> +; CHECK-P8-NEXT:    vmrglw v4, v1, v4
>> +; CHECK-P8-NEXT:    vmrglw v5, v6, v7
>>  ; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>>  ; CHECK-P8-NEXT:    stvx v2, 0, r3
>>  ; CHECK-P8-NEXT:    xxmrgld v3, v5, v4
>> @@ -1189,118 +1085,102 @@ define void @test16elt_signed(<16 x i16>* noalias nocapture sret %agg.result, <1
>>  ;
>>  ; CHECK-P9-LABEL: test16elt_signed:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    lxv vs1, 0(r4)
>> -; CHECK-P9-NEXT:    lxv vs3, 16(r4)
>> -; CHECK-P9-NEXT:    xscvspdpn f5, vs1
>> -; CHECK-P9-NEXT:    xxsldwi vs2, vs1, vs1, 3
>> -; CHECK-P9-NEXT:    xscvspdpn f8, vs3
>> -; CHECK-P9-NEXT:    xxswapd vs4, vs1
>> -; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    lxv vs2, 0(r4)
>> +; CHECK-P9-NEXT:    xxsldwi vs3, vs2, vs2, 3
>> +; CHECK-P9-NEXT:    xxswapd vs4, vs2
>> +; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>>  ; CHECK-P9-NEXT:    xscvspdpn f4, vs4
>> -; CHECK-P9-NEXT:    xscvdpsxws f5, f5
>> +; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    xscvspdpn f5, vs2
>> +; CHECK-P9-NEXT:    xxsldwi vs2, vs2, vs2, 1
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT:    xscvdpsxws f8, f8
>> -; CHECK-P9-NEXT:    xxsldwi vs6, vs3, vs3, 3
>> -; CHECK-P9-NEXT:    xxswapd vs7, vs3
>> -; CHECK-P9-NEXT:    xscvspdpn f6, vs6
>> -; CHECK-P9-NEXT:    xxsldwi vs3, vs3, vs3, 1
>> -; CHECK-P9-NEXT:    xscvspdpn f7, vs7
>> -; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    mffprwz r5, f3
>> +; CHECK-P9-NEXT:    lxv vs1, 16(r4)
>> +; CHECK-P9-NEXT:    xxsldwi vs6, vs1, vs1, 3
>> +; CHECK-P9-NEXT:    xxswapd vs3, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v2, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f4
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f5
>> +; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r5
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT:    mffprwz r5, f4
>> +; CHECK-P9-NEXT:    xscvspdpn f4, vs6
>> +; CHECK-P9-NEXT:    mtvsrd v3, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f2
>> +; CHECK-P9-NEXT:    xscvspdpn f2, vs1
>> +; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P9-NEXT:    xscvdpsxws f6, f6
>> -; CHECK-P9-NEXT:    mffprwz r5, f5
>> -; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    xscvdpsxws f7, f7
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT:    mtfprd f5, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f8
>> -; CHECK-P9-NEXT:    mtfprd f8, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f2
>>  ; CHECK-P9-NEXT:    lxv vs0, 32(r4)
>> -; CHECK-P9-NEXT:    xxsldwi vs9, vs0, vs0, 3
>> -; CHECK-P9-NEXT:    xxswapd vs10, vs0
>> -; CHECK-P9-NEXT:    xscvspdpn f9, vs9
>> -; CHECK-P9-NEXT:    xscvspdpn f10, vs10
>> -; CHECK-P9-NEXT:    xscvdpsxws f9, f9
>> -; CHECK-P9-NEXT:    xscvdpsxws f10, f10
>> -; CHECK-P9-NEXT:    mtfprd f2, r5
>> +; CHECK-P9-NEXT:    mtvsrd v4, r5
>> +; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>> +; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r5, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r5
>> +; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f3
>> +; CHECK-P9-NEXT:    xxsldwi vs3, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f2
>> +; CHECK-P9-NEXT:    xscvspdpn f2, vs3
>> +; CHECK-P9-NEXT:    vmrghh v4, v5, v4
>> +; CHECK-P9-NEXT:    mtvsrd v5, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f6
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs4
>> +; CHECK-P9-NEXT:    xxswapd vs1, vs0
>> +; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>> +; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    vmrghh v5, v5, v0
>> +; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrglw v3, v5, v4
>> +; CHECK-P9-NEXT:    mffprwz r5, f2
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    mtfprd f6, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f7
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> +; CHECK-P9-NEXT:    mffprwz r5, f1
>>  ; CHECK-P9-NEXT:    lxv vs1, 48(r4)
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs5
>> -; CHECK-P9-NEXT:    mtfprd f7, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f3
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs6
>> -; CHECK-P9-NEXT:    xxswapd v5, vs7
>> -; CHECK-P9-NEXT:    mtfprd f3, r5
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT:    xxswapd v0, vs3
>> -; CHECK-P9-NEXT:    vmrglh v4, v5, v4
>> -; CHECK-P9-NEXT:    xxswapd v5, vs8
>> -; CHECK-P9-NEXT:    vmrglh v5, v5, v0
>> +; CHECK-P9-NEXT:    mtvsrd v1, r5
>> +; CHECK-P9-NEXT:    vmrghh v0, v1, v0
>>  ; CHECK-P9-NEXT:    mffprwz r4, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r4
>> -; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    vmrglw v3, v5, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>>  ; CHECK-P9-NEXT:    xxmrgld vs2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>> +; CHECK-P9-NEXT:    mffprwz r4, f0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs1, vs1, 3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P9-NEXT:    vmrghh v2, v4, v2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrglw v2, v2, v0
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    vmrglh v2, v4, v2
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT:    mffprwz r5, f9
>> -; CHECK-P9-NEXT:    mtfprd f9, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f10
>> -; CHECK-P9-NEXT:    mtfprd f10, r5
>> -; CHECK-P9-NEXT:    xxswapd v0, vs9
>> -; CHECK-P9-NEXT:    xxswapd v1, vs10
>> -; CHECK-P9-NEXT:    vmrglh v0, v1, v0
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v0
>> -; CHECK-P9-NEXT:    stxv vs2, 0(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r4
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld vs0, v3, v2
>>  ; CHECK-P9-NEXT:    stxv vs0, 16(r3)
>> +; CHECK-P9-NEXT:    stxv vs2, 0(r3)
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>>  ; CHECK-BE-LABEL: test16elt_signed:
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
>> index 1f95eda2b1b5..928a19f3a55c 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp32_to_i8_elts.ll
>> @@ -20,12 +20,10 @@ define i16 @test2elt(i64 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    vmrglb v2, v3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    vmrghb v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    clrldi r3, r3, 48
>> @@ -43,13 +41,11 @@ define i16 @test2elt(i64 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    addi r3, r1, -2
>> -; CHECK-P9-NEXT:    xxswapd v2, vs1
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>> -; CHECK-P9-NEXT:    vmrglb v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghb v2, v3, v2
>>  ; CHECK-P9-NEXT:    vsldoi v2, v2, v2, 8
>>  ; CHECK-P9-NEXT:    stxsihx v2, 0, r3
>>  ; CHECK-P9-NEXT:    lhz r3, -2(r1)
>> @@ -97,20 +93,16 @@ define i32 @test4elt(<4 x float> %a) local_unnamed_addr #1 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    mtfprd f3, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs3
>> -; CHECK-P8-NEXT:    vmrglb v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglb v3, v4, v5
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r3
>> +; CHECK-P8-NEXT:    vmrghb v3, v4, v3
>> +; CHECK-P8-NEXT:    vmrghb v2, v2, v5
>> +; CHECK-P8-NEXT:    vmrglh v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -121,28 +113,24 @@ define i32 @test4elt(<4 x float> %a) local_unnamed_addr #1 {
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghb v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, v2, v2, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    li r3, 0
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vmrglb v2, v4, v2
>> +; CHECK-P9-NEXT:    vmrghb v2, v4, v2
>>  ; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>>  ; CHECK-P9-NEXT:    vextuwrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -189,59 +177,51 @@ define i64 @test8elt(<8 x float>* nocapture readonly) local_unnamed_addr #2 {
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    lvx v2, 0, r3
>>  ; CHECK-P8-NEXT:    li r4, 16
>> -; CHECK-P8-NEXT:    lvx v5, r3, r4
>> -; CHECK-P8-NEXT:    xxswapd vs1, v2
>> +; CHECK-P8-NEXT:    lvx v3, r3, r4
>>  ; CHECK-P8-NEXT:    xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT:    xxsldwi vs2, v5, v5, 3
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v5
>> -; CHECK-P8-NEXT:    xxswapd vs3, v5
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xxswapd vs1, v2
>> +; CHECK-P8-NEXT:    xscvspdpn f2, v2
>> +; CHECK-P8-NEXT:    xxsldwi vs4, v2, v2, 1
>> +; CHECK-P8-NEXT:    xxsldwi vs5, v3, v3, 3
>> +; CHECK-P8-NEXT:    xscvspdpn f3, v3
>>  ; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xscvspdpn f4, vs4
>>  ; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    mffprwz r6, f1
>> -; CHECK-P8-NEXT:    mffprwz r5, f0
>> -; CHECK-P8-NEXT:    mtfprd f1, r6
>> -; CHECK-P8-NEXT:    mtfprd f0, r5
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    xscvspdpn f0, v2
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v1, vs4
>> -; CHECK-P8-NEXT:    vmrglb v2, v4, v3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    xxswapd v5, vs2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    xxswapd vs0, v3
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xxsldwi vs1, v3, v3, 1
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    mffprwz r3, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f4
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghb v2, v4, v2
>> +; CHECK-P8-NEXT:    mffprwz r4, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v6, vs3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs0
>> -; CHECK-P8-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P8-NEXT:    vmrglb v4, v0, v5
>> -; CHECK-P8-NEXT:    vmrglb v5, v1, v6
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    vmrghb v3, v3, v4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    vmrghb v5, v0, v5
>> +; CHECK-P8-NEXT:    mtvsrd v1, r3
>>  ; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglh v3, v5, v4
>> +; CHECK-P8-NEXT:    vmrghb v4, v4, v1
>> +; CHECK-P8-NEXT:    vmrglh v3, v4, v5
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>> @@ -255,53 +235,45 @@ define i64 @test8elt(<8 x float>* nocapture readonly) local_unnamed_addr #2 {
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    lxv vs0, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghb v2, v3, v2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    vmrglb v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghb v3, v4, v3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>> @@ -376,117 +348,101 @@ entry:
>>  define <16 x i8> @test16elt(<16 x float>* nocapture readonly) local_unnamed_addr #3 {
>>  ; CHECK-P8-LABEL: test16elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    lvx v2, 0, r3
>> +; CHECK-P8-NEXT:    lvx v4, 0, r3
>>  ; CHECK-P8-NEXT:    li r4, 16
>> +; CHECK-P8-NEXT:    li r5, 32
>>  ; CHECK-P8-NEXT:    lvx v3, r3, r4
>> -; CHECK-P8-NEXT:    li r4, 32
>> -; CHECK-P8-NEXT:    xscvspdpn f2, v2
>> -; CHECK-P8-NEXT:    xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v3
>> -; CHECK-P8-NEXT:    xxswapd vs1, v2
>> -; CHECK-P8-NEXT:    xxsldwi vs3, v2, v2, 1
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v3, v3, 3
>> -; CHECK-P8-NEXT:    lvx v2, r3, r4
>> +; CHECK-P8-NEXT:    lvx v2, r3, r5
>> +; CHECK-P8-NEXT:    xxsldwi vs0, v4, v4, 3
>> +; CHECK-P8-NEXT:    xxswapd vs2, v4
>> +; CHECK-P8-NEXT:    xxsldwi vs4, v4, v4, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f1, v4
>> +; CHECK-P8-NEXT:    xscvspdpn f3, v3
>> +; CHECK-P8-NEXT:    xxsldwi vs6, v3, v3, 3
>>  ; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT:    xxswapd vs6, v3
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    xxsldwi vs7, v3, v3, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT:    xxsldwi vs8, v2, v2, 3
>> -; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT:    xxswapd vs9, v2
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    xxswapd vs7, v3
>> +; CHECK-P8-NEXT:    xscvspdpn f2, vs2
>> +; CHECK-P8-NEXT:    xxsldwi vs8, v3, v3, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f4, vs4
>> +; CHECK-P8-NEXT:    xxsldwi vs9, v2, v2, 3
>>  ; CHECK-P8-NEXT:    xscvspdpn f6, vs6
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    mffprwz r4, f2
>>  ; CHECK-P8-NEXT:    xscvspdpn f7, vs7
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>>  ; CHECK-P8-NEXT:    xscvspdpn f8, vs8
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f5
>> -; CHECK-P8-NEXT:    xxswapd v0, vs4
>> +; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>>  ; CHECK-P8-NEXT:    xscvspdpn f9, vs9
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    xxswapd vs0, v2
>> +; CHECK-P8-NEXT:    mffprwz r5, f2
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    mtvsrd v4, r5
>> +; CHECK-P8-NEXT:    mffprwz r5, f4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f6
>> -; CHECK-P8-NEXT:    xxswapd v3, vs5
>> -; CHECK-P8-NEXT:    mtfprd f6, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    vmrghb v3, v4, v3
>> +; CHECK-P8-NEXT:    mtvsrd v4, r5
>> +; CHECK-P8-NEXT:    mffprwz r5, f3
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f7
>> -; CHECK-P8-NEXT:    xxswapd v4, vs6
>> -; CHECK-P8-NEXT:    mtfprd f7, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f8
>> -; CHECK-P8-NEXT:    xxswapd v5, vs7
>> -; CHECK-P8-NEXT:    mtfprd f8, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    xscvdpsxws f1, f9
>> -; CHECK-P8-NEXT:    xxswapd v1, vs8
>> -; CHECK-P8-NEXT:    mtfprd f9, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P8-NEXT:    xxswapd v4, vs2
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    xxswapd v6, vs9
>> -; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    xscvspdpn f0, v2
>> -; CHECK-P8-NEXT:    xxswapd v7, vs3
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    vmrglb v4, v4, v5
>> -; CHECK-P8-NEXT:    xxswapd v5, vs5
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f8
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>>  ; CHECK-P8-NEXT:    li r4, 48
>> -; CHECK-P8-NEXT:    lvx v9, r3, r4
>> -; CHECK-P8-NEXT:    vmrglb v1, v6, v1
>> -; CHECK-P8-NEXT:    xxswapd v8, vs1
>> +; CHECK-P8-NEXT:    lvx v0, r3, r4
>> +; CHECK-P8-NEXT:    mffprwz r3, f1
>>  ; CHECK-P8-NEXT:    xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT:    xxsldwi vs2, v9, v9, 3
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v9
>> -; CHECK-P8-NEXT:    xxswapd vs3, v9
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v9, v9, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f5, v2
>> +; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    xxsldwi vs3, v0, v0, 3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>> +; CHECK-P8-NEXT:    xxswapd vs4, v0
>>  ; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    xscvspdpn f2, vs2
>> +; CHECK-P8-NEXT:    mtvsrd v7, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f0
>> +; CHECK-P8-NEXT:    xxsldwi vs0, v0, v0, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f2, v0
>>  ; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P8-NEXT:    xscvdpsxws f6, f9
>> +; CHECK-P8-NEXT:    xscvspdpn f4, vs4
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f6
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghb v2, v6, v1
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f5
>> +; CHECK-P8-NEXT:    mtvsrd v6, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    vmrghb v4, v5, v4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r5
>> +; CHECK-P8-NEXT:    vmrghb v0, v6, v1
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v9, vs4
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    mtvsrd v6, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> -; CHECK-P8-NEXT:    xxswapd v6, vs1
>> -; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    vmrglb v2, v0, v7
>> -; CHECK-P8-NEXT:    xxswapd v0, vs0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v7, vs2
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    vmrglb v5, v8, v5
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>> -; CHECK-P8-NEXT:    xxswapd v10, vs3
>> -; CHECK-P8-NEXT:    vmrglb v0, v0, v6
>> +; CHECK-P8-NEXT:    vmrghb v5, v5, v7
>> +; CHECK-P8-NEXT:    vmrghb v1, v1, v6
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f4
>> +; CHECK-P8-NEXT:    mtvsrd v7, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f0
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mtvsrd v9, r3
>> +; CHECK-P8-NEXT:    vmrghb v7, v8, v7
>> +; CHECK-P8-NEXT:    vmrghb v6, v6, v9
>>  ; CHECK-P8-NEXT:    vmrglh v3, v4, v3
>> -; CHECK-P8-NEXT:    vmrglb v6, v8, v7
>> -; CHECK-P8-NEXT:    vmrglb v7, v9, v10
>> -; CHECK-P8-NEXT:    vmrglh v2, v2, v1
>> -; CHECK-P8-NEXT:    vmrglh v4, v0, v5
>> -; CHECK-P8-NEXT:    vmrglh v5, v7, v6
>> +; CHECK-P8-NEXT:    vmrglh v2, v5, v2
>> +; CHECK-P8-NEXT:    vmrglh v4, v1, v0
>> +; CHECK-P8-NEXT:    vmrglh v5, v6, v7
>>  ; CHECK-P8-NEXT:    vmrglw v2, v2, v3
>>  ; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>>  ; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>> @@ -494,114 +450,98 @@ define <16 x i8> @test16elt(<16 x float>* nocapture readonly) local_unnamed_addr
>>  ;
>>  ; CHECK-P9-LABEL: test16elt:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    lxv vs2, 0(r3)
>> +; CHECK-P9-NEXT:    lxv vs3, 0(r3)
>> +; CHECK-P9-NEXT:    xxsldwi vs4, vs3, vs3, 3
>> +; CHECK-P9-NEXT:    xscvspdpn f4, vs4
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    lxv vs0, 48(r3)
>> +; CHECK-P9-NEXT:    lxv vs1, 32(r3)
>> +; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>> +; CHECK-P9-NEXT:    mffprwz r3, f4
>> +; CHECK-P9-NEXT:    xxswapd vs4, vs3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>> +; CHECK-P9-NEXT:    xscvspdpn f4, vs4
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    mffprwz r3, f4
>> +; CHECK-P9-NEXT:    xscvspdpn f4, vs3
>> +; CHECK-P9-NEXT:    xxsldwi vs3, vs3, vs3, 1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    vmrghb v2, v3, v2
>> +; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    mffprwz r3, f4
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    mffprwz r3, f3
>>  ; CHECK-P9-NEXT:    xxsldwi vs3, vs2, vs2, 3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT:    lxv vs0, 48(r3)
>> -; CHECK-P9-NEXT:    lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT:    lxv vs4, 16(r3)
>> +; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs3
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>>  ; CHECK-P9-NEXT:    xscvspdpn f3, vs2
>>  ; CHECK-P9-NEXT:    xxsldwi vs2, vs2, vs2, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    vmrghb v3, v4, v3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    xxsldwi vs2, vs4, vs4, 3
>> -; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    vmrglb v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    xxswapd vs2, vs4
>> -; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    xscvspdpn f2, vs4
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    xxsldwi vs2, vs4, vs4, 1
>> -; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs2
>>  ; CHECK-P9-NEXT:    xxsldwi vs2, vs1, vs1, 3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghb v3, v4, v3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghb v4, v5, v4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v4, v5, v4
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>> -; CHECK-P9-NEXT:    xxswapd v0, vs0
>> -; CHECK-P9-NEXT:    vmrglb v5, v5, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r3
>> +; CHECK-P9-NEXT:    vmrghb v5, v5, v0
>>  ; CHECK-P9-NEXT:    vmrglh v4, v5, v4
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld v2, v3, v2
>> @@ -738,12 +678,10 @@ define i16 @test2elt_signed(i64 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    vmrglb v2, v3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    vmrghb v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    clrldi r3, r3, 48
>> @@ -761,13 +699,11 @@ define i16 @test2elt_signed(i64 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    addi r3, r1, -2
>> -; CHECK-P9-NEXT:    xxswapd v2, vs1
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>> -; CHECK-P9-NEXT:    vmrglb v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghb v2, v3, v2
>>  ; CHECK-P9-NEXT:    vsldoi v2, v2, v2, 8
>>  ; CHECK-P9-NEXT:    stxsihx v2, 0, r3
>>  ; CHECK-P9-NEXT:    lhz r3, -2(r1)
>> @@ -815,20 +751,16 @@ define i32 @test4elt_signed(<4 x float> %a) local_unnamed_addr #1 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    mtfprd f3, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs3
>> -; CHECK-P8-NEXT:    vmrglb v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglb v3, v4, v5
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r3
>> +; CHECK-P8-NEXT:    vmrghb v3, v4, v3
>> +; CHECK-P8-NEXT:    vmrghb v2, v2, v5
>> +; CHECK-P8-NEXT:    vmrglh v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -839,28 +771,24 @@ define i32 @test4elt_signed(<4 x float> %a) local_unnamed_addr #1 {
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghb v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, v2, v2, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    li r3, 0
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vmrglb v2, v4, v2
>> +; CHECK-P9-NEXT:    vmrghb v2, v4, v2
>>  ; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>>  ; CHECK-P9-NEXT:    vextuwrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -907,59 +835,51 @@ define i64 @test8elt_signed(<8 x float>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    lvx v2, 0, r3
>>  ; CHECK-P8-NEXT:    li r4, 16
>> -; CHECK-P8-NEXT:    lvx v5, r3, r4
>> -; CHECK-P8-NEXT:    xxswapd vs1, v2
>> +; CHECK-P8-NEXT:    lvx v3, r3, r4
>>  ; CHECK-P8-NEXT:    xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT:    xxsldwi vs2, v5, v5, 3
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v5
>> -; CHECK-P8-NEXT:    xxswapd vs3, v5
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v5, v5, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xxswapd vs1, v2
>> +; CHECK-P8-NEXT:    xscvspdpn f2, v2
>> +; CHECK-P8-NEXT:    xxsldwi vs4, v2, v2, 1
>> +; CHECK-P8-NEXT:    xxsldwi vs5, v3, v3, 3
>> +; CHECK-P8-NEXT:    xscvspdpn f3, v3
>>  ; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xscvspdpn f4, vs4
>>  ; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    mffprwz r6, f1
>> -; CHECK-P8-NEXT:    mffprwz r5, f0
>> -; CHECK-P8-NEXT:    mtfprd f1, r6
>> -; CHECK-P8-NEXT:    mtfprd f0, r5
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    xscvspdpn f0, v2
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v1, vs4
>> -; CHECK-P8-NEXT:    vmrglb v2, v4, v3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    xxswapd v5, vs2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mffprwz r3, f1
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    xxswapd vs0, v3
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    xxsldwi vs1, v3, v3, 1
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    mffprwz r3, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f4
>> +; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghb v2, v4, v2
>> +; CHECK-P8-NEXT:    mffprwz r4, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v6, vs3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs0
>> -; CHECK-P8-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P8-NEXT:    vmrglb v4, v0, v5
>> -; CHECK-P8-NEXT:    vmrglb v5, v1, v6
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    vmrghb v3, v3, v4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    vmrghb v5, v0, v5
>> +; CHECK-P8-NEXT:    mtvsrd v1, r3
>>  ; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglh v3, v5, v4
>> +; CHECK-P8-NEXT:    vmrghb v4, v4, v1
>> +; CHECK-P8-NEXT:    vmrglh v3, v4, v5
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>> @@ -973,53 +893,45 @@ define i64 @test8elt_signed(<8 x float>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    lxv vs0, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghb v2, v3, v2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    vmrglb v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghb v3, v4, v3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>> @@ -1094,117 +1006,101 @@ entry:
>>  define <16 x i8> @test16elt_signed(<16 x float>* nocapture readonly) local_unnamed_addr #3 {
>>  ; CHECK-P8-LABEL: test16elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    lvx v2, 0, r3
>> +; CHECK-P8-NEXT:    lvx v4, 0, r3
>>  ; CHECK-P8-NEXT:    li r4, 16
>> +; CHECK-P8-NEXT:    li r5, 32
>>  ; CHECK-P8-NEXT:    lvx v3, r3, r4
>> -; CHECK-P8-NEXT:    li r4, 32
>> -; CHECK-P8-NEXT:    xscvspdpn f2, v2
>> -; CHECK-P8-NEXT:    xxsldwi vs0, v2, v2, 3
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v3
>> -; CHECK-P8-NEXT:    xxswapd vs1, v2
>> -; CHECK-P8-NEXT:    xxsldwi vs3, v2, v2, 1
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v3, v3, 3
>> -; CHECK-P8-NEXT:    lvx v2, r3, r4
>> +; CHECK-P8-NEXT:    lvx v2, r3, r5
>> +; CHECK-P8-NEXT:    xxsldwi vs0, v4, v4, 3
>> +; CHECK-P8-NEXT:    xxswapd vs2, v4
>> +; CHECK-P8-NEXT:    xxsldwi vs4, v4, v4, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f1, v4
>> +; CHECK-P8-NEXT:    xscvspdpn f3, v3
>> +; CHECK-P8-NEXT:    xxsldwi vs6, v3, v3, 3
>>  ; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> -; CHECK-P8-NEXT:    xxswapd vs6, v3
>> -; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    xxsldwi vs7, v3, v3, 1
>> -; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT:    xxsldwi vs8, v2, v2, 3
>> -; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT:    xxswapd vs9, v2
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    xxswapd vs7, v3
>> +; CHECK-P8-NEXT:    xscvspdpn f2, vs2
>> +; CHECK-P8-NEXT:    xxsldwi vs8, v3, v3, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f4, vs4
>> +; CHECK-P8-NEXT:    xxsldwi vs9, v2, v2, 3
>>  ; CHECK-P8-NEXT:    xscvspdpn f6, vs6
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    mffprwz r4, f2
>>  ; CHECK-P8-NEXT:    xscvspdpn f7, vs7
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>>  ; CHECK-P8-NEXT:    xscvspdpn f8, vs8
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f5
>> -; CHECK-P8-NEXT:    xxswapd v0, vs4
>> +; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>>  ; CHECK-P8-NEXT:    xscvspdpn f9, vs9
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    xxswapd vs0, v2
>> +; CHECK-P8-NEXT:    mffprwz r5, f2
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> +; CHECK-P8-NEXT:    mtvsrd v4, r5
>> +; CHECK-P8-NEXT:    mffprwz r5, f4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f6
>> -; CHECK-P8-NEXT:    xxswapd v3, vs5
>> -; CHECK-P8-NEXT:    mtfprd f6, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    vmrghb v3, v4, v3
>> +; CHECK-P8-NEXT:    mtvsrd v4, r5
>> +; CHECK-P8-NEXT:    mffprwz r5, f3
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f7
>> -; CHECK-P8-NEXT:    xxswapd v4, vs6
>> -; CHECK-P8-NEXT:    mtfprd f7, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f8
>> -; CHECK-P8-NEXT:    xxswapd v5, vs7
>> -; CHECK-P8-NEXT:    mtfprd f8, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    xscvdpsxws f1, f9
>> -; CHECK-P8-NEXT:    xxswapd v1, vs8
>> -; CHECK-P8-NEXT:    mtfprd f9, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P8-NEXT:    xxswapd v4, vs2
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    xxswapd v6, vs9
>> -; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    xscvspdpn f0, v2
>> -; CHECK-P8-NEXT:    xxswapd v7, vs3
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    vmrglb v4, v4, v5
>> -; CHECK-P8-NEXT:    xxswapd v5, vs5
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f8
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>>  ; CHECK-P8-NEXT:    li r4, 48
>> -; CHECK-P8-NEXT:    lvx v9, r3, r4
>> -; CHECK-P8-NEXT:    vmrglb v1, v6, v1
>> -; CHECK-P8-NEXT:    xxswapd v8, vs1
>> +; CHECK-P8-NEXT:    lvx v0, r3, r4
>> +; CHECK-P8-NEXT:    mffprwz r3, f1
>>  ; CHECK-P8-NEXT:    xxsldwi vs1, v2, v2, 1
>> -; CHECK-P8-NEXT:    xxsldwi vs2, v9, v9, 3
>> -; CHECK-P8-NEXT:    xscvspdpn f4, v9
>> -; CHECK-P8-NEXT:    xxswapd vs3, v9
>> -; CHECK-P8-NEXT:    xxsldwi vs5, v9, v9, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f5, v2
>> +; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    xxsldwi vs3, v0, v0, 3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>> +; CHECK-P8-NEXT:    xxswapd vs4, v0
>>  ; CHECK-P8-NEXT:    xscvspdpn f1, vs1
>> -; CHECK-P8-NEXT:    xscvspdpn f2, vs2
>> +; CHECK-P8-NEXT:    mtvsrd v7, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f0
>> +; CHECK-P8-NEXT:    xxsldwi vs0, v0, v0, 1
>> +; CHECK-P8-NEXT:    xscvspdpn f2, v0
>>  ; CHECK-P8-NEXT:    xscvspdpn f3, vs3
>> -; CHECK-P8-NEXT:    xscvspdpn f5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P8-NEXT:    xscvdpsxws f6, f9
>> +; CHECK-P8-NEXT:    xscvspdpn f4, vs4
>> +; CHECK-P8-NEXT:    xscvspdpn f0, vs0
>> +; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f6
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghb v2, v6, v1
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f5
>> +; CHECK-P8-NEXT:    mtvsrd v6, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    vmrghb v4, v5, v4
>> +; CHECK-P8-NEXT:    mtvsrd v5, r5
>> +; CHECK-P8-NEXT:    vmrghb v0, v6, v1
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    xxswapd v9, vs4
>> -; CHECK-P8-NEXT:    mtfprd f1, r3
>> +; CHECK-P8-NEXT:    mtvsrd v6, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> -; CHECK-P8-NEXT:    xxswapd v6, vs1
>> -; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    vmrglb v2, v0, v7
>> -; CHECK-P8-NEXT:    xxswapd v0, vs0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v7, vs2
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    vmrglb v5, v8, v5
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>> -; CHECK-P8-NEXT:    xxswapd v10, vs3
>> -; CHECK-P8-NEXT:    vmrglb v0, v0, v6
>> +; CHECK-P8-NEXT:    vmrghb v5, v5, v7
>> +; CHECK-P8-NEXT:    vmrghb v1, v1, v6
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f4
>> +; CHECK-P8-NEXT:    mtvsrd v7, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f0
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mtvsrd v9, r3
>> +; CHECK-P8-NEXT:    vmrghb v7, v8, v7
>> +; CHECK-P8-NEXT:    vmrghb v6, v6, v9
>>  ; CHECK-P8-NEXT:    vmrglh v3, v4, v3
>> -; CHECK-P8-NEXT:    vmrglb v6, v8, v7
>> -; CHECK-P8-NEXT:    vmrglb v7, v9, v10
>> -; CHECK-P8-NEXT:    vmrglh v2, v2, v1
>> -; CHECK-P8-NEXT:    vmrglh v4, v0, v5
>> -; CHECK-P8-NEXT:    vmrglh v5, v7, v6
>> +; CHECK-P8-NEXT:    vmrglh v2, v5, v2
>> +; CHECK-P8-NEXT:    vmrglh v4, v1, v0
>> +; CHECK-P8-NEXT:    vmrglh v5, v6, v7
>>  ; CHECK-P8-NEXT:    vmrglw v2, v2, v3
>>  ; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>>  ; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>> @@ -1212,114 +1108,98 @@ define <16 x i8> @test16elt_signed(<16 x float>* nocapture readonly) local_unnam
>>  ;
>>  ; CHECK-P9-LABEL: test16elt_signed:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    lxv vs2, 0(r3)
>> +; CHECK-P9-NEXT:    lxv vs3, 0(r3)
>> +; CHECK-P9-NEXT:    xxsldwi vs4, vs3, vs3, 3
>> +; CHECK-P9-NEXT:    xscvspdpn f4, vs4
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    lxv vs0, 48(r3)
>> +; CHECK-P9-NEXT:    lxv vs1, 32(r3)
>> +; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>> +; CHECK-P9-NEXT:    mffprwz r3, f4
>> +; CHECK-P9-NEXT:    xxswapd vs4, vs3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>> +; CHECK-P9-NEXT:    xscvspdpn f4, vs4
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    mffprwz r3, f4
>> +; CHECK-P9-NEXT:    xscvspdpn f4, vs3
>> +; CHECK-P9-NEXT:    xxsldwi vs3, vs3, vs3, 1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    vmrghb v2, v3, v2
>> +; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    mffprwz r3, f4
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    mffprwz r3, f3
>>  ; CHECK-P9-NEXT:    xxsldwi vs3, vs2, vs2, 3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT:    lxv vs0, 48(r3)
>> -; CHECK-P9-NEXT:    lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT:    lxv vs4, 16(r3)
>> +; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs3
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>>  ; CHECK-P9-NEXT:    xscvspdpn f3, vs2
>>  ; CHECK-P9-NEXT:    xxsldwi vs2, vs2, vs2, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    vmrghb v3, v4, v3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    xxsldwi vs2, vs4, vs4, 3
>> -; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    vmrglb v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    xxswapd vs2, vs4
>> -; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    xscvspdpn f2, vs4
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    xxsldwi vs2, vs4, vs4, 1
>> -; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs2
>>  ; CHECK-P9-NEXT:    xxsldwi vs2, vs1, vs1, 3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>>  ; CHECK-P9-NEXT:    xscvspdpn f2, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs1, vs1, 1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghb v3, v4, v3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>>  ; CHECK-P9-NEXT:    xxsldwi vs1, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    vmrglb v3, v4, v3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>>  ; CHECK-P9-NEXT:    xscvspdpn f1, vs0
>>  ; CHECK-P9-NEXT:    xxsldwi vs0, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvspdpn f0, vs0
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghb v4, v5, v4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v4, v5, v4
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>> -; CHECK-P9-NEXT:    xxswapd v0, vs0
>> -; CHECK-P9-NEXT:    vmrglb v5, v5, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r3
>> +; CHECK-P9-NEXT:    vmrghb v5, v5, v0
>>  ; CHECK-P9-NEXT:    vmrglh v4, v5, v4
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld v2, v3, v2
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
>> index c7d66ae784a0..dbc2774fed8c 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i16_elts.ll
>> @@ -16,12 +16,10 @@ define i32 @test2elt(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    vmrglh v2, v2, v3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    vmrghh v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -30,15 +28,13 @@ define i32 @test2elt(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P9:       # %bb.0: # %entry
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    li r3, 0
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>>  ; CHECK-P9-NEXT:    vextuwrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -77,18 +73,14 @@ define i64 @test4elt(<4 x double>* nocapture readonly) local_unnamed_addr #1 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    xxswapd v2, vs2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    xxswapd v4, vs3
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    xxswapd v5, vs1
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglh v3, v5, v4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>> +; CHECK-P8-NEXT:    vmrghh v2, v4, v2
>> +; CHECK-P8-NEXT:    vmrghh v3, v5, v3
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>> @@ -102,22 +94,18 @@ define i64 @test4elt(<4 x double>* nocapture readonly) local_unnamed_addr #1 {
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    lxv vs0, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -176,36 +164,28 @@ define <8 x i16> @test8elt(<8 x double>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P8-NEXT:    xxswapd vs3, vs3
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    mtfprd f4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f6
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs4
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f7
>> -; CHECK-P8-NEXT:    mtfprd f6, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mtfprd f7, r4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs6
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v1, vs7
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v5, vs0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    xxswapd v6, vs2
>> -; CHECK-P8-NEXT:    vmrglh v2, v5, v2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs0
>> -; CHECK-P8-NEXT:    vmrglh v3, v0, v3
>> -; CHECK-P8-NEXT:    vmrglh v4, v6, v4
>> -; CHECK-P8-NEXT:    vmrglh v5, v5, v1
>> +; CHECK-P8-NEXT:    vmrghh v2, v0, v2
>> +; CHECK-P8-NEXT:    vmrghh v3, v1, v3
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    vmrghh v4, v0, v4
>> +; CHECK-P8-NEXT:    vmrghh v5, v1, v5
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>>  ; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>> @@ -217,47 +197,39 @@ define <8 x i16> @test8elt(<8 x double>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>>  ; CHECK-P9-NEXT:    lxv vs0, 48(r3)
>>  ; CHECK-P9-NEXT:    lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs4
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghh v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    mffprwz r3, f1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld v2, v3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -321,209 +293,177 @@ entry:
>>  define void @test16elt(<16 x i16>* noalias nocapture sret %agg.result, <16 x double>* nocapture readonly) local_unnamed_addr #3 {
>>  ; CHECK-P8-LABEL: test16elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    lxvd2x vs0, 0, r4
>>  ; CHECK-P8-NEXT:    li r5, 16
>> +; CHECK-P8-NEXT:    lxvd2x vs0, 0, r4
>>  ; CHECK-P8-NEXT:    li r6, 32
>> +; CHECK-P8-NEXT:    li r7, 48
>>  ; CHECK-P8-NEXT:    lxvd2x vs1, r4, r5
>>  ; CHECK-P8-NEXT:    lxvd2x vs2, r4, r6
>> -; CHECK-P8-NEXT:    li r6, 48
>> -; CHECK-P8-NEXT:    lxvd2x vs3, r4, r6
>>  ; CHECK-P8-NEXT:    li r6, 64
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f0
>> +; CHECK-P8-NEXT:    lxvd2x vs3, r4, r7
>>  ; CHECK-P8-NEXT:    lxvd2x vs5, r4, r6
>> -; CHECK-P8-NEXT:    li r6, 80
>> +; CHECK-P8-NEXT:    li r7, 80
>> +; CHECK-P8-NEXT:    li r6, 96
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f0
>> +; CHECK-P8-NEXT:    lxvd2x vs7, r4, r7
>> +; CHECK-P8-NEXT:    lxvd2x vs10, r4, r6
>> +; CHECK-P8-NEXT:    li r6, 112
>>  ; CHECK-P8-NEXT:    xxswapd vs0, vs0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f6, f1
>> -; CHECK-P8-NEXT:    lxvd2x vs7, r4, r6
>> -; CHECK-P8-NEXT:    li r6, 96
>>  ; CHECK-P8-NEXT:    xxswapd vs1, vs1
>>  ; CHECK-P8-NEXT:    xscvdpsxws f8, f2
>> -; CHECK-P8-NEXT:    lxvd2x vs9, r4, r6
>> -; CHECK-P8-NEXT:    li r6, 112
>>  ; CHECK-P8-NEXT:    xxswapd vs2, vs2
>> -; CHECK-P8-NEXT:    xscvdpsxws f10, f3
>> -; CHECK-P8-NEXT:    lxvd2x vs11, r4, r6
>> +; CHECK-P8-NEXT:    xscvdpsxws f9, f3
>>  ; CHECK-P8-NEXT:    xxswapd vs3, vs3
>> -; CHECK-P8-NEXT:    xscvdpsxws f12, f5
>> +; CHECK-P8-NEXT:    xscvdpsxws f11, f5
>>  ; CHECK-P8-NEXT:    xxswapd vs5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f13, f7
>> +; CHECK-P8-NEXT:    xscvdpsxws f12, f7
>>  ; CHECK-P8-NEXT:    xxswapd vs7, vs7
>> -; CHECK-P8-NEXT:    xscvdpsxws v2, f9
>> -; CHECK-P8-NEXT:    xxswapd vs9, vs9
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws v3, f11
>> -; CHECK-P8-NEXT:    xxswapd vs11, vs11
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    mffprwz r6, f6
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> +; CHECK-P8-NEXT:    mffprwz r7, f4
>> +; CHECK-P8-NEXT:    lxvd2x vs4, r4, r6
>> +; CHECK-P8-NEXT:    mffprwz r4, f6
>> +; CHECK-P8-NEXT:    xscvdpsxws f13, f10
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f8
>> +; CHECK-P8-NEXT:    xscvdpsxws f6, f4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f9
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f11
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xxswapd v4, vs4
>> -; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT:    mtfprd f6, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f10
>> -; CHECK-P8-NEXT:    mtfprd f8, r4
>> -; CHECK-P8-NEXT:    xxswapd v5, vs6
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f12
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    xxswapd v0, vs8
>> -; CHECK-P8-NEXT:    mtfprd f10, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f13
>> -; CHECK-P8-NEXT:    mtfprd f12, r4
>> -; CHECK-P8-NEXT:    xxswapd v1, vs10
>> -; CHECK-P8-NEXT:    mfvsrwz r4, v2
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f13
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xxswapd v6, vs12
>> -; CHECK-P8-NEXT:    xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT:    mtfprd f13, r6
>> -; CHECK-P8-NEXT:    mfvsrwz r6, v3
>> -; CHECK-P8-NEXT:    mtvsrd v2, r4
>> -; CHECK-P8-NEXT:    xxswapd v7, vs13
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f6
>> +; CHECK-P8-NEXT:    xxswapd vs6, vs10
>> +; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> +; CHECK-P8-NEXT:    mtvsrd v7, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    xxswapd vs0, vs4
>> +; CHECK-P8-NEXT:    mtvsrd v2, r7
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>>  ; CHECK-P8-NEXT:    xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT:    xxswapd v2, v2
>> -; CHECK-P8-NEXT:    xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT:    mtvsrd v3, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f1
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    xxswapd v3, v3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    mtfprd f1, r6
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f6
>> +; CHECK-P8-NEXT:    vmrghh v2, v8, v2
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghh v3, v9, v3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    xxswapd v9, vs1
>> -; CHECK-P8-NEXT:    mffprwz r6, f3
>> -; CHECK-P8-NEXT:    xxswapd v10, vs2
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f9
>> -; CHECK-P8-NEXT:    mtfprd f3, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f7
>> -; CHECK-P8-NEXT:    mtfprd f9, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f11
>> -; CHECK-P8-NEXT:    vmrglh v4, v8, v4
>> -; CHECK-P8-NEXT:    xxswapd v8, vs3
>> -; CHECK-P8-NEXT:    vmrglh v5, v9, v5
>> -; CHECK-P8-NEXT:    xxswapd v9, vs5
>> -; CHECK-P8-NEXT:    mtfprd f7, r6
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    vmrglh v0, v10, v0
>> -; CHECK-P8-NEXT:    xxswapd v10, vs7
>> -; CHECK-P8-NEXT:    vmrglh v1, v8, v1
>> -; CHECK-P8-NEXT:    xxswapd v8, vs9
>> -; CHECK-P8-NEXT:    vmrglh v6, v9, v6
>> -; CHECK-P8-NEXT:    xxswapd v9, vs0
>> -; CHECK-P8-NEXT:    vmrglh v7, v10, v7
>> -; CHECK-P8-NEXT:    vmrglh v2, v8, v2
>> -; CHECK-P8-NEXT:    vmrglh v3, v9, v3
>> -; CHECK-P8-NEXT:    vmrglw v4, v5, v4
>> -; CHECK-P8-NEXT:    vmrglw v5, v1, v0
>> -; CHECK-P8-NEXT:    vmrglw v0, v7, v6
>> +; CHECK-P8-NEXT:    vmrghh v4, v8, v4
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f7
>> +; CHECK-P8-NEXT:    vmrghh v5, v9, v5
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f4
>> +; CHECK-P8-NEXT:    vmrghh v0, v8, v0
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    vmrghh v1, v9, v1
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>> +; CHECK-P8-NEXT:    vmrghh v6, v8, v6
>> +; CHECK-P8-NEXT:    vmrghh v7, v9, v7
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>> +; CHECK-P8-NEXT:    vmrglw v4, v1, v0
>> +; CHECK-P8-NEXT:    vmrglw v5, v7, v6
>> +; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>> +; CHECK-P8-NEXT:    stvx v2, 0, r3
>>  ; CHECK-P8-NEXT:    xxmrgld v3, v5, v4
>> -; CHECK-P8-NEXT:    stvx v3, 0, r3
>> -; CHECK-P8-NEXT:    xxmrgld v2, v2, v0
>> -; CHECK-P8-NEXT:    stvx v2, r3, r5
>> +; CHECK-P8-NEXT:    stvx v3, r3, r5
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: test16elt:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    lxv vs4, 0(r4)
>> -; CHECK-P9-NEXT:    lxv vs3, 16(r4)
>> -; CHECK-P9-NEXT:    lxv vs2, 32(r4)
>> -; CHECK-P9-NEXT:    xscvdpsxws f5, f4
>> -; CHECK-P9-NEXT:    lxv vs1, 48(r4)
>> -; CHECK-P9-NEXT:    xscvdpsxws f6, f3
>> -; CHECK-P9-NEXT:    lxv vs0, 64(r4)
>> -; CHECK-P9-NEXT:    xscvdpsxws f7, f2
>> -; CHECK-P9-NEXT:    xscvdpsxws f8, f1
>> -; CHECK-P9-NEXT:    xxswapd vs4, vs4
>> -; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P9-NEXT:    mffprwz r5, f5
>> -; CHECK-P9-NEXT:    xscvdpsxws f9, f0
>> +; CHECK-P9-NEXT:    lxv vs3, 0(r4)
>> +; CHECK-P9-NEXT:    lxv vs2, 16(r4)
>> +; CHECK-P9-NEXT:    lxv vs1, 32(r4)
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>> +; CHECK-P9-NEXT:    lxv vs0, 48(r4)
>> +; CHECK-P9-NEXT:    xscvdpsxws f5, f2
>> +; CHECK-P9-NEXT:    xscvdpsxws f6, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs3
>> +; CHECK-P9-NEXT:    xscvdpsxws f7, f0
>> +; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    mffprwz r5, f4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT:    mtfprd f5, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f6
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    mtfprd f6, r5
>> +; CHECK-P9-NEXT:    mtvsrd v2, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f5
>> +; CHECK-P9-NEXT:    mtvsrd v3, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f6
>> +; CHECK-P9-NEXT:    mtvsrd v4, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f7
>> -; CHECK-P9-NEXT:    mtfprd f7, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f8
>> -; CHECK-P9-NEXT:    mtfprd f8, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f9
>> -; CHECK-P9-NEXT:    mtfprd f9, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f3
>> +; CHECK-P9-NEXT:    lxv vs3, 64(r4)
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    xxswapd v2, vs5
>> -; CHECK-P9-NEXT:    xxswapd v5, vs8
>> -; CHECK-P9-NEXT:    xxswapd v0, vs9
>> -; CHECK-P9-NEXT:    mtfprd f3, r5
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r5
>> -; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> -; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT:    xxswapd v1, vs2
>>  ; CHECK-P9-NEXT:    lxv vs2, 80(r4)
>> -; CHECK-P9-NEXT:    xxswapd v3, vs4
>> -; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs6
>> -; CHECK-P9-NEXT:    xxswapd v4, vs3
>> -; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>> -; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    vmrghh v2, v2, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f1
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs7
>> -; CHECK-P9-NEXT:    mtfprd f1, r5
>> +; CHECK-P9-NEXT:    lxv vs1, 96(r4)
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>> +; CHECK-P9-NEXT:    xxswapd vs3, vs3
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v1
>> -; CHECK-P9-NEXT:    xxswapd v1, vs1
>> -; CHECK-P9-NEXT:    mtfprd f0, r5
>> -; CHECK-P9-NEXT:    vmrglh v5, v5, v1
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    xxswapd v1, vs0
>>  ; CHECK-P9-NEXT:    lxv vs0, 112(r4)
>> -; CHECK-P9-NEXT:    lxv vs1, 96(r4)
>> +; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>> +; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghh v5, v5, v0
>> +; CHECK-P9-NEXT:    mffprwz r4, f4
>> +; CHECK-P9-NEXT:    vmrglw v4, v5, v4
>> +; CHECK-P9-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r4
>> +; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>> +; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    xxmrgld vs4, v4, v2
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>> +; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT:    stxv vs4, 0(r3)
>> +; CHECK-P9-NEXT:    mffprwz r4, f3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f2
>> -; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT:    vmrglw v3, v5, v4
>> -; CHECK-P9-NEXT:    xxmrgld vs4, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v2, vs3
>> -; CHECK-P9-NEXT:    vmrglh v0, v0, v1
>> -; CHECK-P9-NEXT:    mtfprd f2, r4
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r4
>> +; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r4
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r4
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld vs0, v3, v2
>>  ; CHECK-P9-NEXT:    stxv vs0, 16(r3)
>> -; CHECK-P9-NEXT:    stxv vs4, 0(r3)
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>>  ; CHECK-BE-LABEL: test16elt:
>> @@ -639,12 +579,10 @@ define i32 @test2elt_signed(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    vmrglh v2, v2, v3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    vmrghh v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -653,15 +591,13 @@ define i32 @test2elt_signed(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P9:       # %bb.0: # %entry
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    li r3, 0
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>>  ; CHECK-P9-NEXT:    vextuwrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -700,18 +636,14 @@ define i64 @test4elt_signed(<4 x double>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    xxswapd v2, vs2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    xxswapd v4, vs3
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    xxswapd v5, vs1
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglh v3, v5, v4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>> +; CHECK-P8-NEXT:    vmrghh v2, v4, v2
>> +; CHECK-P8-NEXT:    vmrghh v3, v5, v3
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>> @@ -725,22 +657,18 @@ define i64 @test4elt_signed(<4 x double>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    lxv vs0, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -799,36 +727,28 @@ define <8 x i16> @test8elt_signed(<8 x double>* nocapture readonly) local_unname
>>  ; CHECK-P8-NEXT:    xxswapd vs3, vs3
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    mtfprd f4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f6
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs4
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f7
>> -; CHECK-P8-NEXT:    mtfprd f6, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mtfprd f7, r4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs6
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v1, vs7
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v5, vs0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    xxswapd v6, vs2
>> -; CHECK-P8-NEXT:    vmrglh v2, v5, v2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs0
>> -; CHECK-P8-NEXT:    vmrglh v3, v0, v3
>> -; CHECK-P8-NEXT:    vmrglh v4, v6, v4
>> -; CHECK-P8-NEXT:    vmrglh v5, v5, v1
>> +; CHECK-P8-NEXT:    vmrghh v2, v0, v2
>> +; CHECK-P8-NEXT:    vmrghh v3, v1, v3
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    vmrghh v4, v0, v4
>> +; CHECK-P8-NEXT:    vmrghh v5, v1, v5
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>>  ; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>> @@ -840,47 +760,39 @@ define <8 x i16> @test8elt_signed(<8 x double>* nocapture readonly) local_unname
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>>  ; CHECK-P9-NEXT:    lxv vs0, 48(r3)
>>  ; CHECK-P9-NEXT:    lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs4
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghh v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    mffprwz r3, f1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld v2, v3, v2
>>  ; CHECK-P9-NEXT:    blr
>> @@ -944,209 +856,177 @@ entry:
>>  define void @test16elt_signed(<16 x i16>* noalias nocapture sret %agg.result, <16 x double>* nocapture readonly) local_unnamed_addr #3 {
>>  ; CHECK-P8-LABEL: test16elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    lxvd2x vs0, 0, r4
>>  ; CHECK-P8-NEXT:    li r5, 16
>> +; CHECK-P8-NEXT:    lxvd2x vs0, 0, r4
>>  ; CHECK-P8-NEXT:    li r6, 32
>> +; CHECK-P8-NEXT:    li r7, 48
>>  ; CHECK-P8-NEXT:    lxvd2x vs1, r4, r5
>>  ; CHECK-P8-NEXT:    lxvd2x vs2, r4, r6
>> -; CHECK-P8-NEXT:    li r6, 48
>> -; CHECK-P8-NEXT:    lxvd2x vs3, r4, r6
>>  ; CHECK-P8-NEXT:    li r6, 64
>> -; CHECK-P8-NEXT:    xscvdpsxws f4, f0
>> +; CHECK-P8-NEXT:    lxvd2x vs3, r4, r7
>>  ; CHECK-P8-NEXT:    lxvd2x vs5, r4, r6
>> -; CHECK-P8-NEXT:    li r6, 80
>> +; CHECK-P8-NEXT:    li r7, 80
>> +; CHECK-P8-NEXT:    li r6, 96
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f0
>> +; CHECK-P8-NEXT:    lxvd2x vs7, r4, r7
>> +; CHECK-P8-NEXT:    lxvd2x vs10, r4, r6
>> +; CHECK-P8-NEXT:    li r6, 112
>>  ; CHECK-P8-NEXT:    xxswapd vs0, vs0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f6, f1
>> -; CHECK-P8-NEXT:    lxvd2x vs7, r4, r6
>> -; CHECK-P8-NEXT:    li r6, 96
>>  ; CHECK-P8-NEXT:    xxswapd vs1, vs1
>>  ; CHECK-P8-NEXT:    xscvdpsxws f8, f2
>> -; CHECK-P8-NEXT:    lxvd2x vs9, r4, r6
>> -; CHECK-P8-NEXT:    li r6, 112
>>  ; CHECK-P8-NEXT:    xxswapd vs2, vs2
>> -; CHECK-P8-NEXT:    xscvdpsxws f10, f3
>> -; CHECK-P8-NEXT:    lxvd2x vs11, r4, r6
>> +; CHECK-P8-NEXT:    xscvdpsxws f9, f3
>>  ; CHECK-P8-NEXT:    xxswapd vs3, vs3
>> -; CHECK-P8-NEXT:    xscvdpsxws f12, f5
>> +; CHECK-P8-NEXT:    xscvdpsxws f11, f5
>>  ; CHECK-P8-NEXT:    xxswapd vs5, vs5
>> -; CHECK-P8-NEXT:    xscvdpsxws f13, f7
>> +; CHECK-P8-NEXT:    xscvdpsxws f12, f7
>>  ; CHECK-P8-NEXT:    xxswapd vs7, vs7
>> -; CHECK-P8-NEXT:    xscvdpsxws v2, f9
>> -; CHECK-P8-NEXT:    xxswapd vs9, vs9
>> -; CHECK-P8-NEXT:    mffprwz r4, f4
>> -; CHECK-P8-NEXT:    xscvdpsxws v3, f11
>> -; CHECK-P8-NEXT:    xxswapd vs11, vs11
>> -; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    mffprwz r6, f6
>> -; CHECK-P8-NEXT:    mtfprd f4, r4
>> +; CHECK-P8-NEXT:    mffprwz r7, f4
>> +; CHECK-P8-NEXT:    lxvd2x vs4, r4, r6
>> +; CHECK-P8-NEXT:    mffprwz r4, f6
>> +; CHECK-P8-NEXT:    xscvdpsxws f13, f10
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f8
>> +; CHECK-P8-NEXT:    xscvdpsxws f6, f4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f9
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f11
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xxswapd v4, vs4
>> -; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT:    mtfprd f6, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f10
>> -; CHECK-P8-NEXT:    mtfprd f8, r4
>> -; CHECK-P8-NEXT:    xxswapd v5, vs6
>> +; CHECK-P8-NEXT:    mtvsrd v0, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f12
>> -; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    xxswapd v0, vs8
>> -; CHECK-P8-NEXT:    mtfprd f10, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f13
>> -; CHECK-P8-NEXT:    mtfprd f12, r4
>> -; CHECK-P8-NEXT:    xxswapd v1, vs10
>> -; CHECK-P8-NEXT:    mfvsrwz r4, v2
>> +; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f13
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xxswapd v6, vs12
>> -; CHECK-P8-NEXT:    xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT:    mtfprd f13, r6
>> -; CHECK-P8-NEXT:    mfvsrwz r6, v3
>> -; CHECK-P8-NEXT:    mtvsrd v2, r4
>> -; CHECK-P8-NEXT:    xxswapd v7, vs13
>> +; CHECK-P8-NEXT:    mtvsrd v6, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f6
>> +; CHECK-P8-NEXT:    xxswapd vs6, vs10
>> +; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> +; CHECK-P8-NEXT:    mtvsrd v7, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    xxswapd vs0, vs4
>> +; CHECK-P8-NEXT:    mtvsrd v2, r7
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f1
>>  ; CHECK-P8-NEXT:    xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT:    xxswapd v2, v2
>> -; CHECK-P8-NEXT:    xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT:    mtvsrd v3, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f1
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    xxswapd v3, v3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f2
>> -; CHECK-P8-NEXT:    mtfprd f1, r6
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>> -; CHECK-P8-NEXT:    mtfprd f2, r4
>> +; CHECK-P8-NEXT:    xscvdpsxws f4, f6
>> +; CHECK-P8-NEXT:    vmrghh v2, v8, v2
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f3
>> +; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P8-NEXT:    vmrghh v3, v9, v3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    xxswapd v9, vs1
>> -; CHECK-P8-NEXT:    mffprwz r6, f3
>> -; CHECK-P8-NEXT:    xxswapd v10, vs2
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f9
>> -; CHECK-P8-NEXT:    mtfprd f3, r6
>> -; CHECK-P8-NEXT:    mffprwz r6, f7
>> -; CHECK-P8-NEXT:    mtfprd f9, r4
>> -; CHECK-P8-NEXT:    mffprwz r4, f11
>> -; CHECK-P8-NEXT:    vmrglh v4, v8, v4
>> -; CHECK-P8-NEXT:    xxswapd v8, vs3
>> -; CHECK-P8-NEXT:    vmrglh v5, v9, v5
>> -; CHECK-P8-NEXT:    xxswapd v9, vs5
>> -; CHECK-P8-NEXT:    mtfprd f7, r6
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    vmrglh v0, v10, v0
>> -; CHECK-P8-NEXT:    xxswapd v10, vs7
>> -; CHECK-P8-NEXT:    vmrglh v1, v8, v1
>> -; CHECK-P8-NEXT:    xxswapd v8, vs9
>> -; CHECK-P8-NEXT:    vmrglh v6, v9, v6
>> -; CHECK-P8-NEXT:    xxswapd v9, vs0
>> -; CHECK-P8-NEXT:    vmrglh v7, v10, v7
>> -; CHECK-P8-NEXT:    vmrglh v2, v8, v2
>> -; CHECK-P8-NEXT:    vmrglh v3, v9, v3
>> -; CHECK-P8-NEXT:    vmrglw v4, v5, v4
>> -; CHECK-P8-NEXT:    vmrglw v5, v1, v0
>> -; CHECK-P8-NEXT:    vmrglw v0, v7, v6
>> +; CHECK-P8-NEXT:    vmrghh v4, v8, v4
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f7
>> +; CHECK-P8-NEXT:    vmrghh v5, v9, v5
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f4
>> +; CHECK-P8-NEXT:    vmrghh v0, v8, v0
>> +; CHECK-P8-NEXT:    mtvsrd v8, r4
>> +; CHECK-P8-NEXT:    mffprwz r4, f0
>> +; CHECK-P8-NEXT:    vmrghh v1, v9, v1
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>> +; CHECK-P8-NEXT:    vmrghh v6, v8, v6
>> +; CHECK-P8-NEXT:    vmrghh v7, v9, v7
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>> +; CHECK-P8-NEXT:    vmrglw v4, v1, v0
>> +; CHECK-P8-NEXT:    vmrglw v5, v7, v6
>> +; CHECK-P8-NEXT:    xxmrgld v2, v3, v2
>> +; CHECK-P8-NEXT:    stvx v2, 0, r3
>>  ; CHECK-P8-NEXT:    xxmrgld v3, v5, v4
>> -; CHECK-P8-NEXT:    stvx v3, 0, r3
>> -; CHECK-P8-NEXT:    xxmrgld v2, v2, v0
>> -; CHECK-P8-NEXT:    stvx v2, r3, r5
>> +; CHECK-P8-NEXT:    stvx v3, r3, r5
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: test16elt_signed:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    lxv vs4, 0(r4)
>> -; CHECK-P9-NEXT:    lxv vs3, 16(r4)
>> -; CHECK-P9-NEXT:    lxv vs2, 32(r4)
>> -; CHECK-P9-NEXT:    xscvdpsxws f5, f4
>> -; CHECK-P9-NEXT:    lxv vs1, 48(r4)
>> -; CHECK-P9-NEXT:    xscvdpsxws f6, f3
>> -; CHECK-P9-NEXT:    lxv vs0, 64(r4)
>> -; CHECK-P9-NEXT:    xscvdpsxws f7, f2
>> -; CHECK-P9-NEXT:    xscvdpsxws f8, f1
>> -; CHECK-P9-NEXT:    xxswapd vs4, vs4
>> -; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> -; CHECK-P9-NEXT:    mffprwz r5, f5
>> -; CHECK-P9-NEXT:    xscvdpsxws f9, f0
>> +; CHECK-P9-NEXT:    lxv vs3, 0(r4)
>> +; CHECK-P9-NEXT:    lxv vs2, 16(r4)
>> +; CHECK-P9-NEXT:    lxv vs1, 32(r4)
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>> +; CHECK-P9-NEXT:    lxv vs0, 48(r4)
>> +; CHECK-P9-NEXT:    xscvdpsxws f5, f2
>> +; CHECK-P9-NEXT:    xscvdpsxws f6, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs3
>> +; CHECK-P9-NEXT:    xscvdpsxws f7, f0
>> +; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    mffprwz r5, f4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT:    mtfprd f5, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f6
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    mtfprd f6, r5
>> +; CHECK-P9-NEXT:    mtvsrd v2, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f5
>> +; CHECK-P9-NEXT:    mtvsrd v3, r5
>> +; CHECK-P9-NEXT:    mffprwz r5, f6
>> +; CHECK-P9-NEXT:    mtvsrd v4, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f7
>> -; CHECK-P9-NEXT:    mtfprd f7, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f8
>> -; CHECK-P9-NEXT:    mtfprd f8, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f9
>> -; CHECK-P9-NEXT:    mtfprd f9, r5
>> -; CHECK-P9-NEXT:    mffprwz r5, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f3
>> +; CHECK-P9-NEXT:    lxv vs3, 64(r4)
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    xxswapd v2, vs5
>> -; CHECK-P9-NEXT:    xxswapd v5, vs8
>> -; CHECK-P9-NEXT:    xxswapd v0, vs9
>> -; CHECK-P9-NEXT:    mtfprd f3, r5
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r5
>> -; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> -; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P9-NEXT:    xxswapd v1, vs2
>>  ; CHECK-P9-NEXT:    lxv vs2, 80(r4)
>> -; CHECK-P9-NEXT:    xxswapd v3, vs4
>> -; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs6
>> -; CHECK-P9-NEXT:    xxswapd v4, vs3
>> -; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>> -; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    vmrghh v2, v2, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f1
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs7
>> -; CHECK-P9-NEXT:    mtfprd f1, r5
>> +; CHECK-P9-NEXT:    lxv vs1, 96(r4)
>> +; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>> +; CHECK-P9-NEXT:    xxswapd vs3, vs3
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>>  ; CHECK-P9-NEXT:    mffprwz r5, f0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v1
>> -; CHECK-P9-NEXT:    xxswapd v1, vs1
>> -; CHECK-P9-NEXT:    mtfprd f0, r5
>> -; CHECK-P9-NEXT:    vmrglh v5, v5, v1
>> -; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P9-NEXT:    xxswapd v1, vs0
>>  ; CHECK-P9-NEXT:    lxv vs0, 112(r4)
>> -; CHECK-P9-NEXT:    lxv vs1, 96(r4)
>> +; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r5
>> +; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghh v5, v5, v0
>> +; CHECK-P9-NEXT:    mffprwz r4, f4
>> +; CHECK-P9-NEXT:    vmrglw v4, v5, v4
>> +; CHECK-P9-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r4
>> +; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>> +; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    xxmrgld vs4, v4, v2
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>> +; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT:    stxv vs4, 0(r3)
>> +; CHECK-P9-NEXT:    mffprwz r4, f3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f2
>> -; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT:    vmrglw v3, v5, v4
>> -; CHECK-P9-NEXT:    xxmrgld vs4, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v2, vs3
>> -; CHECK-P9-NEXT:    vmrglh v0, v0, v1
>> -; CHECK-P9-NEXT:    mtfprd f2, r4
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r4
>> +; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghh v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r4
>> +; CHECK-P9-NEXT:    mtvsrd v4, r4
>>  ; CHECK-P9-NEXT:    mffprwz r4, f0
>> -; CHECK-P9-NEXT:    vmrglh v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    vmrglh v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v0
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglh v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r4
>> +; CHECK-P9-NEXT:    vmrghh v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld vs0, v3, v2
>>  ; CHECK-P9-NEXT:    stxv vs0, 16(r3)
>> -; CHECK-P9-NEXT:    stxv vs4, 0(r3)
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>>  ; CHECK-BE-LABEL: test16elt_signed:
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
>> index 369fb3f10100..173ced964ad6 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i32_elts.ll
>> @@ -16,12 +16,10 @@ define i64 @test2elt(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpuxws f1, v2
>>  ; CHECK-P8-NEXT:    xscvdpuxws f0, f0
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    mtvsrwz v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    vmrglw v2, v2, v3
>> +; CHECK-P8-NEXT:    mtvsrwz v3, r4
>> +; CHECK-P8-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -35,7 +33,7 @@ define i64 @test2elt(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpuxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>>  ; CHECK-P9-NEXT:    mtvsrws v2, r3
>> -; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -310,12 +308,10 @@ define i64 @test2elt_signed(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    mtvsrwz v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    vmrglw v2, v2, v3
>> +; CHECK-P8-NEXT:    mtvsrwz v3, r4
>> +; CHECK-P8-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -329,7 +325,7 @@ define i64 @test2elt_signed(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>>  ; CHECK-P9-NEXT:    mtvsrws v2, r3
>> -; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
>> index fb13d1bd71f5..fd28d9a1afdc 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_fp64_to_i8_elts.ll
>> @@ -16,12 +16,10 @@ define i16 @test2elt(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    vmrglb v2, v2, v3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    clrldi r3, r3, 48
>> @@ -33,15 +31,13 @@ define i16 @test2elt(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P9:       # %bb.0: # %entry
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    addi r3, r1, -2
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vmrglb v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghb v2, v3, v2
>>  ; CHECK-P9-NEXT:    vsldoi v2, v2, v2, 8
>>  ; CHECK-P9-NEXT:    stxsihx v2, 0, r3
>>  ; CHECK-P9-NEXT:    lhz r3, -2(r1)
>> @@ -84,18 +80,14 @@ define i32 @test4elt(<4 x double>* nocapture readonly) local_unnamed_addr #1 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    xxswapd v2, vs2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    xxswapd v4, vs3
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    xxswapd v5, vs1
>> -; CHECK-P8-NEXT:    vmrglb v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglb v3, v5, v4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>> +; CHECK-P8-NEXT:    vmrghb v2, v4, v2
>> +; CHECK-P8-NEXT:    vmrghb v3, v5, v3
>>  ; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> @@ -109,24 +101,20 @@ define i32 @test4elt(<4 x double>* nocapture readonly) local_unnamed_addr #1 {
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    lxv vs0, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    li r3, 0
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>> +; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P9-NEXT:    vextuwrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -185,36 +173,28 @@ define i64 @test8elt(<8 x double>* nocapture readonly) local_unnamed_addr #1 {
>>  ; CHECK-P8-NEXT:    xxswapd vs3, vs3
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    mtfprd f4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f6
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs4
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f7
>> -; CHECK-P8-NEXT:    mtfprd f6, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mtfprd f7, r4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs6
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v1, vs7
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v5, vs0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    xxswapd v6, vs2
>> -; CHECK-P8-NEXT:    vmrglb v2, v5, v2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs0
>> -; CHECK-P8-NEXT:    vmrglb v3, v0, v3
>> -; CHECK-P8-NEXT:    vmrglb v4, v6, v4
>> -; CHECK-P8-NEXT:    vmrglb v5, v5, v1
>> +; CHECK-P8-NEXT:    vmrghb v2, v0, v2
>> +; CHECK-P8-NEXT:    vmrghb v3, v1, v3
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    vmrghb v4, v0, v4
>> +; CHECK-P8-NEXT:    vmrghb v5, v1, v5
>>  ; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P8-NEXT:    vmrglh v3, v5, v4
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> @@ -228,47 +208,39 @@ define i64 @test8elt(<8 x double>* nocapture readonly) local_unnamed_addr #1 {
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>>  ; CHECK-P9-NEXT:    lxv vs0, 48(r3)
>>  ; CHECK-P9-NEXT:    lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs4
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    mffprwz r3, f1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>> @@ -364,79 +336,63 @@ define <16 x i8> @test16elt(<16 x double>* nocapture readonly) local_unnamed_add
>>  ; CHECK-P8-NEXT:    xxswapd vs7, vs7
>>  ; CHECK-P8-NEXT:    xscvdpsxws v2, f9
>>  ; CHECK-P8-NEXT:    xxswapd vs9, vs9
>> -; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    xscvdpsxws v3, f11
>>  ; CHECK-P8-NEXT:    xxswapd vs11, vs11
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f6
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    mtfprd f4, r3
>> -; CHECK-P8-NEXT:    mffprwz r3, f8
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xxswapd v4, vs4
>> -; CHECK-P8-NEXT:    mtfprd f6, r4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f8
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f10
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs6
>> -; CHECK-P8-NEXT:    mtfprd f8, r3
>> -; CHECK-P8-NEXT:    mffprwz r3, f12
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs8
>> -; CHECK-P8-NEXT:    mtfprd f10, r4
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f12
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f13
>>  ; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    xxswapd v1, vs10
>> -; CHECK-P8-NEXT:    mtfprd f12, r3
>> -; CHECK-P8-NEXT:    mfvsrwz r3, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT:    xxswapd v6, vs12
>> -; CHECK-P8-NEXT:    mtfprd f13, r4
>> +; CHECK-P8-NEXT:    mtvsrd v6, r3
>> +; CHECK-P8-NEXT:    mfvsrwz r3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P8-NEXT:    mfvsrwz r4, v3
>> -; CHECK-P8-NEXT:    mtvsrd v2, r3
>> -; CHECK-P8-NEXT:    xxswapd v7, vs13
>> -; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT:    xxswapd v2, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>> +; CHECK-P8-NEXT:    mtvsrd v7, r4
>> +; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, v3
>> +; CHECK-P8-NEXT:    mtvsrd v8, r3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    xxswapd v9, vs1
>> +; CHECK-P8-NEXT:    vmrghb v4, v8, v4
>> +; CHECK-P8-NEXT:    vmrghb v5, v9, v5
>> +; CHECK-P8-NEXT:    mtvsrd v8, r3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f5
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    xxswapd v10, vs2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f7
>> -; CHECK-P8-NEXT:    mtfprd f5, r3
>> +; CHECK-P8-NEXT:    vmrghb v0, v8, v0
>> +; CHECK-P8-NEXT:    vmrghb v1, v9, v1
>> +; CHECK-P8-NEXT:    mtvsrd v8, r3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f9
>> -; CHECK-P8-NEXT:    mtfprd f7, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f11
>> -; CHECK-P8-NEXT:    vmrglb v4, v8, v4
>> -; CHECK-P8-NEXT:    xxswapd v8, vs3
>> -; CHECK-P8-NEXT:    vmrglb v5, v9, v5
>> -; CHECK-P8-NEXT:    xxswapd v9, vs5
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    vmrglb v0, v10, v0
>> -; CHECK-P8-NEXT:    xxswapd v10, vs7
>> -; CHECK-P8-NEXT:    vmrglb v1, v8, v1
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>> -; CHECK-P8-NEXT:    vmrglb v6, v9, v6
>> -; CHECK-P8-NEXT:    xxswapd v9, vs1
>> -; CHECK-P8-NEXT:    vmrglb v7, v10, v7
>> -; CHECK-P8-NEXT:    vmrglb v2, v8, v2
>> -; CHECK-P8-NEXT:    vmrglb v3, v9, v3
>> +; CHECK-P8-NEXT:    vmrghb v6, v8, v6
>> +; CHECK-P8-NEXT:    vmrghb v2, v9, v2
>> +; CHECK-P8-NEXT:    mtvsrd v8, r3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>> +; CHECK-P8-NEXT:    vmrghb v3, v8, v3
>> +; CHECK-P8-NEXT:    vmrghb v7, v9, v7
>>  ; CHECK-P8-NEXT:    vmrglh v4, v5, v4
>>  ; CHECK-P8-NEXT:    vmrglh v5, v1, v0
>> -; CHECK-P8-NEXT:    vmrglh v0, v7, v6
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>> -; CHECK-P8-NEXT:    vmrglw v2, v2, v0
>> -; CHECK-P8-NEXT:    xxmrgld v2, v2, v3
>> +; CHECK-P8-NEXT:    vmrglh v2, v2, v6
>> +; CHECK-P8-NEXT:    vmrglh v3, v7, v3
>> +; CHECK-P8-NEXT:    vmrglw v4, v5, v4
>> +; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    xxmrgld v2, v2, v4
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: test16elt:
>> @@ -445,94 +401,78 @@ define <16 x i8> @test16elt(<16 x double>* nocapture readonly) local_unnamed_add
>>  ; CHECK-P9-NEXT:    xscvdpsxws f8, f7
>>  ; CHECK-P9-NEXT:    xxswapd vs7, vs7
>>  ; CHECK-P9-NEXT:    xscvdpsxws f7, f7
>> +; CHECK-P9-NEXT:    lxv vs6, 16(r3)
>>  ; CHECK-P9-NEXT:    lxv vs0, 112(r3)
>>  ; CHECK-P9-NEXT:    lxv vs1, 96(r3)
>>  ; CHECK-P9-NEXT:    lxv vs2, 80(r3)
>>  ; CHECK-P9-NEXT:    lxv vs3, 64(r3)
>>  ; CHECK-P9-NEXT:    lxv vs4, 48(r3)
>>  ; CHECK-P9-NEXT:    lxv vs5, 32(r3)
>> -; CHECK-P9-NEXT:    lxv vs6, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f8
>> -; CHECK-P9-NEXT:    mtfprd f8, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f7
>> -; CHECK-P9-NEXT:    xxswapd v2, vs8
>> -; CHECK-P9-NEXT:    mtfprd f7, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs7
>>  ; CHECK-P9-NEXT:    xscvdpsxws f7, f6
>>  ; CHECK-P9-NEXT:    xxswapd vs6, vs6
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f6, f6
>> +; CHECK-P9-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f7
>> -; CHECK-P9-NEXT:    mtfprd f7, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f6
>> -; CHECK-P9-NEXT:    mtfprd f6, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs6
>>  ; CHECK-P9-NEXT:    xscvdpsxws f6, f5
>>  ; CHECK-P9-NEXT:    xxswapd vs5, vs5
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f5, f5
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f6
>> -; CHECK-P9-NEXT:    mtfprd f6, r3
>> -; CHECK-P9-NEXT:    mffprwz r3, f5
>> -; CHECK-P9-NEXT:    vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs7
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs6
>> -; CHECK-P9-NEXT:    mtfprd f5, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs5
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    mffprwz r3, f5
>>  ; CHECK-P9-NEXT:    xscvdpsxws f5, f4
>>  ; CHECK-P9-NEXT:    xxswapd vs4, vs4
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f5
>> -; CHECK-P9-NEXT:    mtfprd f5, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs5
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r3
>> +; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs4
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs3
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> -; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>> -; CHECK-P9-NEXT:    xxswapd v0, vs0
>> -; CHECK-P9-NEXT:    vmrglb v5, v5, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r3
>> +; CHECK-P9-NEXT:    vmrghb v5, v5, v0
>>  ; CHECK-P9-NEXT:    vmrglh v4, v5, v4
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld v2, v3, v2
>> @@ -649,12 +589,10 @@ define i16 @test2elt_signed(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    mffprwz r3, f1
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r4, f0
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    xxswapd v3, vs1
>> -; CHECK-P8-NEXT:    vmrglb v2, v2, v3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    clrldi r3, r3, 48
>> @@ -666,15 +604,13 @@ define i16 @test2elt_signed(<2 x double> %a) local_unnamed_addr #0 {
>>  ; CHECK-P9:       # %bb.0: # %entry
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, v2
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    addi r3, r1, -2
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vmrglb v2, v3, v2
>> +; CHECK-P9-NEXT:    vmrghb v2, v3, v2
>>  ; CHECK-P9-NEXT:    vsldoi v2, v2, v2, 8
>>  ; CHECK-P9-NEXT:    stxsihx v2, 0, r3
>>  ; CHECK-P9-NEXT:    lhz r3, -2(r1)
>> @@ -717,18 +653,14 @@ define i32 @test4elt_signed(<4 x double>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    xxswapd v2, vs2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    xxswapd v4, vs3
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    xxswapd v5, vs1
>> -; CHECK-P8-NEXT:    vmrglb v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglb v3, v5, v4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>> +; CHECK-P8-NEXT:    vmrghb v2, v4, v2
>> +; CHECK-P8-NEXT:    vmrghb v3, v5, v3
>>  ; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> @@ -742,24 +674,20 @@ define i32 @test4elt_signed(<4 x double>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>>  ; CHECK-P9-NEXT:    lxv vs0, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    xxswapd v2, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs1
>> -; CHECK-P9-NEXT:    xxswapd v4, vs0
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    li r3, 0
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>> +; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P9-NEXT:    vextuwrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -818,36 +746,28 @@ define i64 @test8elt_signed(<8 x double>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P8-NEXT:    xxswapd vs3, vs3
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f5
>> -; CHECK-P8-NEXT:    mtfprd f4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    mffprwz r3, f6
>> -; CHECK-P8-NEXT:    mtfprd f5, r4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs4
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f7
>> -; CHECK-P8-NEXT:    mtfprd f6, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, vs5
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f0
>> -; CHECK-P8-NEXT:    mtfprd f7, r4
>> -; CHECK-P8-NEXT:    xxswapd v4, vs6
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v1, vs7
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v5, vs0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs1
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    xxswapd v6, vs2
>> -; CHECK-P8-NEXT:    vmrglb v2, v5, v2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs0
>> -; CHECK-P8-NEXT:    vmrglb v3, v0, v3
>> -; CHECK-P8-NEXT:    vmrglb v4, v6, v4
>> -; CHECK-P8-NEXT:    vmrglb v5, v5, v1
>> +; CHECK-P8-NEXT:    vmrghb v2, v0, v2
>> +; CHECK-P8-NEXT:    vmrghb v3, v1, v3
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>> +; CHECK-P8-NEXT:    vmrghb v4, v0, v4
>> +; CHECK-P8-NEXT:    vmrghb v5, v1, v5
>>  ; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>>  ; CHECK-P8-NEXT:    vmrglh v3, v5, v4
>>  ; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> @@ -861,47 +781,39 @@ define i64 @test8elt_signed(<8 x double>* nocapture readonly) local_unnamed_addr
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> +; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>>  ; CHECK-P9-NEXT:    lxv vs0, 48(r3)
>>  ; CHECK-P9-NEXT:    lxv vs1, 32(r3)
>> -; CHECK-P9-NEXT:    lxv vs2, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs4
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs3
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    mffprwz r3, f1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs1
>> -; CHECK-P9-NEXT:    xxswapd v5, vs0
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>> @@ -997,79 +909,63 @@ define <16 x i8> @test16elt_signed(<16 x double>* nocapture readonly) local_unna
>>  ; CHECK-P8-NEXT:    xxswapd vs7, vs7
>>  ; CHECK-P8-NEXT:    xscvdpsxws v2, f9
>>  ; CHECK-P8-NEXT:    xxswapd vs9, vs9
>> -; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    xscvdpsxws v3, f11
>>  ; CHECK-P8-NEXT:    xxswapd vs11, vs11
>> +; CHECK-P8-NEXT:    mffprwz r3, f4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f6
>>  ; CHECK-P8-NEXT:    xscvdpsxws f0, f0
>> -; CHECK-P8-NEXT:    mtfprd f4, r3
>> -; CHECK-P8-NEXT:    mffprwz r3, f8
>>  ; CHECK-P8-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P8-NEXT:    xxswapd v4, vs4
>> -; CHECK-P8-NEXT:    mtfprd f6, r4
>> +; CHECK-P8-NEXT:    mtvsrd v4, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f8
>> +; CHECK-P8-NEXT:    mtvsrd v5, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f10
>>  ; CHECK-P8-NEXT:    xscvdpsxws f2, f2
>> -; CHECK-P8-NEXT:    xxswapd v5, vs6
>> -; CHECK-P8-NEXT:    mtfprd f8, r3
>> -; CHECK-P8-NEXT:    mffprwz r3, f12
>>  ; CHECK-P8-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P8-NEXT:    xxswapd v0, vs8
>> -; CHECK-P8-NEXT:    mtfprd f10, r4
>> +; CHECK-P8-NEXT:    mtvsrd v0, r3
>> +; CHECK-P8-NEXT:    mffprwz r3, f12
>> +; CHECK-P8-NEXT:    mtvsrd v1, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f13
>>  ; CHECK-P8-NEXT:    xscvdpsxws f5, f5
>> -; CHECK-P8-NEXT:    xxswapd v1, vs10
>> -; CHECK-P8-NEXT:    mtfprd f12, r3
>> -; CHECK-P8-NEXT:    mfvsrwz r3, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f7, f7
>> -; CHECK-P8-NEXT:    xxswapd v6, vs12
>> -; CHECK-P8-NEXT:    mtfprd f13, r4
>> +; CHECK-P8-NEXT:    mtvsrd v6, r3
>> +; CHECK-P8-NEXT:    mfvsrwz r3, v2
>> +; CHECK-P8-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P8-NEXT:    mfvsrwz r4, v3
>> -; CHECK-P8-NEXT:    mtvsrd v2, r3
>> -; CHECK-P8-NEXT:    xxswapd v7, vs13
>> -; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    xscvdpsxws f9, f9
>> -; CHECK-P8-NEXT:    xxswapd v2, v2
>>  ; CHECK-P8-NEXT:    xscvdpsxws f11, f11
>> -; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>> +; CHECK-P8-NEXT:    mtvsrd v7, r4
>> +; CHECK-P8-NEXT:    mffprwz r3, f0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f1
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    xxswapd v3, v3
>> +; CHECK-P8-NEXT:    mtvsrd v8, r3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f2
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>>  ; CHECK-P8-NEXT:    mffprwz r4, f3
>> -; CHECK-P8-NEXT:    mtfprd f2, r3
>> -; CHECK-P8-NEXT:    xxswapd v9, vs1
>> +; CHECK-P8-NEXT:    vmrghb v4, v8, v4
>> +; CHECK-P8-NEXT:    vmrghb v5, v9, v5
>> +; CHECK-P8-NEXT:    mtvsrd v8, r3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f5
>> -; CHECK-P8-NEXT:    mtfprd f3, r4
>> -; CHECK-P8-NEXT:    xxswapd v10, vs2
>>  ; CHECK-P8-NEXT:    mffprwz r4, f7
>> -; CHECK-P8-NEXT:    mtfprd f5, r3
>> +; CHECK-P8-NEXT:    vmrghb v0, v8, v0
>> +; CHECK-P8-NEXT:    vmrghb v1, v9, v1
>> +; CHECK-P8-NEXT:    mtvsrd v8, r3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>>  ; CHECK-P8-NEXT:    mffprwz r3, f9
>> -; CHECK-P8-NEXT:    mtfprd f7, r4
>>  ; CHECK-P8-NEXT:    mffprwz r4, f11
>> -; CHECK-P8-NEXT:    vmrglb v4, v8, v4
>> -; CHECK-P8-NEXT:    xxswapd v8, vs3
>> -; CHECK-P8-NEXT:    vmrglb v5, v9, v5
>> -; CHECK-P8-NEXT:    xxswapd v9, vs5
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    mtfprd f1, r4
>> -; CHECK-P8-NEXT:    vmrglb v0, v10, v0
>> -; CHECK-P8-NEXT:    xxswapd v10, vs7
>> -; CHECK-P8-NEXT:    vmrglb v1, v8, v1
>> -; CHECK-P8-NEXT:    xxswapd v8, vs0
>> -; CHECK-P8-NEXT:    vmrglb v6, v9, v6
>> -; CHECK-P8-NEXT:    xxswapd v9, vs1
>> -; CHECK-P8-NEXT:    vmrglb v7, v10, v7
>> -; CHECK-P8-NEXT:    vmrglb v2, v8, v2
>> -; CHECK-P8-NEXT:    vmrglb v3, v9, v3
>> +; CHECK-P8-NEXT:    vmrghb v6, v8, v6
>> +; CHECK-P8-NEXT:    vmrghb v2, v9, v2
>> +; CHECK-P8-NEXT:    mtvsrd v8, r3
>> +; CHECK-P8-NEXT:    mtvsrd v9, r4
>> +; CHECK-P8-NEXT:    vmrghb v3, v8, v3
>> +; CHECK-P8-NEXT:    vmrghb v7, v9, v7
>>  ; CHECK-P8-NEXT:    vmrglh v4, v5, v4
>>  ; CHECK-P8-NEXT:    vmrglh v5, v1, v0
>> -; CHECK-P8-NEXT:    vmrglh v0, v7, v6
>> -; CHECK-P8-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P8-NEXT:    vmrglw v3, v5, v4
>> -; CHECK-P8-NEXT:    vmrglw v2, v2, v0
>> -; CHECK-P8-NEXT:    xxmrgld v2, v2, v3
>> +; CHECK-P8-NEXT:    vmrglh v2, v2, v6
>> +; CHECK-P8-NEXT:    vmrglh v3, v7, v3
>> +; CHECK-P8-NEXT:    vmrglw v4, v5, v4
>> +; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    xxmrgld v2, v2, v4
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: test16elt_signed:
>> @@ -1078,94 +974,78 @@ define <16 x i8> @test16elt_signed(<16 x double>* nocapture readonly) local_unna
>>  ; CHECK-P9-NEXT:    xscvdpsxws f8, f7
>>  ; CHECK-P9-NEXT:    xxswapd vs7, vs7
>>  ; CHECK-P9-NEXT:    xscvdpsxws f7, f7
>> +; CHECK-P9-NEXT:    lxv vs6, 16(r3)
>>  ; CHECK-P9-NEXT:    lxv vs0, 112(r3)
>>  ; CHECK-P9-NEXT:    lxv vs1, 96(r3)
>>  ; CHECK-P9-NEXT:    lxv vs2, 80(r3)
>>  ; CHECK-P9-NEXT:    lxv vs3, 64(r3)
>>  ; CHECK-P9-NEXT:    lxv vs4, 48(r3)
>>  ; CHECK-P9-NEXT:    lxv vs5, 32(r3)
>> -; CHECK-P9-NEXT:    lxv vs6, 16(r3)
>>  ; CHECK-P9-NEXT:    mffprwz r3, f8
>> -; CHECK-P9-NEXT:    mtfprd f8, r3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f7
>> -; CHECK-P9-NEXT:    xxswapd v2, vs8
>> -; CHECK-P9-NEXT:    mtfprd f7, r3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs7
>>  ; CHECK-P9-NEXT:    xscvdpsxws f7, f6
>>  ; CHECK-P9-NEXT:    xxswapd vs6, vs6
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f6, f6
>> +; CHECK-P9-NEXT:    vmrghb v2, v2, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f7
>> -; CHECK-P9-NEXT:    mtfprd f7, r3
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f6
>> -; CHECK-P9-NEXT:    mtfprd f6, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs6
>>  ; CHECK-P9-NEXT:    xscvdpsxws f6, f5
>>  ; CHECK-P9-NEXT:    xxswapd vs5, vs5
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f5, f5
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f6
>> -; CHECK-P9-NEXT:    mtfprd f6, r3
>> -; CHECK-P9-NEXT:    mffprwz r3, f5
>> -; CHECK-P9-NEXT:    vmrglb v2, v2, v3
>> -; CHECK-P9-NEXT:    xxswapd v3, vs7
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>>  ; CHECK-P9-NEXT:    vmrglh v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs6
>> -; CHECK-P9-NEXT:    mtfprd f5, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs5
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>> +; CHECK-P9-NEXT:    mffprwz r3, f5
>>  ; CHECK-P9-NEXT:    xscvdpsxws f5, f4
>>  ; CHECK-P9-NEXT:    xxswapd vs4, vs4
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f4
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f5
>> -; CHECK-P9-NEXT:    mtfprd f5, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs4
>>  ; CHECK-P9-NEXT:    xscvdpsxws f4, f3
>>  ; CHECK-P9-NEXT:    xxswapd vs3, vs3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f3
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs5
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f4
>> -; CHECK-P9-NEXT:    mtfprd f4, r3
>> +; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P9-NEXT:    mtvsrd v3, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> -; CHECK-P9-NEXT:    xxswapd v4, vs3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f3, f2
>>  ; CHECK-P9-NEXT:    xxswapd vs2, vs2
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f2
>> +; CHECK-P9-NEXT:    vmrghb v3, v3, v4
>>  ; CHECK-P9-NEXT:    mffprwz r3, f3
>> -; CHECK-P9-NEXT:    mtfprd f3, r3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs2
>>  ; CHECK-P9-NEXT:    xscvdpsxws f2, f1
>>  ; CHECK-P9-NEXT:    xxswapd vs1, vs1
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f1
>> -; CHECK-P9-NEXT:    vmrglw v2, v3, v2
>> -; CHECK-P9-NEXT:    xxswapd v3, vs4
>> -; CHECK-P9-NEXT:    vmrglb v3, v3, v4
>> -; CHECK-P9-NEXT:    xxswapd v4, vs3
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> -; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    mffprwz r3, f2
>> -; CHECK-P9-NEXT:    mtfprd f2, r3
>> +; CHECK-P9-NEXT:    vmrglh v3, v4, v3
>> +; CHECK-P9-NEXT:    mtvsrd v4, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    xxswapd v4, vs2
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>>  ; CHECK-P9-NEXT:    xscvdpsxws f1, f0
>>  ; CHECK-P9-NEXT:    xxswapd vs0, vs0
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    xscvdpsxws f0, f0
>> +; CHECK-P9-NEXT:    vmrghb v4, v4, v5
>>  ; CHECK-P9-NEXT:    mffprwz r3, f1
>> -; CHECK-P9-NEXT:    mtfprd f1, r3
>> +; CHECK-P9-NEXT:    mtvsrd v5, r3
>>  ; CHECK-P9-NEXT:    mffprwz r3, f0
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    vmrglb v4, v4, v5
>> -; CHECK-P9-NEXT:    xxswapd v5, vs1
>> -; CHECK-P9-NEXT:    xxswapd v0, vs0
>> -; CHECK-P9-NEXT:    vmrglb v5, v5, v0
>> +; CHECK-P9-NEXT:    mtvsrd v0, r3
>> +; CHECK-P9-NEXT:    vmrghb v5, v5, v0
>>  ; CHECK-P9-NEXT:    vmrglh v4, v5, v4
>>  ; CHECK-P9-NEXT:    vmrglw v3, v4, v3
>>  ; CHECK-P9-NEXT:    xxmrgld v2, v3, v2
>>
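Most of the churn in the hunks above is a single substitution: `mtfprd` + `xxswapd` + `vmrglb` becomes `mtvsrd` + `vmrghb`, dropping the `xxswapd` on each input. The two forms agree because swapping the doublewords of both inputs and then merging the low halves selects exactly the bytes that a high-half merge of the unswapped inputs selects. The Python model below is a hypothetical illustration, not LLVM code. It assumes big-endian (ISA) element numbering, treats `mtfprd` as the FPR-target form of `mtvsrd` (both write doubleword 0), and fills the doubleword the ISA leaves undefined with arbitrary junk to show that neither sequence reads it.

```python
import random

# Hypothetical model of a 128-bit vector register as a list of 16 bytes
# in big-endian (ISA) order: index 0 is the leftmost byte.

def mtvsrd(gpr, junk=0):
    """Move a 64-bit GPR value into doubleword 0. The ISA leaves
    doubleword 1 undefined; it is filled with `junk` here so it can vary."""
    return list(gpr.to_bytes(8, "big")) + [junk] * 8

def xxswapd(v):
    """Swap the two doublewords."""
    return v[8:] + v[:8]

def vmrghb(a, b):
    """Merge high: interleave bytes 0-7 of a and b."""
    out = []
    for i in range(8):
        out += [a[i], b[i]]
    return out

def vmrglb(a, b):
    """Merge low: interleave bytes 8-15 of a and b."""
    out = []
    for i in range(8, 16):
        out += [a[i], b[i]]
    return out

def old_seq(r3, r4, junk=0):
    # mtfprd f0, r3 ; mtfprd f1, r4 (modeled with mtvsrd)
    # xxswapd v2, vs0 ; xxswapd v3, vs1 ; vmrglb v2, v2, v3
    return vmrglb(xxswapd(mtvsrd(r3, junk)), xxswapd(mtvsrd(r4, junk)))

def new_seq(r3, r4, junk=0):
    # mtvsrd v2, r3 ; mtvsrd v3, r4 ; vmrghb v2, v2, v3
    return vmrghb(mtvsrd(r3, junk), mtvsrd(r4, junk))

# The general identity: vmrglb(swapd(a), swapd(b)) == vmrghb(a, b).
rng = random.Random(0)
for _ in range(1000):
    a = [rng.randrange(256) for _ in range(16)]
    b = [rng.randrange(256) for _ in range(16)]
    assert vmrglb(xxswapd(a), xxswapd(b)) == vmrghb(a, b)

# Different junk in the undefined doubleword does not change the result.
r3, r4 = 0x0102030405060708, 0x1112131415161718
assert old_seq(r3, r4, junk=0xAA) == new_seq(r3, r4, junk=0x55)
```

Since the identity holds for arbitrary register contents, the rewrite does not depend on what `mtvsrd` leaves in the other doubleword.
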
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
>> index e51af62cb128..5ecd34941b39 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp32_elts.ll
>> @@ -24,9 +24,9 @@ define i64 @test2elt(i32 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvuxdsp f1, f1
>>  ; CHECK-P8-NEXT:    xscvdpspn vs0, f0
>>  ; CHECK-P8-NEXT:    xscvdpspn vs1, f1
>> -; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -43,12 +43,12 @@ define i64 @test2elt(i32 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>>  ; CHECK-P9-NEXT:    vextuhrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    clrlwi r3, r3, 16
>> -; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 3
>>  ; CHECK-P9-NEXT:    mtfprwz f0, r3
>>  ; CHECK-P9-NEXT:    xscvuxdsp f0, f0
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -80,25 +80,17 @@ entry:
>>  define <4 x float> @test4elt(i64 %a.coerce) local_unnamed_addr #1 {
>>  ; CHECK-P8-LABEL: test4elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    addi r3, r4, .LCPI1_0@toc@l
>> -; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    lvx v3, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v4, v2, v3
>> +; CHECK-P8-NEXT:    xxlxor v2, v2, v2
>> +; CHECK-P8-NEXT:    mtvsrd v3, r3
>> +; CHECK-P8-NEXT:    vmrghh v2, v2, v3
>>  ; CHECK-P8-NEXT:    xvcvuxwsp v2, v2
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: test4elt:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    addis r3, r2, .LCPI1_0@toc@ha
>> -; CHECK-P9-NEXT:    addi r3, r3, .LCPI1_0@toc@l
>> -; CHECK-P9-NEXT:    lxvx v3, 0, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P9-NEXT:    vperm v2, v4, v2, v3
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>> +; CHECK-P9-NEXT:    xxlxor v3, v3, v3
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>>  ; CHECK-P9-NEXT:    xvcvuxwsp v2, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -121,17 +113,11 @@ entry:
>>  define void @test8elt(<8 x float>* noalias nocapture sret %agg.result, <8 x i16> %a) local_unnamed_addr #2 {
>>  ; CHECK-P8-LABEL: test8elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI2_0@toc@ha
>> -; CHECK-P8-NEXT:    addis r5, r2, .LCPI2_1@toc@ha
>> -; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI2_0@toc@l
>> -; CHECK-P8-NEXT:    lvx v3, 0, r4
>> -; CHECK-P8-NEXT:    addi r4, r5, .LCPI2_1@toc@l
>> -; CHECK-P8-NEXT:    lvx v5, 0, r4
>> +; CHECK-P8-NEXT:    xxlxor v3, v3, v3
>>  ; CHECK-P8-NEXT:    li r4, 16
>> -; CHECK-P8-NEXT:    vperm v3, v4, v2, v3
>> -; CHECK-P8-NEXT:    vperm v2, v4, v2, v5
>> -; CHECK-P8-NEXT:    xvcvuxwsp v3, v3
>> +; CHECK-P8-NEXT:    vmrglh v4, v3, v2
>> +; CHECK-P8-NEXT:    vmrghh v2, v3, v2
>> +; CHECK-P8-NEXT:    xvcvuxwsp v3, v4
>>  ; CHECK-P8-NEXT:    xvcvuxwsp v2, v2
>>  ; CHECK-P8-NEXT:    stvx v3, 0, r3
>>  ; CHECK-P8-NEXT:    stvx v2, r3, r4
>> @@ -139,19 +125,13 @@ define void @test8elt(<8 x float>* noalias nocapture sret %agg.result, <8 x i16>
>>  ;
>>  ; CHECK-P9-LABEL: test8elt:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    addis r4, r2, .LCPI2_0@toc@ha
>> -; CHECK-P9-NEXT:    addi r4, r4, .LCPI2_0@toc@l
>> -; CHECK-P9-NEXT:    lxvx v3, 0, r4
>> -; CHECK-P9-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P9-NEXT:    addis r4, r2, .LCPI2_1@toc@ha
>> -; CHECK-P9-NEXT:    addi r4, r4, .LCPI2_1@toc@l
>> -; CHECK-P9-NEXT:    vperm v3, v4, v2, v3
>> -; CHECK-P9-NEXT:    xvcvuxwsp vs0, v3
>> -; CHECK-P9-NEXT:    lxvx v3, 0, r4
>> -; CHECK-P9-NEXT:    vperm v2, v4, v2, v3
>> -; CHECK-P9-NEXT:    stxv vs0, 0(r3)
>> +; CHECK-P9-NEXT:    xxlxor v3, v3, v3
>> +; CHECK-P9-NEXT:    vmrglh v4, v3, v2
>> +; CHECK-P9-NEXT:    vmrghh v2, v3, v2
>> +; CHECK-P9-NEXT:    xvcvuxwsp vs0, v4
>>  ; CHECK-P9-NEXT:    xvcvuxwsp vs1, v2
>>  ; CHECK-P9-NEXT:    stxv vs1, 16(r3)
>> +; CHECK-P9-NEXT:    stxv vs0, 0(r3)
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>>  ; CHECK-BE-LABEL: test8elt:
>> @@ -276,9 +256,9 @@ define i64 @test2elt_signed(i32 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvsxdsp f1, f1
>>  ; CHECK-P8-NEXT:    xscvdpspn vs0, f0
>>  ; CHECK-P8-NEXT:    xscvdpspn vs1, f1
>> -; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -295,12 +275,12 @@ define i64 @test2elt_signed(i32 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>>  ; CHECK-P9-NEXT:    vextuhrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    extsh r3, r3
>> -; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 3
>>  ; CHECK-P9-NEXT:    mtfprwa f0, r3
>>  ; CHECK-P9-NEXT:    xscvsxdsp f0, f0
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -332,11 +312,10 @@ entry:
>>  define <4 x float> @test4elt_signed(i64 %a.coerce) local_unnamed_addr #1 {
>>  ; CHECK-P8-LABEL: test4elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> +; CHECK-P8-NEXT:    mtvsrd v2, r3
>>  ; CHECK-P8-NEXT:    vspltisw v3, 8
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> +; CHECK-P8-NEXT:    vmrghh v2, v2, v2
>>  ; CHECK-P8-NEXT:    vadduwm v3, v3, v3
>> -; CHECK-P8-NEXT:    vmrglh v2, v2, v2
>>  ; CHECK-P8-NEXT:    vslw v2, v2, v3
>>  ; CHECK-P8-NEXT:    vsraw v2, v2, v3
>>  ; CHECK-P8-NEXT:    xvcvsxwsp v2, v2
>> @@ -344,9 +323,8 @@ define <4 x float> @test4elt_signed(i64 %a.coerce) local_unnamed_addr #1 {
>>  ;
>>  ; CHECK-P9-LABEL: test4elt_signed:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    mtfprd f0, r3
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vmrglh v2, v2, v2
>> +; CHECK-P9-NEXT:    mtvsrd v2, r3
>> +; CHECK-P9-NEXT:    vmrghh v2, v2, v2
>>  ; CHECK-P9-NEXT:    vextsh2w v2, v2
>>  ; CHECK-P9-NEXT:    xvcvsxwsp v2, v2
>>  ; CHECK-P9-NEXT:    blr
>>
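The `vec_conv_i16_to_fp32_elts.ll` changes show two more canonicalizations. In `test2elt`, a word rotate by 1 followed by `vmrglw` becomes a rotate by 3 followed by `vmrghw`. In `test4elt`/`test8elt`, a constant-pool `vperm` against a zero vector becomes `vmrghh` (or `vmrglh`) against a register cleared with `xxlxor`, which zero-extends each halfword into a word before `xvcvuxwsp`. The sketch below models both on lists of words or halfwords in big-endian (ISA) order. It is a hypothetical illustration following my reading of the ISA definitions of `xxsldwi`, `vmrghw`, `vmrglw` and `vmrghh`, not code taken from LLVM.

```python
# Hypothetical model: word vectors are lists of 4 ints, halfword vectors
# lists of 8 ints, both in big-endian (ISA) element order.

def xxsldwi(a, b, shw):
    """xxsldwi XT, XA, XB, SHW: words SHW..SHW+3 of the concatenation a||b."""
    return (a + b)[shw:shw + 4]

def vmrghw(a, b):
    """Merge high words: [a0, b0, a1, b1]."""
    return [a[0], b[0], a[1], b[1]]

def vmrglw(a, b):
    """Merge low words: [a2, b2, a3, b3]."""
    return [a[2], b[2], a[3], b[3]]

def vmrghh(a, b):
    """Merge high halfwords: interleave halfwords 0-3 of a and b."""
    out = []
    for i in range(4):
        out += [a[i], b[i]]
    return out

def old_test2elt(vs0, vs1):
    # xxsldwi v2, vs0, vs0, 1 ; xxsldwi v3, vs1, vs1, 1 ; vmrglw v2, v3, v2
    return vmrglw(xxsldwi(vs1, vs1, 1), xxsldwi(vs0, vs0, 1))

def new_test2elt(vs0, vs1):
    # xxsldwi v2, vs0, vs0, 3 ; xxsldwi v3, vs1, vs1, 3 ; vmrghw v2, v3, v2
    return vmrghw(xxsldwi(vs1, vs1, 3), xxsldwi(vs0, vs0, 3))

def zext_halfwords(x):
    # xxlxor v2, v2, v2 ; vmrghh v2, v2, v3   (x is v3 in halfword view)
    merged = vmrghh([0] * 8, x)
    # Re-read the 8 halfwords as 4 big-endian words: each high half is 0.
    return [(merged[2 * i] << 16) | merged[2 * i + 1] for i in range(4)]

vs0 = [0xA0, 0xA1, 0xA2, 0xA3]
vs1 = [0xB0, 0xB1, 0xB2, 0xB3]
assert old_test2elt(vs0, vs1) == new_test2elt(vs0, vs1)
print([hex(w) for w in new_test2elt(vs0, vs1)])  # ['0xb3', '0xa3', '0xb0', '0xa0']

# 0xFFFF stays 65535 (zero-extended), it is not sign-extended to -1.
assert zext_halfwords([0xFFFF, 1, 2, 3, 9, 9, 9, 9]) == [0xFFFF, 1, 2, 3]
```

Rotating left by 3 words is the same as rotating right by 1, so the words that a low merge picked after a left rotate by 1 are the ones a high merge picks after a left rotate by 3.
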
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
>> index faec95831816..ea8ede3af22a 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
>> @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i32 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: test2elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r4, r2, .LCPI0_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    addi r3, r4, .LCPI0_0@toc@l
>> +; CHECK-P8-NEXT:    mtvsrwz v2, r3
>> +; CHECK-P8-NEXT:    addi r4, r4, .LCPI0_0@toc@l
>>  ; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    lvx v3, 0, r3
>> +; CHECK-P8-NEXT:    lvx v3, 0, r4
>>  ; CHECK-P8-NEXT:    vperm v2, v4, v2, v3
>>  ; CHECK-P8-NEXT:    xvcvuxddp v2, v2
>>  ; CHECK-P8-NEXT:    blr
>> @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture sret %agg.result, i64 %a.c
>>  ; CHECK-P8-LABEL: test4elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r5, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI1_1@toc@ha
>> +; CHECK-P8-NEXT:    addis r6, r2, .LCPI1_1@toc@ha
>> +; CHECK-P8-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P8-NEXT:    addi r5, r5, .LCPI1_0@toc@l
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI1_1@toc@l
>> +; CHECK-P8-NEXT:    addi r4, r6, .LCPI1_1@toc@l
>>  ; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    lvx v2, 0, r5
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> +; CHECK-P8-NEXT:    lvx v3, 0, r5
>>  ; CHECK-P8-NEXT:    lvx v5, 0, r4
>>  ; CHECK-P8-NEXT:    li r4, 16
>> -; CHECK-P8-NEXT:    vperm v2, v4, v3, v2
>> -; CHECK-P8-NEXT:    vperm v3, v4, v3, v5
>> -; CHECK-P8-NEXT:    xvcvuxddp vs0, v2
>> -; CHECK-P8-NEXT:    xvcvuxddp vs1, v3
>> +; CHECK-P8-NEXT:    vperm v3, v4, v2, v3
>> +; CHECK-P8-NEXT:    vperm v2, v4, v2, v5
>> +; CHECK-P8-NEXT:    xvcvuxddp vs0, v3
>> +; CHECK-P8-NEXT:    xvcvuxddp vs1, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, vs0
>>  ; CHECK-P8-NEXT:    xxswapd vs1, vs1
>>  ; CHECK-P8-NEXT:    stxvd2x vs1, r3, r4
>> @@ -74,11 +72,10 @@ define void @test4elt(<4 x double>* noalias nocapture sret %agg.result, i64 %a.c
>>  ;
>>  ; CHECK-P9-LABEL: test4elt:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI1_0@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI1_0@toc@l
>>  ; CHECK-P9-NEXT:    lxvx v3, 0, r4
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>>  ; CHECK-P9-NEXT:    xxlxor v4, v4, v4
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI1_1@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI1_1@toc@l
>> @@ -370,14 +367,13 @@ define <2 x double> @test2elt_signed(i32 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: test2elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r4, r2, .LCPI4_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    addi r3, r4, .LCPI4_0@toc@l
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    lvx v3, 0, r3
>> +; CHECK-P8-NEXT:    mtvsrwz v3, r3
>>  ; CHECK-P8-NEXT:    addis r3, r2, .LCPI4_1@toc@ha
>> +; CHECK-P8-NEXT:    addi r4, r4, .LCPI4_0@toc@l
>>  ; CHECK-P8-NEXT:    addi r3, r3, .LCPI4_1@toc@l
>> +; CHECK-P8-NEXT:    lvx v2, 0, r4
>>  ; CHECK-P8-NEXT:    lxvd2x vs0, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v2, v2, v3
>> +; CHECK-P8-NEXT:    vperm v2, v3, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P8-NEXT:    vsld v2, v2, v3
>>  ; CHECK-P8-NEXT:    vsrad v2, v2, v3
>> @@ -415,17 +411,16 @@ define void @test4elt_signed(<4 x double>* noalias nocapture sret %agg.result, i
>>  ; CHECK-P8-LABEL: test4elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r5, r2, .LCPI5_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI5_2@toc@ha
>> -; CHECK-P8-NEXT:    addi r5, r5, .LCPI5_0@toc@l
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI5_2@toc@l
>> -; CHECK-P8-NEXT:    lvx v2, 0, r5
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    lvx v4, 0, r4
>> +; CHECK-P8-NEXT:    addis r6, r2, .LCPI5_2@toc@ha
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    addis r4, r2, .LCPI5_1@toc@ha
>> +; CHECK-P8-NEXT:    addi r5, r5, .LCPI5_0@toc@l
>>  ; CHECK-P8-NEXT:    addi r4, r4, .LCPI5_1@toc@l
>> +; CHECK-P8-NEXT:    lvx v2, 0, r5
>> +; CHECK-P8-NEXT:    addi r5, r6, .LCPI5_2@toc@l
>>  ; CHECK-P8-NEXT:    lxvd2x vs0, 0, r4
>>  ; CHECK-P8-NEXT:    li r4, 16
>> +; CHECK-P8-NEXT:    lvx v4, 0, r5
>>  ; CHECK-P8-NEXT:    vperm v2, v3, v3, v2
>>  ; CHECK-P8-NEXT:    vperm v3, v3, v3, v4
>>  ; CHECK-P8-NEXT:    xxswapd v4, vs0
>> @@ -443,14 +438,13 @@ define void @test4elt_signed(<4 x double>* noalias nocapture sret %agg.result, i
>>  ;
>>  ; CHECK-P9-LABEL: test4elt_signed:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI5_0@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI5_0@toc@l
>>  ; CHECK-P9-NEXT:    lxvx v3, 0, r4
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vperm v3, v2, v2, v3
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI5_1@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI5_1@toc@l
>> +; CHECK-P9-NEXT:    vperm v3, v2, v2, v3
>>  ; CHECK-P9-NEXT:    vextsh2d v3, v3
>>  ; CHECK-P9-NEXT:    xvcvsxddp vs0, v3
>>  ; CHECK-P9-NEXT:    lxvx v3, 0, r4
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
>> index 6f046f69ecca..f152c2b008ff 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i64_to_fp32_elts.ll
>> @@ -18,9 +18,9 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvuxdsp f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpspn vs1, f1
>>  ; CHECK-P8-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -30,12 +30,12 @@ define i64 @test2elt(<2 x i64> %a) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P9-NEXT:    xscvuxdsp f0, f0
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 3
>>  ; CHECK-P9-NEXT:    xxlor vs0, v2, v2
>>  ; CHECK-P9-NEXT:    xscvuxdsp f0, f0
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -311,9 +311,9 @@ define i64 @test2elt_signed(<2 x i64> %a) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvsxdsp f0, f0
>>  ; CHECK-P8-NEXT:    xscvdpspn vs1, f1
>>  ; CHECK-P8-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -323,12 +323,12 @@ define i64 @test2elt_signed(<2 x i64> %a) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P9-NEXT:    xscvsxdsp f0, f0
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 3
>>  ; CHECK-P9-NEXT:    xxlor vs0, v2, v2
>>  ; CHECK-P9-NEXT:    xscvsxdsp f0, f0
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>>
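The recurring change in the `test2elt` and `*_v3f32` CHECK lines swaps `xxsldwi ..., 1` plus `vmrglw` for `xxsldwi ..., 3` plus `vmrghw`. As a reading aid (not part of the patch, and not LLVM code), here is a minimal Python sketch that tracks where each 32-bit word lands, using big-endian word numbering as the Power ISA does. It assumes that after `xscvdpspn` each single-precision result is only relied on in word 0 of its register, and that the consumer (`xxswapd` + `mffprd` on P8, `mfvsrld` on P9) reads only doubleword 1, i.e. words 2 and 3. The helper names are hypothetical.

```python
# Hypothetical word-level model of three Power ISA permutes (not LLVM code).
# A register is a list of four 32-bit words in big-endian (BE) order:
# index 0 is the leftmost word. "?" marks a word whose value is not relied on.


def xxsldwi(a, b, shw):
    """Shift left double by word immediate: words shw..shw+3 of a || b."""
    return (a + b)[shw:shw + 4]


def vmrglw(a, b):
    """Merge low words: interleave BE words 2 and 3 of a and b."""
    return [a[2], b[2], a[3], b[3]]


def vmrghw(a, b):
    """Merge high words: interleave BE words 0 and 1 of a and b."""
    return [a[0], b[0], a[1], b[1]]


def low_doubleword(v):
    """BE doubleword 1 (words 2 and 3), the part the i64 result is taken from."""
    return v[2:4]


# Assumption: each scalar single-precision result sits in BE word 0.
vs0 = ["x0", "?", "?", "?"]
vs1 = ["x1", "?", "?", "?"]

# Old CHECK-P8 sequence: rotate by one word, then merge low.
old = vmrglw(xxsldwi(vs1, vs1, 1), xxsldwi(vs0, vs0, 1))
# New CHECK-P8 sequence: rotate by three words, then merge high.
new = vmrghw(xxsldwi(vs1, vs1, 3), xxsldwi(vs0, vs0, 3))

print(old)  # ['?', '?', 'x1', 'x0']
print(new)  # ['?', '?', 'x1', 'x0']
assert low_doubleword(old) == low_doubleword(new) == ["x1", "x0"]
```

Under these assumptions both sequences leave x0 in BE word 3 and x1 in BE word 2 (LE elements 0 and 1), so the old and new CHECK lines describe the same result.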
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
>> index ce97ed67baa1..f2cb9f5f45fb 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp32_elts.ll
>> @@ -24,9 +24,9 @@ define i64 @test2elt(i16 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvuxdsp f1, f1
>>  ; CHECK-P8-NEXT:    xscvdpspn vs0, f0
>>  ; CHECK-P8-NEXT:    xscvdpspn vs1, f1
>> -; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -43,12 +43,12 @@ define i64 @test2elt(i16 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>>  ; CHECK-P9-NEXT:    vextubrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    clrlwi r3, r3, 24
>> -; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 3
>>  ; CHECK-P9-NEXT:    mtfprwz f0, r3
>>  ; CHECK-P9-NEXT:    xscvuxdsp f0, f0
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -81,11 +81,10 @@ define <4 x float> @test4elt(i32 %a.coerce) local_unnamed_addr #1 {
>>  ; CHECK-P8-LABEL: test4elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r4, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    addi r3, r4, .LCPI1_0@toc@l
>> +; CHECK-P8-NEXT:    mtvsrwz v2, r3
>> +; CHECK-P8-NEXT:    addi r4, r4, .LCPI1_0@toc@l
>>  ; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    lvx v3, 0, r3
>> +; CHECK-P8-NEXT:    lvx v3, 0, r4
>>  ; CHECK-P8-NEXT:    vperm v2, v4, v2, v3
>>  ; CHECK-P8-NEXT:    xvcvuxwsp v2, v2
>>  ; CHECK-P8-NEXT:    blr
>> @@ -121,30 +120,28 @@ define void @test8elt(<8 x float>* noalias nocapture sret %agg.result, i64 %a.co
>>  ; CHECK-P8-LABEL: test8elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r5, r2, .LCPI2_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI2_1@toc@ha
>> +; CHECK-P8-NEXT:    addis r6, r2, .LCPI2_1@toc@ha
>> +; CHECK-P8-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P8-NEXT:    addi r5, r5, .LCPI2_0@toc@l
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI2_1@toc@l
>> +; CHECK-P8-NEXT:    addi r4, r6, .LCPI2_1@toc@l
>>  ; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    lvx v2, 0, r5
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> +; CHECK-P8-NEXT:    lvx v3, 0, r5
>>  ; CHECK-P8-NEXT:    lvx v5, 0, r4
>>  ; CHECK-P8-NEXT:    li r4, 16
>> -; CHECK-P8-NEXT:    vperm v2, v4, v3, v2
>> -; CHECK-P8-NEXT:    vperm v3, v4, v3, v5
>> -; CHECK-P8-NEXT:    xvcvuxwsp v2, v2
>> +; CHECK-P8-NEXT:    vperm v3, v4, v2, v3
>> +; CHECK-P8-NEXT:    vperm v2, v4, v2, v5
>>  ; CHECK-P8-NEXT:    xvcvuxwsp v3, v3
>> -; CHECK-P8-NEXT:    stvx v2, 0, r3
>> -; CHECK-P8-NEXT:    stvx v3, r3, r4
>> +; CHECK-P8-NEXT:    xvcvuxwsp v2, v2
>> +; CHECK-P8-NEXT:    stvx v3, 0, r3
>> +; CHECK-P8-NEXT:    stvx v2, r3, r4
>>  ; CHECK-P8-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: test8elt:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI2_0@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI2_0@toc@l
>>  ; CHECK-P9-NEXT:    lxvx v3, 0, r4
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>>  ; CHECK-P9-NEXT:    xxlxor v4, v4, v4
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI2_1@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI2_1@toc@l
>> @@ -292,9 +289,9 @@ define i64 @test2elt_signed(i16 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-NEXT:    xscvsxdsp f1, f1
>>  ; CHECK-P8-NEXT:    xscvdpspn vs0, f0
>>  ; CHECK-P8-NEXT:    xscvdpspn vs1, f1
>> -; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 1
>> -; CHECK-P8-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-P8-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P8-NEXT:    xxsldwi v3, vs1, vs1, 3
>> +; CHECK-P8-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, v2
>>  ; CHECK-P8-NEXT:    mffprd r3, f0
>>  ; CHECK-P8-NEXT:    blr
>> @@ -311,12 +308,12 @@ define i64 @test2elt_signed(i16 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>>  ; CHECK-P9-NEXT:    vextubrx r3, r3, v2
>>  ; CHECK-P9-NEXT:    extsb r3, r3
>> -; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 1
>> +; CHECK-P9-NEXT:    xxsldwi v3, vs0, vs0, 3
>>  ; CHECK-P9-NEXT:    mtfprwa f0, r3
>>  ; CHECK-P9-NEXT:    xscvsxdsp f0, f0
>>  ; CHECK-P9-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-P9-NEXT:    vmrglw v2, v2, v3
>> +; CHECK-P9-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-P9-NEXT:    vmrghw v2, v2, v3
>>  ; CHECK-P9-NEXT:    mfvsrld r3, v2
>>  ; CHECK-P9-NEXT:    blr
>>  ;
>> @@ -349,11 +346,10 @@ define <4 x float> @test4elt_signed(i32 %a.coerce) local_unnamed_addr #1 {
>>  ; CHECK-P8-LABEL: test4elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r4, r2, .LCPI5_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    addi r3, r4, .LCPI5_0@toc@l
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    lvx v3, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v2, v2, v3
>> +; CHECK-P8-NEXT:    mtvsrwz v3, r3
>> +; CHECK-P8-NEXT:    addi r4, r4, .LCPI5_0@toc@l
>> +; CHECK-P8-NEXT:    lvx v2, 0, r4
>> +; CHECK-P8-NEXT:    vperm v2, v3, v3, v2
>>  ; CHECK-P8-NEXT:    vspltisw v3, 12
>>  ; CHECK-P8-NEXT:    vadduwm v3, v3, v3
>>  ; CHECK-P8-NEXT:    vslw v2, v2, v3
>> @@ -392,15 +388,14 @@ define void @test8elt_signed(<8 x float>* noalias nocapture sret %agg.result, i6
>>  ; CHECK-P8-LABEL: test8elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r5, r2, .LCPI6_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI6_1@toc@ha
>> +; CHECK-P8-NEXT:    addis r6, r2, .LCPI6_1@toc@ha
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>>  ; CHECK-P8-NEXT:    vspltisw v5, 12
>> +; CHECK-P8-NEXT:    li r4, 16
>>  ; CHECK-P8-NEXT:    addi r5, r5, .LCPI6_0@toc@l
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI6_1@toc@l
>>  ; CHECK-P8-NEXT:    lvx v2, 0, r5
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    lvx v4, 0, r4
>> -; CHECK-P8-NEXT:    li r4, 16
>> +; CHECK-P8-NEXT:    addi r5, r6, .LCPI6_1@toc@l
>> +; CHECK-P8-NEXT:    lvx v4, 0, r5
>>  ; CHECK-P8-NEXT:    vperm v2, v3, v3, v2
>>  ; CHECK-P8-NEXT:    vperm v3, v3, v3, v4
>>  ; CHECK-P8-NEXT:    vadduwm v4, v5, v5
>> @@ -416,14 +411,13 @@ define void @test8elt_signed(<8 x float>* noalias nocapture sret %agg.result, i6
>>  ;
>>  ; CHECK-P9-LABEL: test8elt_signed:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI6_0@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI6_0@toc@l
>>  ; CHECK-P9-NEXT:    lxvx v3, 0, r4
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vperm v3, v2, v2, v3
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI6_1@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI6_1@toc@l
>> +; CHECK-P9-NEXT:    vperm v3, v2, v2, v3
>>  ; CHECK-P9-NEXT:    vextsb2w v3, v3
>>  ; CHECK-P9-NEXT:    xvcvsxwsp vs0, v3
>>  ; CHECK-P9-NEXT:    lxvx v3, 0, r4
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
>> index b4582e844f30..268fc9b7d4cc 100644
>> --- a/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vec_conv_i8_to_fp64_elts.ll
>> @@ -13,11 +13,10 @@ define <2 x double> @test2elt(i16 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: test2elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r4, r2, .LCPI0_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    addi r3, r4, .LCPI0_0@toc@l
>> +; CHECK-P8-NEXT:    mtvsrwz v2, r3
>> +; CHECK-P8-NEXT:    addi r4, r4, .LCPI0_0@toc@l
>>  ; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    lvx v3, 0, r3
>> +; CHECK-P8-NEXT:    lvx v3, 0, r4
>>  ; CHECK-P8-NEXT:    vperm v2, v4, v2, v3
>>  ; CHECK-P8-NEXT:    xvcvuxddp v2, v2
>>  ; CHECK-P8-NEXT:    blr
>> @@ -53,19 +52,18 @@ define void @test4elt(<4 x double>* noalias nocapture sret %agg.result, i32 %a.c
>>  ; CHECK-P8-LABEL: test4elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r5, r2, .LCPI1_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI1_1@toc@ha
>> +; CHECK-P8-NEXT:    addis r6, r2, .LCPI1_1@toc@ha
>> +; CHECK-P8-NEXT:    mtvsrwz v2, r4
>>  ; CHECK-P8-NEXT:    addi r5, r5, .LCPI1_0@toc@l
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI1_1@toc@l
>> +; CHECK-P8-NEXT:    addi r4, r6, .LCPI1_1@toc@l
>>  ; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    lvx v2, 0, r5
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> +; CHECK-P8-NEXT:    lvx v3, 0, r5
>>  ; CHECK-P8-NEXT:    lvx v5, 0, r4
>>  ; CHECK-P8-NEXT:    li r4, 16
>> -; CHECK-P8-NEXT:    vperm v2, v4, v3, v2
>> -; CHECK-P8-NEXT:    vperm v3, v4, v3, v5
>> -; CHECK-P8-NEXT:    xvcvuxddp vs0, v2
>> -; CHECK-P8-NEXT:    xvcvuxddp vs1, v3
>> +; CHECK-P8-NEXT:    vperm v3, v4, v2, v3
>> +; CHECK-P8-NEXT:    vperm v2, v4, v2, v5
>> +; CHECK-P8-NEXT:    xvcvuxddp vs0, v3
>> +; CHECK-P8-NEXT:    xvcvuxddp vs1, v2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, vs0
>>  ; CHECK-P8-NEXT:    xxswapd vs1, vs1
>>  ; CHECK-P8-NEXT:    stxvd2x vs1, r3, r4
>> @@ -118,33 +116,32 @@ define void @test8elt(<8 x double>* noalias nocapture sret %agg.result, i64 %a.c
>>  ; CHECK-P8-LABEL: test8elt:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r5, r2, .LCPI2_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI2_2@toc@ha
>> +; CHECK-P8-NEXT:    addis r6, r2, .LCPI2_2@toc@ha
>> +; CHECK-P8-NEXT:    mtvsrd v2, r4
>> +; CHECK-P8-NEXT:    addis r4, r2, .LCPI2_3@toc@ha
>>  ; CHECK-P8-NEXT:    addi r5, r5, .LCPI2_0@toc@l
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI2_2@toc@l
>> +; CHECK-P8-NEXT:    addi r4, r4, .LCPI2_3@toc@l
>>  ; CHECK-P8-NEXT:    xxlxor v4, v4, v4
>> -; CHECK-P8-NEXT:    lvx v2, 0, r5
>> -; CHECK-P8-NEXT:    addis r5, r2, .LCPI2_3@toc@ha
>> -; CHECK-P8-NEXT:    lvx v5, 0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI2_1@toc@ha
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    addi r5, r5, .LCPI2_3@toc@l
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI2_1@toc@l
>> -; CHECK-P8-NEXT:    lvx v0, 0, r5
>> -; CHECK-P8-NEXT:    lvx v1, 0, r4
>> +; CHECK-P8-NEXT:    lvx v3, 0, r5
>> +; CHECK-P8-NEXT:    addi r5, r6, .LCPI2_2@toc@l
>> +; CHECK-P8-NEXT:    lvx v0, 0, r4
>>  ; CHECK-P8-NEXT:    li r4, 48
>> +; CHECK-P8-NEXT:    lvx v5, 0, r5
>> +; CHECK-P8-NEXT:    addis r5, r2, .LCPI2_1@toc@ha
>> +; CHECK-P8-NEXT:    addi r5, r5, .LCPI2_1@toc@l
>> +; CHECK-P8-NEXT:    lvx v1, 0, r5
>> +; CHECK-P8-NEXT:    vperm v0, v4, v2, v0
>>  ; CHECK-P8-NEXT:    li r5, 32
>> -; CHECK-P8-NEXT:    vperm v2, v4, v3, v2
>> -; CHECK-P8-NEXT:    vperm v5, v4, v3, v5
>> -; CHECK-P8-NEXT:    vperm v0, v4, v3, v0
>> -; CHECK-P8-NEXT:    vperm v3, v4, v3, v1
>> -; CHECK-P8-NEXT:    xvcvuxddp vs0, v2
>> -; CHECK-P8-NEXT:    xvcvuxddp vs1, v5
>> +; CHECK-P8-NEXT:    vperm v3, v4, v2, v3
>> +; CHECK-P8-NEXT:    vperm v5, v4, v2, v5
>> +; CHECK-P8-NEXT:    vperm v2, v4, v2, v1
>>  ; CHECK-P8-NEXT:    xvcvuxddp vs2, v0
>> -; CHECK-P8-NEXT:    xvcvuxddp vs3, v3
>> +; CHECK-P8-NEXT:    xvcvuxddp vs0, v3
>> +; CHECK-P8-NEXT:    xvcvuxddp vs1, v5
>> +; CHECK-P8-NEXT:    xvcvuxddp vs3, v2
>> +; CHECK-P8-NEXT:    xxswapd vs2, vs2
>>  ; CHECK-P8-NEXT:    xxswapd vs0, vs0
>>  ; CHECK-P8-NEXT:    xxswapd vs1, vs1
>> -; CHECK-P8-NEXT:    xxswapd vs2, vs2
>>  ; CHECK-P8-NEXT:    xxswapd vs3, vs3
>>  ; CHECK-P8-NEXT:    stxvd2x vs2, r3, r4
>>  ; CHECK-P8-NEXT:    li r4, 16
>> @@ -155,11 +152,10 @@ define void @test8elt(<8 x double>* noalias nocapture sret %agg.result, i64 %a.c
>>  ;
>>  ; CHECK-P9-LABEL: test8elt:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI2_0@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI2_0@toc@l
>>  ; CHECK-P9-NEXT:    lxvx v3, 0, r4
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>>  ; CHECK-P9-NEXT:    xxlxor v4, v4, v4
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI2_1@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI2_1@toc@l
>> @@ -404,14 +400,13 @@ define <2 x double> @test2elt_signed(i16 %a.coerce) local_unnamed_addr #0 {
>>  ; CHECK-P8-LABEL: test2elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r4, r2, .LCPI4_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r3
>> -; CHECK-P8-NEXT:    addi r3, r4, .LCPI4_0@toc@l
>> -; CHECK-P8-NEXT:    xxswapd v2, vs0
>> -; CHECK-P8-NEXT:    lvx v3, 0, r3
>> +; CHECK-P8-NEXT:    mtvsrwz v3, r3
>>  ; CHECK-P8-NEXT:    addis r3, r2, .LCPI4_1@toc@ha
>> +; CHECK-P8-NEXT:    addi r4, r4, .LCPI4_0@toc@l
>>  ; CHECK-P8-NEXT:    addi r3, r3, .LCPI4_1@toc@l
>> +; CHECK-P8-NEXT:    lvx v2, 0, r4
>>  ; CHECK-P8-NEXT:    lxvd2x vs0, 0, r3
>> -; CHECK-P8-NEXT:    vperm v2, v2, v2, v3
>> +; CHECK-P8-NEXT:    vperm v2, v3, v3, v2
>>  ; CHECK-P8-NEXT:    xxswapd v3, vs0
>>  ; CHECK-P8-NEXT:    vsld v2, v2, v3
>>  ; CHECK-P8-NEXT:    vsrad v2, v2, v3
>> @@ -449,17 +444,16 @@ define void @test4elt_signed(<4 x double>* noalias nocapture sret %agg.result, i
>>  ; CHECK-P8-LABEL: test4elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>>  ; CHECK-P8-NEXT:    addis r5, r2, .LCPI5_0@toc@ha
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI5_2@toc@ha
>> -; CHECK-P8-NEXT:    addi r5, r5, .LCPI5_0@toc@l
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI5_2@toc@l
>> -; CHECK-P8-NEXT:    lvx v2, 0, r5
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    lvx v4, 0, r4
>> +; CHECK-P8-NEXT:    addis r6, r2, .LCPI5_2@toc@ha
>> +; CHECK-P8-NEXT:    mtvsrwz v3, r4
>>  ; CHECK-P8-NEXT:    addis r4, r2, .LCPI5_1@toc@ha
>> +; CHECK-P8-NEXT:    addi r5, r5, .LCPI5_0@toc@l
>>  ; CHECK-P8-NEXT:    addi r4, r4, .LCPI5_1@toc@l
>> +; CHECK-P8-NEXT:    lvx v2, 0, r5
>> +; CHECK-P8-NEXT:    addi r5, r6, .LCPI5_2@toc@l
>>  ; CHECK-P8-NEXT:    lxvd2x vs0, 0, r4
>>  ; CHECK-P8-NEXT:    li r4, 16
>> +; CHECK-P8-NEXT:    lvx v4, 0, r5
>>  ; CHECK-P8-NEXT:    vperm v2, v3, v3, v2
>>  ; CHECK-P8-NEXT:    vperm v3, v3, v3, v4
>>  ; CHECK-P8-NEXT:    xxswapd v4, vs0
>> @@ -523,26 +517,25 @@ entry:
>>  define void @test8elt_signed(<8 x double>* noalias nocapture sret %agg.result, i64 %a.coerce) local_unnamed_addr #1 {
>>  ; CHECK-P8-LABEL: test8elt_signed:
>>  ; CHECK-P8:       # %bb.0: # %entry
>> -; CHECK-P8-NEXT:    mtfprd f0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI6_2@toc@ha
>>  ; CHECK-P8-NEXT:    addis r5, r2, .LCPI6_0@toc@ha
>> -; CHECK-P8-NEXT:    addis r6, r2, .LCPI6_3@toc@ha
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI6_2@toc@l
>> +; CHECK-P8-NEXT:    addis r6, r2, .LCPI6_2@toc@ha
>> +; CHECK-P8-NEXT:    mtvsrd v3, r4
>> +; CHECK-P8-NEXT:    addis r4, r2, .LCPI6_1@toc@ha
>>  ; CHECK-P8-NEXT:    addi r5, r5, .LCPI6_0@toc@l
>> -; CHECK-P8-NEXT:    addi r6, r6, .LCPI6_3@toc@l
>> -; CHECK-P8-NEXT:    lvx v4, 0, r4
>> -; CHECK-P8-NEXT:    addis r4, r2, .LCPI6_4@toc@ha
>> +; CHECK-P8-NEXT:    addi r6, r6, .LCPI6_2@toc@l
>> +; CHECK-P8-NEXT:    addi r4, r4, .LCPI6_1@toc@l
>>  ; CHECK-P8-NEXT:    lvx v2, 0, r5
>> -; CHECK-P8-NEXT:    xxswapd v3, vs0
>> -; CHECK-P8-NEXT:    lvx v5, 0, r6
>> -; CHECK-P8-NEXT:    addis r5, r2, .LCPI6_1@toc@ha
>> -; CHECK-P8-NEXT:    addi r4, r4, .LCPI6_4@toc@l
>> -; CHECK-P8-NEXT:    addi r5, r5, .LCPI6_1@toc@l
>> -; CHECK-P8-NEXT:    lvx v0, 0, r4
>> -; CHECK-P8-NEXT:    lxvd2x vs0, 0, r5
>> +; CHECK-P8-NEXT:    addis r5, r2, .LCPI6_3@toc@ha
>> +; CHECK-P8-NEXT:    lvx v4, 0, r6
>> +; CHECK-P8-NEXT:    addis r6, r2, .LCPI6_4@toc@ha
>> +; CHECK-P8-NEXT:    lxvd2x vs0, 0, r4
>>  ; CHECK-P8-NEXT:    li r4, 48
>> -; CHECK-P8-NEXT:    li r5, 32
>> +; CHECK-P8-NEXT:    addi r5, r5, .LCPI6_3@toc@l
>> +; CHECK-P8-NEXT:    lvx v5, 0, r5
>> +; CHECK-P8-NEXT:    addi r5, r6, .LCPI6_4@toc@l
>> +; CHECK-P8-NEXT:    lvx v0, 0, r5
>>  ; CHECK-P8-NEXT:    vperm v2, v3, v3, v2
>> +; CHECK-P8-NEXT:    li r5, 32
>>  ; CHECK-P8-NEXT:    vperm v4, v3, v3, v4
>>  ; CHECK-P8-NEXT:    vperm v5, v3, v3, v5
>>  ; CHECK-P8-NEXT:    vperm v3, v3, v3, v0
>> @@ -572,14 +565,13 @@ define void @test8elt_signed(<8 x double>* noalias nocapture sret %agg.result, i
>>  ;
>>  ; CHECK-P9-LABEL: test8elt_signed:
>>  ; CHECK-P9:       # %bb.0: # %entry
>> -; CHECK-P9-NEXT:    mtfprd f0, r4
>> +; CHECK-P9-NEXT:    mtvsrd v2, r4
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI6_0@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI6_0@toc@l
>>  ; CHECK-P9-NEXT:    lxvx v3, 0, r4
>> -; CHECK-P9-NEXT:    xxswapd v2, vs0
>> -; CHECK-P9-NEXT:    vperm v3, v2, v2, v3
>>  ; CHECK-P9-NEXT:    addis r4, r2, .LCPI6_1@toc@ha
>>  ; CHECK-P9-NEXT:    addi r4, r4, .LCPI6_1@toc@l
>> +; CHECK-P9-NEXT:    vperm v3, v2, v2, v3
>>  ; CHECK-P9-NEXT:    vextsb2d v3, v3
>>  ; CHECK-P9-NEXT:    xvcvsxddp vs0, v3
>>  ; CHECK-P9-NEXT:    lxvx v3, 0, r4
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
>> index 7e51f2b862ab..29955dc17f67 100644
>> --- a/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
>> @@ -82,10 +82,10 @@ define <3 x float> @constrained_vector_fdiv_v3f32() #0 {
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    xscvdpspn 2, 2
>>  ; PC64LE-NEXT:    xscvdpspn 0, 0
>> -; PC64LE-NEXT:    xxsldwi 34, 1, 1, 1
>> -; PC64LE-NEXT:    xxsldwi 35, 2, 2, 1
>> -; PC64LE-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 1, 1, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 2, 2, 3
>> +; PC64LE-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE-NEXT:    blr
>>  ;
>> @@ -106,12 +106,12 @@ define <3 x float> @constrained_vector_fdiv_v3f32() #0 {
>>  ; PC64LE9-NEXT:    xsdivsp 2, 2, 0
>>  ; PC64LE9-NEXT:    xsdivsp 0, 3, 0
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE9-NEXT:    blr
>>  entry:
>> @@ -359,11 +359,11 @@ define <3 x float> @constrained_vector_frem_v3f32() #0 {
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI7_4@toc@l
>>  ; PC64LE-NEXT:    lvx 4, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 30
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> -; PC64LE-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE-NEXT:    addi 1, 1, 64
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -401,15 +401,15 @@ define <3 x float> @constrained_vector_frem_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl fmodf
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 29
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI7_4@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI7_4@toc@l
>>  ; PC64LE9-NEXT:    lxvx 36, 0, 3
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE9-NEXT:    addi 1, 1, 64
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -710,10 +710,10 @@ define <3 x float> @constrained_vector_fmul_v3f32() #0 {
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    xscvdpspn 2, 2
>>  ; PC64LE-NEXT:    xscvdpspn 0, 0
>> -; PC64LE-NEXT:    xxsldwi 34, 1, 1, 1
>> -; PC64LE-NEXT:    xxsldwi 35, 2, 2, 1
>> -; PC64LE-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 1, 1, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 2, 2, 3
>> +; PC64LE-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE-NEXT:    blr
>>  ;
>> @@ -735,11 +735,11 @@ define <3 x float> @constrained_vector_fmul_v3f32() #0 {
>>  ; PC64LE9-NEXT:    xsmulsp 1, 1, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>>  ; PC64LE9-NEXT:    xscvdpspn 1, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 1, 1, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 1, 1, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 1, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE9-NEXT:    blr
>>  entry:
>> @@ -925,10 +925,10 @@ define <3 x float> @constrained_vector_fadd_v3f32() #0 {
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    xscvdpspn 2, 2
>>  ; PC64LE-NEXT:    xscvdpspn 0, 0
>> -; PC64LE-NEXT:    xxsldwi 34, 1, 1, 1
>> -; PC64LE-NEXT:    xxsldwi 35, 2, 2, 1
>> -; PC64LE-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 1, 1, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 2, 2, 3
>> +; PC64LE-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE-NEXT:    blr
>>  ;
>> @@ -945,15 +945,15 @@ define <3 x float> @constrained_vector_fadd_v3f32() #0 {
>>  ; PC64LE9-NEXT:    xsaddsp 1, 0, 1
>>  ; PC64LE9-NEXT:    xsaddsp 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI17_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI17_3@toc@l
>>  ; PC64LE9-NEXT:    lxvx 36, 0, 3
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE9-NEXT:    blr
>>  entry:
>> @@ -1137,10 +1137,10 @@ define <3 x float> @constrained_vector_fsub_v3f32() #0 {
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    xscvdpspn 2, 2
>>  ; PC64LE-NEXT:    xscvdpspn 0, 0
>> -; PC64LE-NEXT:    xxsldwi 34, 1, 1, 1
>> -; PC64LE-NEXT:    xxsldwi 35, 2, 2, 1
>> -; PC64LE-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 1, 1, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 2, 2, 3
>> +; PC64LE-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE-NEXT:    blr
>>  ;
>> @@ -1157,15 +1157,15 @@ define <3 x float> @constrained_vector_fsub_v3f32() #0 {
>>  ; PC64LE9-NEXT:    xssubsp 1, 0, 1
>>  ; PC64LE9-NEXT:    xssubsp 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI22_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI22_3@toc@l
>>  ; PC64LE9-NEXT:    lxvx 36, 0, 3
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE9-NEXT:    blr
>>  entry:
>> @@ -1333,12 +1333,12 @@ define <3 x float> @constrained_vector_sqrt_v3f32() #0 {
>>  ; PC64LE-NEXT:    xssqrtsp 2, 2
>>  ; PC64LE-NEXT:    xscvdpspn 0, 0
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 2
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    blr
>>  ;
>> @@ -1358,10 +1358,10 @@ define <3 x float> @constrained_vector_sqrt_v3f32() #0 {
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>>  ; PC64LE9-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE9-NEXT:    xscvdpspn 2, 2
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 2, 2, 1
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>> +; PC64LE9-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE9-NEXT:    xxsldwi 34, 2, 2, 3
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    blr
>> @@ -1588,11 +1588,11 @@ define <3 x float> @constrained_vector_pow_v3f32() #0 {
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI32_4@toc@l
>>  ; PC64LE-NEXT:    lvx 4, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 30
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> -; PC64LE-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE-NEXT:    addi 1, 1, 64
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -1630,15 +1630,15 @@ define <3 x float> @constrained_vector_pow_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl powf
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 29
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI32_4@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI32_4@toc@l
>>  ; PC64LE9-NEXT:    lxvx 36, 0, 3
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE9-NEXT:    addi 1, 1, 64
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -1992,11 +1992,11 @@ define <3 x float> @constrained_vector_powi_v3f32() #0 {
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI37_3@toc@l
>>  ; PC64LE-NEXT:    lvx 4, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 31
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> -; PC64LE-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE-NEXT:    addi 1, 1, 48
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -2030,15 +2030,15 @@ define <3 x float> @constrained_vector_powi_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl __powisf2
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 31
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI37_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI37_3@toc@l
>>  ; PC64LE9-NEXT:    lxvx 36, 0, 3
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE9-NEXT:    addi 1, 1, 48
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -2360,12 +2360,12 @@ define <3 x float> @constrained_vector_sin_v3f32() #0 {
>>  ; PC64LE-NEXT:    addis 3, 2, .LCPI42_3@toc@ha
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI42_3@toc@l
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 31
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    addi 1, 1, 48
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -2396,15 +2396,15 @@ define <3 x float> @constrained_vector_sin_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl sinf
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 31
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI42_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI42_3@toc@l
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    addi 1, 1, 48
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -2709,12 +2709,12 @@ define <3 x float> @constrained_vector_cos_v3f32() #0 {
>>  ; PC64LE-NEXT:    addis 3, 2, .LCPI47_3@toc@ha
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI47_3@toc@l
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 31
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    addi 1, 1, 48
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -2745,15 +2745,15 @@ define <3 x float> @constrained_vector_cos_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl cosf
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 31
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI47_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI47_3@toc@l
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    addi 1, 1, 48
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -3058,12 +3058,12 @@ define <3 x float> @constrained_vector_exp_v3f32() #0 {
>>  ; PC64LE-NEXT:    addis 3, 2, .LCPI52_3@toc@ha
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI52_3@toc@l
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 31
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    addi 1, 1, 48
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -3094,15 +3094,15 @@ define <3 x float> @constrained_vector_exp_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl expf
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 31
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI52_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI52_3@toc@l
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    addi 1, 1, 48
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -3407,12 +3407,12 @@ define <3 x float> @constrained_vector_exp2_v3f32() #0 {
>>  ; PC64LE-NEXT:    addis 3, 2, .LCPI57_3@toc@ha
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI57_3@toc@l
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 31
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    addi 1, 1, 48
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -3443,15 +3443,15 @@ define <3 x float> @constrained_vector_exp2_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl exp2f
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 31
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI57_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI57_3@toc@l
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    addi 1, 1, 48
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -3756,12 +3756,12 @@ define <3 x float> @constrained_vector_log_v3f32() #0 {
>>  ; PC64LE-NEXT:    addis 3, 2, .LCPI62_3@toc@ha
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI62_3@toc@l
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 31
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    addi 1, 1, 48
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -3792,15 +3792,15 @@ define <3 x float> @constrained_vector_log_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl logf
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 31
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI62_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI62_3@toc@l
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    addi 1, 1, 48
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -4105,12 +4105,12 @@ define <3 x float> @constrained_vector_log10_v3f32() #0 {
>>  ; PC64LE-NEXT:    addis 3, 2, .LCPI67_3@toc@ha
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI67_3@toc@l
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 31
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    addi 1, 1, 48
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -4141,15 +4141,15 @@ define <3 x float> @constrained_vector_log10_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl log10f
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 31
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI67_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI67_3@toc@l
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    addi 1, 1, 48
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -4454,12 +4454,12 @@ define <3 x float> @constrained_vector_log2_v3f32() #0 {
>>  ; PC64LE-NEXT:    addis 3, 2, .LCPI72_3@toc@ha
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI72_3@toc@l
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 31
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    addi 1, 1, 48
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -4490,15 +4490,15 @@ define <3 x float> @constrained_vector_log2_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl log2f
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 31
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI72_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI72_3@toc@l
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    addi 1, 1, 48
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -4748,12 +4748,12 @@ define <3 x float> @constrained_vector_rint_v3f32() #0 {
>>  ; PC64LE-NEXT:    xsrdpic 2, 2
>>  ; PC64LE-NEXT:    xscvdpspn 0, 0
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 2
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    blr
>>  ;
>> @@ -4773,10 +4773,10 @@ define <3 x float> @constrained_vector_rint_v3f32() #0 {
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>>  ; PC64LE9-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE9-NEXT:    xscvdpspn 2, 2
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 2, 2, 1
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>> +; PC64LE9-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE9-NEXT:    xxsldwi 34, 2, 2, 3
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    blr
>> @@ -4947,12 +4947,12 @@ define <3 x float> @constrained_vector_nearbyint_v3f32() #0 {
>>  ; PC64LE-NEXT:    addis 3, 2, .LCPI82_3@toc@ha
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI82_3@toc@l
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 31
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    addi 1, 1, 48
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -4983,15 +4983,15 @@ define <3 x float> @constrained_vector_nearbyint_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl nearbyintf
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 31
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI82_3@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI82_3@toc@l
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    addi 1, 1, 48
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -5184,11 +5184,11 @@ define <3 x float> @constrained_vector_maxnum_v3f32() #0 {
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI87_5@toc@l
>>  ; PC64LE-NEXT:    lvx 4, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 30
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> -; PC64LE-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE-NEXT:    addi 1, 1, 64
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -5227,15 +5227,15 @@ define <3 x float> @constrained_vector_maxnum_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl fmaxf
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 29
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI87_5@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI87_5@toc@l
>>  ; PC64LE9-NEXT:    lxvx 36, 0, 3
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE9-NEXT:    addi 1, 1, 64
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -5471,11 +5471,11 @@ define <3 x float> @constrained_vector_minnum_v3f32() #0 {
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>>  ; PC64LE-NEXT:    addi 3, 3, .LCPI92_5@toc@l
>>  ; PC64LE-NEXT:    lvx 4, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 30
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 2, 3
>> -; PC64LE-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 2, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE-NEXT:    addi 1, 1, 64
>>  ; PC64LE-NEXT:    ld 0, 16(1)
>> @@ -5514,15 +5514,15 @@ define <3 x float> @constrained_vector_minnum_v3f32() #0 {
>>  ; PC64LE9-NEXT:    bl fminf
>>  ; PC64LE9-NEXT:    nop
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 1
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 29
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 30
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI92_5@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI92_5@toc@l
>>  ; PC64LE9-NEXT:    lxvx 36, 0, 3
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 3, 2, 4
>>  ; PC64LE9-NEXT:    addi 1, 1, 64
>>  ; PC64LE9-NEXT:    ld 0, 16(1)
>> @@ -5686,9 +5686,9 @@ define <2 x float> @constrained_vector_fptrunc_v2f64() #0 {
>>  ; PC64LE-NEXT:    xsrsp 1, 1
>>  ; PC64LE-NEXT:    xscvdpspn 0, 0
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE-NEXT:    blr
>>  ;
>>  ; PC64LE9-LABEL: constrained_vector_fptrunc_v2f64:
>> @@ -5698,12 +5698,12 @@ define <2 x float> @constrained_vector_fptrunc_v2f64() #0 {
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI96_1@toc@ha
>>  ; PC64LE9-NEXT:    xsrsp 0, 0
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    lfd 0, .LCPI96_1@toc@l(3)
>>  ; PC64LE9-NEXT:    xsrsp 0, 0
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    blr
>>  entry:
>>    %result = call <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(
>> @@ -5729,12 +5729,12 @@ define <3 x float> @constrained_vector_fptrunc_v3f64() #0 {
>>  ; PC64LE-NEXT:    xsrsp 2, 2
>>  ; PC64LE-NEXT:    xscvdpspn 0, 0
>>  ; PC64LE-NEXT:    xscvdpspn 1, 1
>> -; PC64LE-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE-NEXT:    xscvdpspn 0, 2
>> -; PC64LE-NEXT:    xxsldwi 35, 1, 1, 1
>> -; PC64LE-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE-NEXT:    xxsldwi 35, 1, 1, 3
>> +; PC64LE-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE-NEXT:    lvx 3, 0, 3
>> -; PC64LE-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE-NEXT:    blr
>>  ;
>> @@ -5745,20 +5745,20 @@ define <3 x float> @constrained_vector_fptrunc_v3f64() #0 {
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI97_1@toc@ha
>>  ; PC64LE9-NEXT:    xsrsp 0, 0
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>> -; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 34, 0, 0, 3
>>  ; PC64LE9-NEXT:    lfd 0, .LCPI97_1@toc@l(3)
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI97_2@toc@ha
>>  ; PC64LE9-NEXT:    addi 3, 3, .LCPI97_2@toc@l
>>  ; PC64LE9-NEXT:    xsrsp 0, 0
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>> -; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 1
>> -; PC64LE9-NEXT:    vmrglw 2, 3, 2
>> +; PC64LE9-NEXT:    xxsldwi 35, 0, 0, 3
>> +; PC64LE9-NEXT:    vmrghw 2, 3, 2
>>  ; PC64LE9-NEXT:    lxvx 35, 0, 3
>>  ; PC64LE9-NEXT:    addis 3, 2, .LCPI97_3@toc@ha
>>  ; PC64LE9-NEXT:    lfd 0, .LCPI97_3@toc@l(3)
>>  ; PC64LE9-NEXT:    xsrsp 0, 0
>>  ; PC64LE9-NEXT:    xscvdpspn 0, 0
>> -; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 1
>> +; PC64LE9-NEXT:    xxsldwi 36, 0, 0, 3
>>  ; PC64LE9-NEXT:    vperm 2, 4, 2, 3
>>  ; PC64LE9-NEXT:    blr
>>  entry:
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vsx.ll b/llvm/test/CodeGen/PowerPC/vsx.ll
>> index 8b4e3640ef6b..4a78218262ca 100644
>> --- a/llvm/test/CodeGen/PowerPC/vsx.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vsx.ll
>> @@ -1404,9 +1404,9 @@ define <2 x float> @test44(<2 x i64> %a) {
>>  ; CHECK-LE-NEXT:    xscvuxdsp f0, f0
>>  ; CHECK-LE-NEXT:    xscvdpspn vs1, f1
>>  ; CHECK-LE-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-LE-NEXT:    xxsldwi v3, vs1, vs1, 1
>> -; CHECK-LE-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-LE-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-LE-NEXT:    xxsldwi v3, vs1, vs1, 3
>> +; CHECK-LE-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-LE-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-LE-NEXT:    blr
>>    %v = uitofp <2 x i64> %a to <2 x float>
>>    ret <2 x float> %v
>> @@ -1486,9 +1486,9 @@ define <2 x float> @test45(<2 x i64> %a) {
>>  ; CHECK-LE-NEXT:    xscvsxdsp f0, f0
>>  ; CHECK-LE-NEXT:    xscvdpspn vs1, f1
>>  ; CHECK-LE-NEXT:    xscvdpspn vs0, f0
>> -; CHECK-LE-NEXT:    xxsldwi v3, vs1, vs1, 1
>> -; CHECK-LE-NEXT:    xxsldwi v2, vs0, vs0, 1
>> -; CHECK-LE-NEXT:    vmrglw v2, v3, v2
>> +; CHECK-LE-NEXT:    xxsldwi v3, vs1, vs1, 3
>> +; CHECK-LE-NEXT:    xxsldwi v2, vs0, vs0, 3
>> +; CHECK-LE-NEXT:    vmrghw v2, v3, v2
>>  ; CHECK-LE-NEXT:    blr
>>    %v = sitofp <2 x i64> %a to <2 x float>
>>    ret <2 x float> %v
>> @@ -2437,12 +2437,11 @@ define <2 x i32> @test80(i32 %v) {
>>  ;
>>  ; CHECK-LE-LABEL: test80:
>>  ; CHECK-LE:       # %bb.0:
>> -; CHECK-LE-NEXT:    mtfprd f0, r3
>> +; CHECK-LE-NEXT:    mtfprwz f0, r3
>>  ; CHECK-LE-NEXT:    addis r4, r2, .LCPI65_0@toc@ha
>>  ; CHECK-LE-NEXT:    addi r3, r4, .LCPI65_0@toc@l
>> -; CHECK-LE-NEXT:    xxswapd vs0, vs0
>> +; CHECK-LE-NEXT:    xxspltw v2, vs0, 1
>>  ; CHECK-LE-NEXT:    lvx v3, 0, r3
>> -; CHECK-LE-NEXT:    xxspltw v2, vs0, 3
>>  ; CHECK-LE-NEXT:    vadduwm v2, v2, v3
>>  ; CHECK-LE-NEXT:    blr
>>    %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
>>
>> diff  --git a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>> index 5c05f8dc3d81..a198604f79a4 100644
>> --- a/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>> +++ b/llvm/test/CodeGen/PowerPC/vsx_insert_extract_le.ll
>> @@ -17,17 +17,15 @@ define <2 x double> @testi0(<2 x double>* %p1, double* %p2) {
>>  ; CHECK-NEXT:    lxvd2x vs0, 0, r3
>>  ; CHECK-NEXT:    lfdx f1, 0, r4
>>  ; CHECK-NEXT:    xxswapd vs0, vs0
>> -; CHECK-NEXT:    xxspltd vs1, vs1, 0
>> -; CHECK-NEXT:    xxpermdi v2, vs0, vs1, 1
>> +; CHECK-NEXT:    xxmrghd v2, vs0, vs1
>>  ; CHECK-NEXT:    blr
>>  ;
>>  ; CHECK-P9-VECTOR-LABEL: testi0:
>>  ; CHECK-P9-VECTOR:       # %bb.0:
>>  ; CHECK-P9-VECTOR-NEXT:    lxvd2x vs0, 0, r3
>>  ; CHECK-P9-VECTOR-NEXT:    lfdx f1, 0, r4
>> -; CHECK-P9-VECTOR-NEXT:    xxspltd vs1, vs1, 0
>>  ; CHECK-P9-VECTOR-NEXT:    xxswapd vs0, vs0
>> -; CHECK-P9-VECTOR-NEXT:    xxpermdi v2, vs0, vs1, 1
>> +; CHECK-P9-VECTOR-NEXT:    xxmrghd v2, vs0, vs1
>>  ; CHECK-P9-VECTOR-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: testi0:
>> @@ -51,17 +49,15 @@ define <2 x double> @testi1(<2 x double>* %p1, double* %p2) {
>>  ; CHECK-NEXT:    lxvd2x vs0, 0, r3
>>  ; CHECK-NEXT:    lfdx f1, 0, r4
>>  ; CHECK-NEXT:    xxswapd vs0, vs0
>> -; CHECK-NEXT:    xxspltd vs1, vs1, 0
>> -; CHECK-NEXT:    xxmrgld v2, vs1, vs0
>> +; CHECK-NEXT:    xxpermdi v2, vs1, vs0, 1
>>  ; CHECK-NEXT:    blr
>>  ;
>>  ; CHECK-P9-VECTOR-LABEL: testi1:
>>  ; CHECK-P9-VECTOR:       # %bb.0:
>>  ; CHECK-P9-VECTOR-NEXT:    lxvd2x vs0, 0, r3
>>  ; CHECK-P9-VECTOR-NEXT:    lfdx f1, 0, r4
>> -; CHECK-P9-VECTOR-NEXT:    xxspltd vs1, vs1, 0
>>  ; CHECK-P9-VECTOR-NEXT:    xxswapd vs0, vs0
>> -; CHECK-P9-VECTOR-NEXT:    xxmrgld v2, vs1, vs0
>> +; CHECK-P9-VECTOR-NEXT:    xxpermdi v2, vs1, vs0, 1
>>  ; CHECK-P9-VECTOR-NEXT:    blr
>>  ;
>>  ; CHECK-P9-LABEL: testi1:
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>
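The recurring CHECK-line change in the diff above replaces `xxsldwi X, X, X, 1` followed by `vmrglw` with `xxsldwi X, X, X, 3` followed by `vmrghw`. A toy Python model of the word-level semantics (a sketch, not LLVM code) suggests the two sequences leave identical register contents, so the test updates reflect a different but equivalent instruction choice. Assumptions: a 128-bit VSR is four 32-bit words in big-endian (ISA) order, little-endian element i is big-endian word 3 - i, and VSRs 34/35 are the VMX registers v2/v3. The helper names (`old_merge`, `new_merge`, etc.) are hypothetical. The same model also covers the `test80` change, assuming `mtfprd`/`mtfprwz` behave as `mtvsrd`/`mtvsrwz` writing doubleword 0.

```python
"""Toy word-level model of the PowerPC shuffle sequences in the tests above.

Assumptions (not taken from LLVM): a 128-bit VSR is modelled as four 32-bit
words in big-endian (ISA) order, index 0 being the most significant word,
and little-endian vector element i corresponds to big-endian word 3 - i.
"""
from typing import List

Vec = List[object]
MASK32 = 0xFFFF_FFFF


def xxsldwi(a: Vec, b: Vec, shw: int) -> Vec:
    # Words shw..shw+3 of the 8-word concatenation a || b.
    return (a + b)[shw:shw + 4]


def vmrghw(a: Vec, b: Vec) -> Vec:
    # Interleave BE words 0 and 1 of a and b.
    return [a[0], b[0], a[1], b[1]]


def vmrglw(a: Vec, b: Vec) -> Vec:
    # Interleave BE words 2 and 3 of a and b.
    return [a[2], b[2], a[3], b[3]]


def le_elements(v: Vec) -> Vec:
    # Reorder BE words into little-endian element order.
    return v[::-1]


def old_merge(vs0: Vec, vs1: Vec) -> Vec:
    # Before: xxsldwi 34, 0, 0, 1 / xxsldwi 35, 1, 1, 1 / vmrglw 2, 3, 2
    v2 = xxsldwi(vs0, vs0, 1)
    v3 = xxsldwi(vs1, vs1, 1)
    return vmrglw(v3, v2)


def new_merge(vs0: Vec, vs1: Vec) -> Vec:
    # After: xxsldwi 34, 0, 0, 3 / xxsldwi 35, 1, 1, 3 / vmrghw 2, 3, 2
    v2 = xxsldwi(vs0, vs0, 3)
    v3 = xxsldwi(vs1, vs1, 3)
    return vmrghw(v3, v2)


# test80: mtfprd/mtfprwz modelled as mtvsrd/mtvsrwz writing doubleword 0;
# doubleword 1 is treated as don't-care and given distinct tags.
def mtvsrd(r: int) -> Vec:
    return [(r >> 32) & MASK32, r & MASK32, "dw1.w2", "dw1.w3"]


def mtvsrwz(r: int) -> Vec:
    return [0, r & MASK32, "dw1.w2", "dw1.w3"]


def xxswapd(v: Vec) -> Vec:
    # Swap the two doublewords.
    return v[2:] + v[:2]


def xxspltw(v: Vec, uim: int) -> Vec:
    # Splat BE word uim into all four words.
    return [v[uim]] * 4


if __name__ == "__main__":
    # Assume the converted scalars x and y sit in BE word 0; every other
    # word gets a distinct tag so any mix-up shows in the comparison.
    vs0 = ["x", "vs0.w1", "vs0.w2", "vs0.w3"]
    vs1 = ["y", "vs1.w1", "vs1.w2", "vs1.w3"]
    assert old_merge(vs0, vs1) == new_merge(vs0, vs1)
    print(le_elements(new_merge(vs0, vs1))[:2])  # ['x', 'y']

    r3 = 0x1234_5678_9ABC_DEF0
    # Old: mtfprd / xxswapd / xxspltw 3.  New: mtfprwz / xxspltw 1.
    assert xxspltw(xxswapd(mtvsrd(r3)), 3) == xxspltw(mtvsrwz(r3), 1)
```

Because both merge sequences are fixed permutations of the eight input words and every input word carries a distinct tag, a single equality check covers all input values under this model. It says nothing about latency or about how later passes treat the merge instructions, which is the motivation given in the commit message.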