[llvm] r319901 - [Hexagon] Generate HVX code for vector construction and access

Eric Christopher via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 8 18:40:29 PST 2018


One more :)

clang noticed that HvxSelector::zerous was unused and was warning on me, so
I removed it here:

Committing to https://llvm.org/svn/llvm-project/llvm/trunk ...
M lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp
Committed r322053

If you still need it you might want to add a use. :)

-eric

On Wed, Dec 6, 2017 at 11:44 AM Krzysztof Parzyszek via llvm-commits <llvm-commits at lists.llvm.org> wrote:

> Sorry about that, and thanks.
>
> -Krzysztof
>
> On 12/6/2017 12:52 PM, Davide Italiano wrote:
> > I'll go ahead and commit the following to unblock our work, but feel
> > free to follow up accordingly if you don't like it.
> >
> > $ git diff
> > diff --git a/lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp b/lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp
> > index a636e4e1557..5dc5e764f67 100644
> > --- a/lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp
> > +++ b/lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp
> > @@ -729,7 +729,9 @@ void NodeTemplate::print(raw_ostream &OS, const SelectionDAG &G) const {
> >
> >   void ResultStack::print(raw_ostream &OS, const SelectionDAG &G) const {
> >     OS << "Input node:\n";
> > +#ifndef NDEBUG
> >     InpNode->dumpr(&G);
> > +#endif
> >     OS << "Result templates:\n";
> >     for (unsigned I = 0, E = List.size(); I != E; ++I) {
> >       OS << '[' << I << "] ";
> >
> >
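The link failure comes from calling a debug-only helper unconditionally. A minimal sketch of the pattern, assuming (as the linker error indicates) that `SDNode::dumpr` was declared in every build but only compiled into builds with assertions enabled. `DebugNode`, `dumpTree` and `describe` are hypothetical names, not LLVM API. The guard mirrors the `#ifndef NDEBUG` used in the patch; LLVM also has an `LLVM_ENABLE_DUMP` build option that can keep dump methods in release builds, so real code may need a broader condition.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-in for a class whose dump helper is declared in every
// build but, like SDNode::dumpr in the failing build, only defined when
// assertions are enabled.
struct DebugNode {
  std::string Name;
  void dumpTree() const; // declared unconditionally (as in a header)
};

#ifndef NDEBUG
// Definition compiled only into builds without -DNDEBUG.
void DebugNode::dumpTree() const { std::cerr << "node: " << Name << '\n'; }
#endif

// Mirrors the shape of ResultStack::print: the call to the debug-only
// helper is guarded by the same condition as its definition, so a
// -DNDEBUG build never references the missing symbol.
std::string describe(const DebugNode &N) {
  std::string Out = "Input node:\n";
#ifndef NDEBUG
  N.dumpTree();
#endif
  Out += "Result templates:\n";
  return Out;
}
```

Built with or without `-DNDEBUG`, this links and `describe` returns the same text; removing the guard around `N.dumpTree()` should reproduce an undefined-symbol error in the `-DNDEBUG` build.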
> > On Wed, Dec 6, 2017 at 10:48 AM, Davide Italiano <davide at freebsd.org> wrote:
> >> The build is failing on macOS for me. I think this commit might be
> >> responsible, taking into consideration the range (yesterday night/this
> >> morning).
> >>
> >> Undefined symbols for architecture x86_64:
> >>    "llvm::SDNode::dumpr(llvm::SelectionDAG const*) const", referenced from:
> >>        ResultStack::print(llvm::raw_ostream&, llvm::SelectionDAG const&) const in libLLVMHexagonCodeGen.a(HexagonISelDAGToDAGHVX.cpp.o)
> >> ld: symbol(s) not found for architecture x86_64
> >>
> >> On Wed, Dec 6, 2017 at 8:40 AM, Krzysztof Parzyszek via llvm-commits
> >> <llvm-commits at lists.llvm.org> wrote:
> >>> Author: kparzysz
> >>> Date: Wed Dec  6 08:40:37 2017
> >>> New Revision: 319901
> >>>
> >>> URL: http://llvm.org/viewvc/llvm-project?rev=319901&view=rev
> >>> Log:
> >>> [Hexagon] Generate HVX code for vector construction and access
> >>>
> >>> Support for:
> >>>    - build vector,
> >>>    - extract vector element, subvector,
> >>>    - insert vector element, subvector,
> >>>    - shuffle.
> >>>
> >>> Added:
> >>>      llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp
> >>>      llvm/trunk/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/align-128b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/align-64b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/contract-128b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/contract-64b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/deal-128b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/deal-64b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/delta-128b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/delta-64b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/delta2-64b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/extract-element.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/reg-sequence.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/shuff-128b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/shuff-64b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/shuff-combos-128b.ll
> >>>      llvm/trunk/test/CodeGen/Hexagon/autohvx/shuff-combos-64b.ll
> >>> Modified:
> >>>      llvm/trunk/lib/Target/Hexagon/CMakeLists.txt
> >>>      llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.cpp
> >>>      llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.h
> >>>      llvm/trunk/lib/Target/Hexagon/HexagonISelLowering.cpp
> >>>      llvm/trunk/lib/Target/Hexagon/HexagonISelLowering.h
> >>>      llvm/trunk/lib/Target/Hexagon/HexagonPatterns.td
> >>>
> >>> Modified: llvm/trunk/lib/Target/Hexagon/CMakeLists.txt
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Hexagon/CMakeLists.txt?rev=319901&r1=319900&r2=319901&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/Target/Hexagon/CMakeLists.txt (original)
> >>> +++ llvm/trunk/lib/Target/Hexagon/CMakeLists.txt Wed Dec  6 08:40:37 2017
> >>> @@ -35,7 +35,9 @@ add_llvm_target(HexagonCodeGen
> >>>     HexagonHazardRecognizer.cpp
> >>>     HexagonInstrInfo.cpp
> >>>     HexagonISelDAGToDAG.cpp
> >>> +  HexagonISelDAGToDAGHVX.cpp
> >>>     HexagonISelLowering.cpp
> >>> +  HexagonISelLoweringHVX.cpp
> >>>     HexagonLoopIdiomRecognition.cpp
> >>>     HexagonMachineFunctionInfo.cpp
> >>>     HexagonMachineScheduler.cpp
> >>>
> >>> Modified: llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.cpp
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.cpp?rev=319901&r1=319900&r2=319901&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.cpp (original)
> >>> +++ llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.cpp Wed Dec  6 08:40:37 2017
> >>> @@ -754,7 +754,6 @@ void HexagonDAGToDAGISel::SelectBitcast(
> >>>     CurDAG->RemoveDeadNode(N);
> >>>   }
> >>>
> >>> -
> >>>   void HexagonDAGToDAGISel::Select(SDNode *N) {
> >>>     if (N->isMachineOpcode())
> >>>       return N->setNodeId(-1);  // Already selected.
> >>> @@ -772,6 +771,13 @@ void HexagonDAGToDAGISel::Select(SDNode
> >>>     case ISD::INTRINSIC_WO_CHAIN:   return SelectIntrinsicWOChain(N);
> >>>     }
> >>>
> >>> +  if (HST->useHVXOps()) {
> >>> +    switch (N->getOpcode()) {
> >>> +    case ISD::VECTOR_SHUFFLE:     return SelectHvxShuffle(N);
> >>> +    case HexagonISD::VROR:        return SelectHvxRor(N);
> >>> +    }
> >>> +  }
> >>> +
> >>>     SelectCode(N);
> >>>   }
> >>>
> >>>
> >>> Modified: llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.h
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.h?rev=319901&r1=319900&r2=319901&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.h (original)
> >>> +++ llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAG.h Wed Dec  6 08:40:37 2017
> >>> @@ -26,6 +26,7 @@ namespace llvm {
> >>>   class MachineFunction;
> >>>   class HexagonInstrInfo;
> >>>   class HexagonRegisterInfo;
> >>> +class HexagonTargetLowering;
> >>>
> >>>   class HexagonDAGToDAGISel : public SelectionDAGISel {
> >>>     const HexagonSubtarget *HST;
> >>> @@ -100,13 +101,25 @@ public:
> >>>     void SelectConstant(SDNode *N);
> >>>     void SelectConstantFP(SDNode *N);
> >>>     void SelectBitcast(SDNode *N);
> >>> -  void SelectVectorShuffle(SDNode *N);
> >>>
> >>> -  // Include the pieces autogenerated from the target description.
> >>> +  // Include the declarations autogenerated from the selection patterns.
> >>>     #define GET_DAGISEL_DECL
> >>>     #include "HexagonGenDAGISel.inc"
> >>>
> >>>   private:
> >>> +  // This is really only to get access to ReplaceNode (which is a protected
> >>> +  // member). Any other members used by HvxSelector can be moved around to
> >>> +  // make them accessible.
> >>> +  friend struct HvxSelector;
> >>> +
> >>> +  SDValue selectUndef(const SDLoc &dl, MVT ResTy) {
> >>> +    SDNode *U = CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, dl, ResTy);
> >>> +    return SDValue(U, 0);
> >>> +  }
> >>> +
> >>> +  void SelectHvxShuffle(SDNode *N);
> >>> +  void SelectHvxRor(SDNode *N);
> >>> +
> >>>     bool keepsLowBits(const SDValue &Val, unsigned NumBits, SDValue &Src);
> >>>     bool isOrEquivalentToAdd(const SDNode *N) const;
> >>>     bool isAlignedMemNode(const MemSDNode *N) const;
> >>>
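The header comment says the `friend struct HvxSelector` declaration exists only so the helper can call the protected `ReplaceNode`. A minimal sketch of why that works in C++, using hypothetical class names (`SelectorBase`, `DerivedSelector`, `HelperSelector`) in place of `SelectionDAGISel`, `HexagonDAGToDAGISel` and `HvxSelector`:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for SelectionDAGISel: replaceNode is protected,
// so unrelated code cannot call it on an arbitrary object.
class SelectorBase {
public:
  std::vector<std::string> Log; // records replacements for inspection

protected:
  void replaceNode(const std::string &From, const std::string &To) {
    Log.push_back(From + " -> " + To);
  }
};

// Hypothetical stand-in for HexagonDAGToDAGISel. Befriending the helper
// lets it use members accessible inside this class, including the
// protected replaceNode inherited from SelectorBase, as long as the call
// goes through a DerivedSelector object.
class DerivedSelector : public SelectorBase {
  friend struct HelperSelector;
};

// Hypothetical stand-in for HvxSelector.
struct HelperSelector {
  DerivedSelector &ISel;

  void select(const std::string &Node) {
    assert(!Node.empty() && "expected a node name");
    ISel.replaceNode(Node, Node + ".selected");
  }
};
```

Calling `replaceNode` through a `SelectorBase&` from inside `HelperSelector` should not compile, because friendship granted by the derived class only covers access through the derived type.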
> >>> Added: llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp?rev=319901&view=auto
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp (added)
> >>> +++ llvm/trunk/lib/Target/Hexagon/HexagonISelDAGToDAGHVX.cpp Wed Dec  6 08:40:37 2017
> >>> @@ -0,0 +1,1924 @@
> >>> +//===-- HexagonISelDAGToDAGHVX.cpp ----------------------------------------===//
> >>> +//
> >>> +//                     The LLVM Compiler Infrastructure
> >>> +//
> >>> +// This file is distributed under the University of Illinois Open Source
> >>> +// License. See LICENSE.TXT for details.
> >>> +//
> >>> +//===----------------------------------------------------------------------===//
> >>> +
> >>> +#include "Hexagon.h"
> >>> +#include "HexagonISelDAGToDAG.h"
> >>> +#include "HexagonISelLowering.h"
> >>> +#include "HexagonTargetMachine.h"
> >>> +#include "llvm/CodeGen/MachineInstrBuilder.h"
> >>> +#include "llvm/CodeGen/SelectionDAGISel.h"
> >>> +#include "llvm/IR/Intrinsics.h"
> >>> +#include "llvm/Support/CommandLine.h"
> >>> +#include "llvm/Support/Debug.h"
> >>> +
> >>> +#include <deque>
> >>> +#include <map>
> >>> +#include <set>
> >>> +#include <utility>
> >>> +#include <vector>
> >>> +
> >>> +#define DEBUG_TYPE "hexagon-isel"
> >>> +
> >>> +using namespace llvm;
> >>> +
> >>> +// --------------------------------------------------------------------
> >>> +// Implementation of permutation networks.
> >>> +
> >>> +// Implementation of the node routing through butterfly networks:
> >>> +// - Forward delta.
> >>> +// - Reverse delta.
> >>> +// - Benes.
> >>> +//
> >>> +//
> >>> +// Forward delta network consists of log(N) steps, where N is the number
> >>> +// of inputs. In each step, an input can stay in place, or it can get
> >>> +// routed to another position[1]. The step after that consists of two
> >>> +// networks, each half in size in terms of the number of nodes. In those
> >>> +// terms, in the given step, an input can go to either the upper or the
> >>> +// lower network in the next step.
> >>> +//
> >>> +// [1] Hexagon's vdelta/vrdelta allow an element to be routed to both
> >>> +// positions as long as there is no conflict.
> >>> +
> >>> +// Here's a delta network for 8 inputs, only the switching routes are
> >>> +// shown:
> >>> +//
> >>> +//         Steps:
> >>> +//         |- 1 ---------------|- 2 -----|- 3 -|
> >>> +//
> >>> +// Inp[0] ***                 ***       ***   *** Out[0]
> >>> +//           \               /   \     /   \ /
> >>> +//            \             /     \   /     X
> >>> +//             \           /       \ /     / \
> >>> +// Inp[1] ***   \         /   ***   X   ***   *** Out[1]
> >>> +//           \   \       /   /   \ / \ /
> >>> +//            \   \     /   /     X   X
> >>> +//             \   \   /   /     / \ / \
> >>> +// Inp[2] ***   \   \ /   /   ***   X   ***   *** Out[2]
> >>> +//           \   \   X   /   /     / \     \ /
> >>> +//            \   \ / \ /   /     /   \     X
> >>> +//             \   X   X   /     /     \   / \
> >>> +// Inp[3] ***   \ / \ / \ /   ***       ***   *** Out[3]
> >>> +//           \   X   X   X   /
> >>> +//            \ / \ / \ / \ /
> >>> +//             X   X   X   X
> >>> +//            / \ / \ / \ / \
> >>> +//           /   X   X   X   \
> >>> +// Inp[4] ***   / \ / \ / \   ***       ***   *** Out[4]
> >>> +//             /   X   X   \     \     /   \ /
> >>> +//            /   / \ / \   \     \   /     X
> >>> +//           /   /   X   \   \     \ /     / \
> >>> +// Inp[5] ***   /   / \   \   ***   X   ***   *** Out[5]
> >>> +//             /   /   \   \     \ / \ /
> >>> +//            /   /     \   \     X   X
> >>> +//           /   /       \   \   / \ / \
> >>> +// Inp[6] ***   /         \   ***   X   ***   *** Out[6]
> >>> +//             /           \       / \     \ /
> >>> +//            /             \     /   \     X
> >>> +//           /               \   /     \   / \
> >>> +// Inp[7] ***                 ***       ***   *** Out[7]
> >>> +//
> >>> +//
> >>> +// Reverse delta network is same as delta network, with the steps in
> >>> +// the opposite order.
> >>> +//
> >>> +//
> >>> +// Benes network is a forward delta network immediately followed by
> >>> +// a reverse delta network.
> >>> +
> >>> +
> >>> +// Graph coloring utility used to partition nodes into two groups:
> >>> +// they will correspond to nodes routed to the upper and lower networks.
> >>> +struct Coloring {
> >>> +  enum : uint8_t {
> >>> +    None = 0,
> >>> +    Red,
> >>> +    Black
> >>> +  };
> >>> +
> >>> +  using Node = int;
> >>> +  using MapType = std::map<Node,uint8_t>;
> >>> +  static constexpr Node Ignore = Node(-1);
> >>> +
> >>> +  Coloring(ArrayRef<Node> Ord) : Order(Ord) {
> >>> +    build();
> >>> +    if (!color())
> >>> +      Colors.clear();
> >>> +  }
> >>> +
> >>> +  const MapType &colors() const {
> >>> +    return Colors;
> >>> +  }
> >>> +
> >>> +  uint8_t other(uint8_t Color) {
> >>> +    if (Color == None)
> >>> +      return Red;
> >>> +    return Color == Red ? Black : Red;
> >>> +  }
> >>> +
> >>> +  void dump() const;
> >>> +
> >>> +private:
> >>> +  ArrayRef<Node> Order;
> >>> +  MapType Colors;
> >>> +  std::set<Node> Needed;
> >>> +
> >>> +  using NodeSet = std::set<Node>;
> >>> +  std::map<Node,NodeSet> Edges;
> >>> +
> >>> +  Node conj(Node Pos) {
> >>> +    Node Num = Order.size();
> >>> +    return (Pos < Num/2) ? Pos + Num/2 : Pos - Num/2;
> >>> +  }
> >>> +
> >>> +  uint8_t getColor(Node N) {
> >>> +    auto F = Colors.find(N);
> >>> +    return F != Colors.end() ? F->second : None;
> >>> +  }
> >>> +
> >>> +  std::pair<bool,uint8_t> getUniqueColor(const NodeSet &Nodes);
> >>> +
> >>> +  void build();
> >>> +  bool color();
> >>> +};
> >>> +
> >>> +std::pair<bool,uint8_t> Coloring::getUniqueColor(const NodeSet &Nodes) {
> >>> +  uint8_t Color = None;
> >>> +  for (Node N : Nodes) {
> >>> +    uint8_t ColorN = getColor(N);
> >>> +    if (ColorN == None)
> >>> +      continue;
> >>> +    if (Color == None)
> >>> +      Color = ColorN;
> >>> +    else if (Color != None && Color != ColorN)
> >>> +      return { false, None };
> >>> +  }
> >>> +  return { true, Color };
> >>> +}
> >>> +
> >>> +void Coloring::build() {
> >>> +  // Add Order[P] and Order[conj(P)] to Edges.
> >>> +  for (unsigned P = 0; P != Order.size(); ++P) {
> >>> +    Node I = Order[P];
> >>> +    if (I != Ignore) {
> >>> +      Needed.insert(I);
> >>> +      Node PC = Order[conj(P)];
> >>> +      if (PC != Ignore && PC != I)
> >>> +        Edges[I].insert(PC);
> >>> +    }
> >>> +  }
> >>> +  // Add I and conj(I) to Edges.
> >>> +  for (unsigned I = 0; I != Order.size(); ++I) {
> >>> +    if (!Needed.count(I))
> >>> +      continue;
> >>> +    Node C = conj(I);
> >>> +    // This will create an entry in the edge table, even if I is not
> >>> +    // connected to any other node. This is necessary, because it still
> >>> +    // needs to be colored.
> >>> +    NodeSet &Is = Edges[I];
> >>> +    if (Needed.count(C))
> >>> +      Is.insert(C);
> >>> +  }
> >>> +}
> >>> +
> >>> +bool Coloring::color() {
> >>> +  SetVector<Node> FirstQ;
> >>> +  auto Enqueue = [this,&FirstQ] (Node N) {
> >>> +    SetVector<Node> Q;
> >>> +    Q.insert(N);
> >>> +    for (unsigned I = 0; I != Q.size(); ++I) {
> >>> +      NodeSet &Ns = Edges[Q[I]];
> >>> +      Q.insert(Ns.begin(), Ns.end());
> >>> +    }
> >>> +    FirstQ.insert(Q.begin(), Q.end());
> >>> +  };
> >>> +  for (Node N : Needed)
> >>> +    Enqueue(N);
> >>> +
> >>> +  for (Node N : FirstQ) {
> >>> +    if (Colors.count(N))
> >>> +      continue;
> >>> +    NodeSet &Ns = Edges[N];
> >>> +    auto P = getUniqueColor(Ns);
> >>> +    if (!P.first)
> >>> +      return false;
> >>> +    Colors[N] = other(P.second);
> >>> +  }
> >>> +
> >>> +  // First, color nodes that don't have any dups.
> >>> +  for (auto E : Edges) {
> >>> +    Node N = E.first;
> >>> +    if (!Needed.count(conj(N)) || Colors.count(N))
> >>> +      continue;
> >>> +    auto P = getUniqueColor(E.second);
> >>> +    if (!P.first)
> >>> +      return false;
> >>> +    Colors[N] = other(P.second);
> >>> +  }
> >>> +
> >>> +  // Now, nodes that are still uncolored. Since the graph can be modified
> >>> +  // in this step, create a work queue.
> >>> +  std::vector<Node> WorkQ;
> >>> +  for (auto E : Edges) {
> >>> +    Node N = E.first;
> >>> +    if (!Colors.count(N))
> >>> +      WorkQ.push_back(N);
> >>> +  }
> >>> +
> >>> +  for (unsigned I = 0; I < WorkQ.size(); ++I) {
> >>> +    Node N = WorkQ[I];
> >>> +    NodeSet &Ns = Edges[N];
> >>> +    auto P = getUniqueColor(Ns);
> >>> +    if (P.first) {
> >>> +      Colors[N] = other(P.second);
> >>> +      continue;
> >>> +    }
> >>> +
> >>> +    // Coloring failed. Split this node.
> >>> +    Node C = conj(N);
> >>> +    uint8_t ColorN = other(None);
> >>> +    uint8_t ColorC = other(ColorN);
> >>> +    NodeSet &Cs = Edges[C];
> >>> +    NodeSet CopyNs = Ns;
> >>> +    for (Node M : CopyNs) {
> >>> +      uint8_t ColorM = getColor(M);
> >>> +      if (ColorM == ColorC) {
> >>> +        // Connect M with C, disconnect M from N.
> >>> +        Cs.insert(M);
> >>> +        Edges[M].insert(C);
> >>> +        Ns.erase(M);
> >>> +        Edges[M].erase(N);
> >>> +      }
> >>> +    }
> >>> +    Colors[N] = ColorN;
> >>> +    Colors[C] = ColorC;
> >>> +  }
> >>> +
> >>> +  // Explicitly assign "None" to all uncolored nodes.
> >>> +  for (unsigned I = 0; I != Order.size(); ++I)
> >>> +    if (Colors.count(I) == 0)
> >>> +      Colors[I] = None;
> >>> +
> >>> +  return true;
> >>> +}
> >>> +
> >>> +LLVM_DUMP_METHOD
> >>> +void Coloring::dump() const {
> >>> +  dbgs() << "{ Order:   {";
> >>> +  for (unsigned I = 0; I != Order.size(); ++I) {
> >>> +    Node P = Order[I];
> >>> +    if (P != Ignore)
> >>> +      dbgs() << ' ' << P;
> >>> +    else
> >>> +      dbgs() << " -";
> >>> +  }
> >>> +  dbgs() << " }\n";
> >>> +  dbgs() << "  Needed: {";
> >>> +  for (Node N : Needed)
> >>> +    dbgs() << ' ' << N;
> >>> +  dbgs() << " }\n";
> >>> +
> >>> +  dbgs() << "  Edges: {\n";
> >>> +  for (auto E : Edges) {
> >>> +    dbgs() << "    " << E.first << " -> {";
> >>> +    for (auto N : E.second)
> >>> +      dbgs() << ' ' << N;
> >>> +    dbgs() << " }\n";
> >>> +  }
> >>> +  dbgs() << "  }\n";
> >>> +
> >>> +  static const char *const Names[] = { "None", "Red", "Black" };
> >>> +  dbgs() << "  Colors: {\n";
> >>> +  for (auto C : Colors)
> >>> +    dbgs() << "    " << C.first << " -> " << Names[C.second] << "\n";
> >>> +  dbgs() << "  }\n}\n";
> >>> +}
> >>> +
> >>> +// Base class for reordering networks. They don't strictly need to be
> >>> +// permutations, as outputs with repeated occurrences of an input element
> >>> +// are allowed.
> >>> +struct PermNetwork {
> >>> +  using Controls = std::vector<uint8_t>;
> >>> +  using ElemType = int;
> >>> +  static constexpr ElemType Ignore = ElemType(-1);
> >>> +
> >>> +  enum : uint8_t {
> >>> +    None,
> >>> +    Pass,
> >>> +    Switch
> >>> +  };
> >>> +  enum : uint8_t {
> >>> +    Forward,
> >>> +    Reverse
> >>> +  };
> >>> +
> >>> +  PermNetwork(ArrayRef<ElemType> Ord, unsigned Mult = 1) {
> >>> +    Order.assign(Ord.data(), Ord.data()+Ord.size());
> >>> +    Log = 0;
> >>> +
> >>> +    unsigned S = Order.size();
> >>> +    while (S >>= 1)
> >>> +      ++Log;
> >>> +
> >>> +    Table.resize(Order.size());
> >>> +    for (RowType &Row : Table)
> >>> +      Row.resize(Mult*Log, None);
> >>> +  }
> >>> +
> >>> +  void getControls(Controls &V, unsigned StartAt, uint8_t Dir) const {
> >>> +    unsigned Size = Order.size();
> >>> +    V.resize(Size);
> >>> +    for (unsigned I = 0; I != Size; ++I) {
> >>> +      unsigned W = 0;
> >>> +      for (unsigned L = 0; L != Log; ++L) {
> >>> +        unsigned C = ctl(I, StartAt+L) == Switch;
> >>> +        if (Dir == Forward)
> >>> +          W |= C << (Log-1-L);
> >>> +        else
> >>> +          W |= C << L;
> >>> +      }
> >>> +      assert(isUInt<8>(W));
> >>> +      V[I] = uint8_t(W);
> >>> +    }
> >>> +  }
> >>> +
> >>> +  uint8_t ctl(ElemType Pos, unsigned Step) const {
> >>> +    return Table[Pos][Step];
> >>> +  }
> >>> +  unsigned size() const {
> >>> +    return Order.size();
> >>> +  }
> >>> +  unsigned steps() const {
> >>> +    return Log;
> >>> +  }
> >>> +
> >>> +protected:
> >>> +  unsigned Log;
> >>> +  std::vector<ElemType> Order;
> >>> +  using RowType = std::vector<uint8_t>;
> >>> +  std::vector<RowType> Table;
> >>> +};
> >>> +
> >>> +struct ForwardDeltaNetwork : public PermNetwork {
> >>> +  ForwardDeltaNetwork(ArrayRef<ElemType> Ord) : PermNetwork(Ord) {}
> >>> +
> >>> +  bool run(Controls &V) {
> >>> +    if (!route(Order.data(), Table.data(), size(), 0))
> >>> +      return false;
> >>> +    getControls(V, 0, Forward);
> >>> +    return true;
> >>> +  }
> >>> +
> >>> +private:
> >>> +  bool route(ElemType *P, RowType *T, unsigned Size, unsigned Step);
> >>> +};
> >>> +
> >>> +struct ReverseDeltaNetwork : public PermNetwork {
> >>> +  ReverseDeltaNetwork(ArrayRef<ElemType> Ord) : PermNetwork(Ord) {}
> >>> +
> >>> +  bool run(Controls &V) {
> >>> +    if (!route(Order.data(), Table.data(), size(), 0))
> >>> +      return false;
> >>> +    getControls(V, 0, Reverse);
> >>> +    return true;
> >>> +  }
> >>> +
> >>> +private:
> >>> +  bool route(ElemType *P, RowType *T, unsigned Size, unsigned Step);
> >>> +};
> >>> +
> >>> +struct BenesNetwork : public PermNetwork {
> >>> +  BenesNetwork(ArrayRef<ElemType> Ord) : PermNetwork(Ord, 2) {}
> >>> +
> >>> +  bool run(Controls &F, Controls &R) {
> >>> +    if (!route(Order.data(), Table.data(), size(), 0))
> >>> +      return false;
> >>> +
> >>> +    getControls(F, 0, Forward);
> >>> +    getControls(R, Log, Reverse);
> >>> +    return true;
> >>> +  }
> >>> +
> >>> +private:
> >>> +  bool route(ElemType *P, RowType *T, unsigned Size, unsigned Step);
> >>> +};
> >>> +
> >>> +
> >>> +bool ForwardDeltaNetwork::route(ElemType *P, RowType *T, unsigned Size,
> >>> +                                unsigned Step) {
> >>> +  bool UseUp = false, UseDown = false;
> >>> +  ElemType Num = Size;
> >>> +
> >>> +  // Cannot use coloring here, because coloring is used to determine
> >>> +  // the "big" switch, i.e. the one that changes halves, and in a forward
> >>> +  // network, a color can be simultaneously routed to both halves in the
> >>> +  // step we're working on.
> >>> +  for (ElemType J = 0; J != Num; ++J) {
> >>> +    ElemType I = P[J];
> >>> +    // I is the position in the input,
> >>> +    // J is the position in the output.
> >>> +    if (I == Ignore)
> >>> +      continue;
> >>> +    uint8_t S;
> >>> +    if (I < Num/2)
> >>> +      S = (J < Num/2) ? Pass : Switch;
> >>> +    else
> >>> +      S = (J < Num/2) ? Switch : Pass;
> >>> +
> >>> +    // U is the element in the table that needs to be updated.
> >>> +    ElemType U = (S == Pass) ? I : (I < Num/2 ? I+Num/2 : I-Num/2);
> >>> +    if (U < Num/2)
> >>> +      UseUp = true;
> >>> +    else
> >>> +      UseDown = true;
> >>> +    if (T[U][Step] != S && T[U][Step] != None)
> >>> +      return false;
> >>> +    T[U][Step] = S;
> >>> +  }
> >>> +
> >>> +  for (ElemType J = 0; J != Num; ++J)
> >>> +    if (P[J] != Ignore && P[J] >= Num/2)
> >>> +      P[J] -= Num/2;
> >>> +
> >>> +  if (Step+1 < Log) {
> >>> +    if (UseUp   && !route(P,        T,        Size/2, Step+1))
> >>> +      return false;
> >>> +    if (UseDown && !route(P+Size/2, T+Size/2, Size/2, Step+1))
> >>> +      return false;
> >>> +  }
> >>> +  return true;
> >>> +}
> >>> +
> >>> +bool ReverseDeltaNetwork::route(ElemType *P, RowType *T, unsigned Size,
> >>> +                                unsigned Step) {
> >>> +  unsigned Pets = Log-1 - Step;
> >>> +  bool UseUp = false, UseDown = false;
> >>> +  ElemType Num = Size;
> >>> +
> >>> +  // In this step half-switching occurs, so coloring can be used.
> >>> +  Coloring G({P,Size});
> >>> +  const Coloring::MapType &M = G.colors();
> >>> +  if (M.empty())
> >>> +    return false;
> >>> +
> >>> +  uint8_t ColorUp = Coloring::None;
> >>> +  for (ElemType J = 0; J != Num; ++J) {
> >>> +    ElemType I = P[J];
> >>> +    // I is the position in the input,
> >>> +    // J is the position in the output.
> >>> +    if (I == Ignore)
> >>> +      continue;
> >>> +    uint8_t C = M.at(I);
> >>> +    if (C == Coloring::None)
> >>> +      continue;
> >>> +    // During "Step", inputs cannot switch halves, so if the "up" color
> >>> +    // is still unknown, make sure that it is selected in such a way that
> >>> +    // "I" will stay in the same half.
> >>> +    bool InpUp = I < Num/2;
> >>> +    if (ColorUp == Coloring::None)
> >>> +      ColorUp = InpUp ? C : G.other(C);
> >>> +    if ((C == ColorUp) != InpUp) {
> >>> +      // If I should go to a different half than where it is now, give up.
> >>> +      return false;
> >>> +    }
> >>> +
> >>> +    uint8_t S;
> >>> +    if (InpUp) {
> >>> +      S = (J < Num/2) ? Pass : Switch;
> >>> +      UseUp = true;
> >>> +    } else {
> >>> +      S = (J < Num/2) ? Switch : Pass;
> >>> +      UseDown = true;
> >>> +    }
> >>> +    T[J][Pets] = S;
> >>> +  }
> >>> +
> >>> +  // Reorder the working permutation according to the computed switch table
> >>> +  // for the last step (i.e. Pets).
> >>> +  for (ElemType J = 0; J != Size/2; ++J) {
> >>> +    ElemType PJ = P[J];         // Current values of P[J]
> >>> +    ElemType PC = P[J+Size/2];  // and P[conj(J)]
> >>> +    ElemType QJ = PJ;           // New values of P[J]
> >>> +    ElemType QC = PC;           // and P[conj(J)]
> >>> +    if (T[J][Pets] == Switch)
> >>> +      QC = PJ;
> >>> +    if (T[J+Size/2][Pets] == Switch)
> >>> +      QJ = PC;
> >>> +    P[J] = QJ;
> >>> +    P[J+Size/2] = QC;
> >>> +  }
> >>> +
> >>> +  for (ElemType J = 0; J != Num; ++J)
> >>> +    if (P[J] != Ignore && P[J] >= Num/2)
> >>> +      P[J] -= Num/2;
> >>> +
> >>> +  if (Step+1 < Log) {
> >>> +    if (UseUp && !route(P, T, Size/2, Step+1))
> >>> +      return false;
> >>> +    if (UseDown && !route(P+Size/2, T+Size/2, Size/2, Step+1))
> >>> +      return false;
> >>> +  }
> >>> +  return true;
> >>> +}
> >>> +
> >>> +bool BenesNetwork::route(ElemType *P, RowType *T, unsigned Size,
> >>> +                         unsigned Step) {
> >>> +  Coloring G({P,Size});
> >>> +  const Coloring::MapType &M = G.colors();
> >>> +  if (M.empty())
> >>> +    return false;
> >>> +  ElemType Num = Size;
> >>> +
> >>> +  unsigned Pets = 2*Log-1 - Step;
> >>> +  bool UseUp = false, UseDown = false;
> >>> +
> >>> +  // Both assignments, i.e. Red->Up and Red->Down are valid, but they will
> >>> +  // result in different controls. Let's pick the one where the first
> >>> +  // control will be "Pass".
> >>> +  uint8_t ColorUp = Coloring::None;
> >>> +  for (ElemType J = 0; J != Num; ++J) {
> >>> +    ElemType I = P[J];
> >>> +    if (I == Ignore)
> >>> +      continue;
> >>> +    uint8_t C = M.at(I);
> >>> +    if (C == Coloring::None)
> >>> +      continue;
> >>> +    if (ColorUp == Coloring::None) {
> >>> +      ColorUp = (I < Num/2) ? Coloring::Red : Coloring::Black;
> >>> +    }
> >>> +    unsigned CI = (I < Num/2) ? I+Num/2 : I-Num/2;
> >>> +    if (C == ColorUp) {
> >>> +      if (I < Num/2)
> >>> +        T[I][Step] = Pass;
> >>> +      else
> >>> +        T[CI][Step] = Switch;
> >>> +      T[J][Pets] = (J < Num/2) ? Pass : Switch;
> >>> +      UseUp = true;
> >>> +    } else { // Down
> >>> +      if (I < Num/2)
> >>> +        T[CI][Step] = Switch;
> >>> +      else
> >>> +        T[I][Step] = Pass;
> >>> +      T[J][Pets] = (J < Num/2) ? Switch : Pass;
> >>> +      UseDown = true;
> >>> +    }
> >>> +  }
> >>> +
> >>> +  // Reorder the working permutation according to the computed switch table
> >>> +  // for the last step (i.e. Pets).
> >>> +  for (ElemType J = 0; J != Num/2; ++J) {
> >>> +    ElemType PJ = P[J];         // Current values of P[J]
> >>> +    ElemType PC = P[J+Num/2];   // and P[conj(J)]
> >>> +    ElemType QJ = PJ;           // New values of P[J]
> >>> +    ElemType QC = PC;           // and P[conj(J)]
> >>> +    if (T[J][Pets] == Switch)
> >>> +      QC = PJ;
> >>> +    if (T[J+Num/2][Pets] == Switch)
> >>> +      QJ = PC;
> >>> +    P[J] = QJ;
> >>> +    P[J+Num/2] = QC;
> >>> +  }
> >>> +
> >>> +  for (ElemType J = 0; J != Num; ++J)
> >>> +    if (P[J] != Ignore && P[J] >= Num/2)
> >>> +      P[J] -= Num/2;
> >>> +
> >>> +  if (Step+1 < Log) {
> >>> +    if (UseUp && !route(P, T, Size/2, Step+1))
> >>> +      return false;
> >>> +    if (UseDown && !route(P+Size/2, T+Size/2, Size/2, Step+1))
> >>> +      return false;
> >>> +  }
> >>> +  return true;
> >>> +}
> >>> +
> >>> +// --------------------------------------------------------------------
> >>> +// Support for building selection results (output instructions that are
> >>> +// parts of the final selection).
> >>> +
> >>> +struct OpRef {
> >>> +  OpRef(SDValue V) : OpV(V) {}
> >>> +  bool isValue() const { return OpV.getNode() != nullptr; }
> >>> +  bool isValid() const { return isValue() || !(OpN & Invalid); }
> >>> +  static OpRef res(int N) { return OpRef(Whole | (N & Index)); }
> >>> +  static OpRef fail() { return OpRef(Invalid); }
> >>> +
> >>> +  static OpRef lo(const OpRef &R) {
> >>> +    assert(!R.isValue());
> >>> +    return OpRef(R.OpN & (Undef | Index | LoHalf));
> >>> +  }
> >>> +  static OpRef hi(const OpRef &R) {
> >>> +    assert(!R.isValue());
> >>> +    return OpRef(R.OpN & (Undef | Index | HiHalf));
> >>> +  }
> >>> +  static OpRef undef(MVT Ty) { return OpRef(Undef | Ty.SimpleTy); }
> >>> +
> >>> +  // Direct value.
> >>> +  SDValue OpV = SDValue();
> >>> +
> >>> +  // Reference to the operand of the input node:
> >>> +  // If bit 31 is set, it's undef; otherwise, bits 27..0 are the
> >>> +  // operand index:
> >>> +  // If bit 30 is set, it's the high half of the operand.
> >>> +  // If bit 29 is set, it's the low half of the operand.
> >>> +  // If bit 28 is set, the reference is invalid.
> >>> +  unsigned OpN = 0;
> >>> +
> >>> +  enum : unsigned {
> >>> +    Invalid = 0x10000000,
> >>> +    LoHalf  = 0x20000000,
> >>> +    HiHalf  = 0x40000000,
> >>> +    Whole   = LoHalf | HiHalf,
> >>> +    Undef   = 0x80000000,
> >>> +    Index   = 0x0FFFFFFF,  // Mask of the index value.
> >>> +    IndexBits = 28,
> >>> +  };
> >>> +
> >>> +  void print(raw_ostream &OS, const SelectionDAG &G) const;
> >>> +
> >>> +private:
> >>> +  OpRef(unsigned N) : OpN(N) {}
> >>> +};
> >>> +
> >>> +struct NodeTemplate {
> >>> +  NodeTemplate() = default;
> >>> +  unsigned Opc = 0;
> >>> +  MVT Ty = MVT::Other;
> >>> +  std::vector<OpRef> Ops;
> >>> +
> >>> +  void print(raw_ostream &OS, const SelectionDAG &G) const;
> >>> +};
> >>> +
> >>> +struct ResultStack {
> >>> +  ResultStack(SDNode *Inp)
> >>> +    : InpNode(Inp), InpTy(Inp->getValueType(0).getSimpleVT()) {}
> >>> +  SDNode *InpNode;
> >>> +  MVT InpTy;
> >>> +  unsigned push(const NodeTemplate &Res) {
> >>> +    List.push_back(Res);
> >>> +    return List.size()-1;
> >>> +  }
> >>> +  unsigned push(unsigned Opc, MVT Ty, std::vector<OpRef> &&Ops) {
> >>> +    NodeTemplate Res;
> >>> +    Res.Opc = Opc;
> >>> +    Res.Ty = Ty;
> >>> +    Res.Ops = Ops;
> >>> +    return push(Res);
> >>> +  }
> >>> +  bool empty() const { return List.empty(); }
> >>> +  unsigned size() const { return List.size(); }
> >>> +  unsigned top() const { return size()-1; }
> >>> +  const NodeTemplate &operator[](unsigned I) const { return List[I]; }
> >>> +  unsigned reset(unsigned NewTop) {
> >>> +    List.resize(NewTop+1);
> >>> +    return NewTop;
> >>> +  }
> >>> +
> >>> +  using BaseType = std::vector<NodeTemplate>;
> >>> +  BaseType::iterator begin() { return List.begin(); }
> >>> +  BaseType::iterator end()   { return List.end(); }
> >>> +  BaseType::const_iterator begin() const { return List.begin(); }
> >>> +  BaseType::const_iterator end() const   { return List.end(); }
> >>> +
> >>> +  BaseType List;
> >>> +
> >>> +  void print(raw_ostream &OS, const SelectionDAG &G) const;
> >>> +};
> >>> +
> >>> +void OpRef::print(raw_ostream &OS, const SelectionDAG &G) const {
> >>> +  if (isValue()) {
> >>> +    OpV.getNode()->print(OS, &G);
> >>> +    return;
> >>> +  }
> >>> +  if (OpN & Invalid) {
> >>> +    OS << "invalid";
> >>> +    return;
> >>> +  }
> >>> +  if (OpN & Undef) {
> >>> +    OS << "undef";
> >>> +    return;
> >>> +  }
> >>> +  if ((OpN & Whole) != Whole) {
> >>> +    assert((OpN & Whole) == LoHalf || (OpN & Whole) == HiHalf);
> >>> +    if (OpN & LoHalf)
> >>> +      OS << "lo ";
> >>> +    else
> >>> +      OS << "hi ";
> >>> +  }
> >>> +  OS << '#' << SignExtend32(OpN & Index, IndexBits);
> >>> +}
> >>> +
> >>> +void NodeTemplate::print(raw_ostream &OS, const SelectionDAG &G) const {
> >>> +  const TargetInstrInfo &TII = *G.getSubtarget().getInstrInfo();
> >>> +  OS << format("%8s", EVT(Ty).getEVTString().c_str()) << "  "
> >>> +     << TII.getName(Opc);
> >>> +  bool Comma = false;
> >>> +  for (const auto &R : Ops) {
> >>> +    if (Comma)
> >>> +      OS << ',';
> >>> +    Comma = true;
> >>> +    OS << ' ';
> >>> +    R.print(OS, G);
> >>> +  }
> >>> +}
> >>> +
> >>> +void ResultStack::print(raw_ostream &OS, const SelectionDAG &G) const {
> >>> +  OS << "Input node:\n";
> >>> +  InpNode->dumpr(&G);
> >>> +  OS << "Result templates:\n";
> >>> +  for (unsigned I = 0, E = List.size(); I != E; ++I) {
> >>> +    OS << '[' << I << "] ";
> >>> +    List[I].print(OS, G);
> >>> +    OS << '\n';
> >>> +  }
> >>> +}
> >>> +
> >>> +struct ShuffleMask {
> >>> +  ShuffleMask(ArrayRef<int> M) : Mask(M) {
> >>> +    for (unsigned I = 0, E = Mask.size(); I != E; ++I) {
> >>> +      int M = Mask[I];
> >>> +      if (M == -1)
> >>> +        continue;
> >>> +      MinSrc = (MinSrc == -1) ? M : std::min(MinSrc, M);
> >>> +      MaxSrc = (MaxSrc == -1) ? M : std::max(MaxSrc, M);
> >>> +    }
> >>> +  }
> >>> +
> >>> +  ArrayRef<int> Mask;
> >>> +  int MinSrc = -1, MaxSrc = -1;
> >>> +
> >>> +  ShuffleMask lo() const {
> >>> +    size_t H = Mask.size()/2;
> >>> +    return ShuffleMask({Mask.data(), H});
> >>> +  }
> >>> +  ShuffleMask hi() const {
> >>> +    size_t H = Mask.size()/2;
> >>> +    return ShuffleMask({Mask.data()+H, H});
> >>> +  }
> >>> +};
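A minimal sketch of what `ShuffleMask` computes, with `ArrayRef` swapped for `std::vector` so it builds without LLVM. `MaskInfo` is a hypothetical name. Undef entries (-1) are skipped when tracking `MinSrc`/`MaxSrc`, and both stay -1 for an all-undef mask.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Mirrors the patch's ShuffleMask: undef entries (-1) are ignored when
// computing the smallest and largest source element referenced.
struct MaskInfo {
  std::vector<int> Mask;
  int MinSrc = -1, MaxSrc = -1;

  explicit MaskInfo(std::vector<int> M) : Mask(std::move(M)) {
    for (int V : Mask) {
      if (V == -1)
        continue;
      MinSrc = (MinSrc == -1) ? V : std::min(MinSrc, V);
      MaxSrc = (MaxSrc == -1) ? V : std::max(MaxSrc, V);
    }
  }
  // lo()/hi() take the first and second half of the mask, respectively.
  MaskInfo lo() const {
    return MaskInfo(std::vector<int>(Mask.begin(),
                                     Mask.begin() + Mask.size() / 2));
  }
  MaskInfo hi() const {
    return MaskInfo(std::vector<int>(Mask.begin() + Mask.size() / 2,
                                     Mask.end()));
  }
};
```

Since `MinSrc`/`MaxSrc` are recomputed for each half, `packs` can later decide per half whether one source window suffices.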
> >>> +
> >>> +// --------------------------------------------------------------------
> >>> +// The HvxSelector class.
> >>> +
> >>> +static const HexagonTargetLowering &getHexagonLowering(SelectionDAG &G) {
> >>> +  return static_cast<const HexagonTargetLowering&>(G.getTargetLoweringInfo());
> >>> +}
> >>> +static const HexagonSubtarget &getHexagonSubtarget(SelectionDAG &G) {
> >>> +  return static_cast<const HexagonSubtarget&>(G.getSubtarget());
> >>> +}
> >>> +
> >>> +namespace llvm {
> >>> +  struct HvxSelector {
> >>> +    const HexagonTargetLowering &Lower;
> >>> +    HexagonDAGToDAGISel &ISel;
> >>> +    SelectionDAG &DAG;
> >>> +    const HexagonSubtarget &HST;
> >>> +    const unsigned HwLen;
> >>> +
> >>> +    HvxSelector(HexagonDAGToDAGISel &HS, SelectionDAG &G)
> >>> +      : Lower(getHexagonLowering(G)),  ISel(HS), DAG(G),
> >>> +        HST(getHexagonSubtarget(G)), HwLen(HST.getVectorLength()) {}
> >>> +
> >>> +    MVT getSingleVT(MVT ElemTy) const {
> >>> +      unsigned NumElems = HwLen / (ElemTy.getSizeInBits()/8);
> >>> +      return MVT::getVectorVT(ElemTy, NumElems);
> >>> +    }
> >>> +
> >>> +    MVT getPairVT(MVT ElemTy) const {
> >>> +      unsigned NumElems = (2*HwLen) / (ElemTy.getSizeInBits()/8);
> >>> +      return MVT::getVectorVT(ElemTy, NumElems);
> >>> +    }
> >>> +
> >>> +    void selectShuffle(SDNode *N);
> >>> +    void selectRor(SDNode *N);
> >>> +
> >>> +  private:
> >>> +    void materialize(const ResultStack &Results);
> >>> +
> >>> +    SDValue getVectorConstant(ArrayRef<uint8_t> Data, const SDLoc &dl);
> >>> +
> >>> +    enum : unsigned {
> >>> +      None,
> >>> +      PackMux,
> >>> +    };
> >>> +    OpRef concat(OpRef Va, OpRef Vb, ResultStack &Results);
> >>> +    OpRef packs(ShuffleMask SM, OpRef Va, OpRef Vb, ResultStack &Results,
> >>> +                MutableArrayRef<int> NewMask, unsigned Options = None);
> >>> +    OpRef packp(ShuffleMask SM, OpRef Va, OpRef Vb, ResultStack &Results,
> >>> +                MutableArrayRef<int> NewMask);
> >>> +    OpRef zerous(ShuffleMask SM, OpRef Va, ResultStack &Results);
> >>> +    OpRef vmuxs(ArrayRef<uint8_t> Bytes, OpRef Va, OpRef Vb,
> >>> +                ResultStack &Results);
> >>> +    OpRef vmuxp(ArrayRef<uint8_t> Bytes, OpRef Va, OpRef Vb,
> >>> +                ResultStack &Results);
> >>> +
> >>> +    OpRef shuffs1(ShuffleMask SM, OpRef Va, ResultStack &Results);
> >>> +    OpRef shuffs2(ShuffleMask SM, OpRef Va, OpRef Vb, ResultStack &Results);
> >>> +    OpRef shuffp1(ShuffleMask SM, OpRef Va, ResultStack &Results);
> >>> +    OpRef shuffp2(ShuffleMask SM, OpRef Va, OpRef Vb, ResultStack &Results);
> >>> +
> >>> +    OpRef butterfly(ShuffleMask SM, OpRef Va, ResultStack &Results);
> >>> +    OpRef contracting(ShuffleMask SM, OpRef Va, OpRef Vb, ResultStack &Results);
> >>> +    OpRef expanding(ShuffleMask SM, OpRef Va, ResultStack &Results);
> >>> +    OpRef perfect(ShuffleMask SM, OpRef Va, ResultStack &Results);
> >>> +
> >>> +    bool selectVectorConstants(SDNode *N);
> >>> +    bool scalarizeShuffle(ArrayRef<int> Mask, const SDLoc &dl, MVT ResTy,
> >>> +                          SDValue Va, SDValue Vb, SDNode *N);
> >>> +
> >>> +  };
> >>> +}
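The element counts from `getSingleVT` and `getPairVT` reduce to simple arithmetic on the register length in bytes. This is a sketch assuming byte-multiple element types (it would divide by zero for i1) and the 64-byte and 128-byte HVX register lengths. `singleElems`/`pairElems` are hypothetical names.

```cpp
#include <cassert>

// Number of elements of width ElemBits in one HVX register of HwLen bytes.
unsigned singleElems(unsigned HwLen, unsigned ElemBits) {
  return HwLen / (ElemBits / 8);
}

// Same for a register pair (2 * HwLen bytes).
unsigned pairElems(unsigned HwLen, unsigned ElemBits) {
  return (2 * HwLen) / (ElemBits / 8);
}
```

So in 64-byte mode `getSingleVT(MVT::i8)` would be a 64 x i8 vector and `getPairVT(MVT::i8)` a 128 x i8 vector.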
> >>> +
> >>> +// Return a submask of A that is shorter than A by |C| elements:
> >>> +// - if C > 0, return a submask of A that starts at position C,
> >>> +// - if C <= 0, return a submask of A that starts at 0 (reduce A by |C|).
> >>> +static ArrayRef<int> subm(ArrayRef<int> A, int C) {
> >>> +  if (C > 0)
> >>> +    return { A.data()+C, A.size()-C };
> >>> +  return { A.data(), A.size()+C };
> >>> +}
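A quick illustration of the `subm` contract, rewritten over `std::vector`. Note that this version copies, whereas the `ArrayRef` original only adjusts the view.

```cpp
#include <cassert>
#include <vector>

// Same contract as the patch's subm(): the result is |C| elements shorter.
// C > 0 drops the first C elements; C <= 0 drops the last |C| elements.
std::vector<int> subm(const std::vector<int> &A, int C) {
  if (C > 0)
    return std::vector<int>(A.begin() + C, A.end());
  return std::vector<int>(A.begin(), A.end() + C);
}
```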
> >>> +
> >>> +static void splitMask(ArrayRef<int> Mask, MutableArrayRef<int> MaskL,
> >>> +                      MutableArrayRef<int> MaskR) {
> >>> +  unsigned VecLen = Mask.size();
> >>> +  assert(MaskL.size() == VecLen && MaskR.size() == VecLen);
> >>> +  for (unsigned I = 0; I != VecLen; ++I) {
> >>> +    int M = Mask[I];
> >>> +    if (M < 0) {
> >>> +      MaskL[I] = MaskR[I] = -1;
> >>> +    } else if (unsigned(M) < VecLen) {
> >>> +      MaskL[I] = M;
> >>> +      MaskR[I] = -1;
> >>> +    } else {
> >>> +      MaskL[I] = -1;
> >>> +      MaskR[I] = M-VecLen;
> >>> +    }
> >>> +  }
> >>> +}
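A standalone version of `splitMask`, assuming both inputs have `Mask.size()` elements so that indices at or above that value refer to the second input. Here the output vectors are resized inside the function instead of being asserted to have the right size.

```cpp
#include <cassert>
#include <vector>

// Split a two-input shuffle mask into one mask per input: entries that come
// from the other input (or are undef) become -1, and indices into the
// second input are rebased to start at 0.
void splitMask(const std::vector<int> &Mask, std::vector<int> &MaskL,
               std::vector<int> &MaskR) {
  int VecLen = Mask.size();
  MaskL.assign(VecLen, -1);
  MaskR.assign(VecLen, -1);
  for (int I = 0; I != VecLen; ++I) {
    int M = Mask[I];
    if (M < 0)
      continue;
    if (M < VecLen)
      MaskL[I] = M;
    else
      MaskR[I] = M - VecLen;
  }
}
```

For instance, with four-element inputs, `{0, 5, -1, 2}` splits into `{0, -1, -1, 2}` for the first input and `{-1, 1, -1, -1}` for the second.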
> >>> +
> >>> +static std::pair<int,unsigned> findStrip(ArrayRef<int> A, int Inc,
> >>> +                                         unsigned MaxLen) {
> >>> +  assert(A.size() > 0 && A.size() >= MaxLen);
> >>> +  int F = A[0];
> >>> +  int E = F;
> >>> +  for (unsigned I = 1; I != MaxLen; ++I) {
> >>> +    if (A[I] - E != Inc)
> >>> +      return { F, I };
> >>> +    E = A[I];
> >>> +  }
> >>> +  return { F, MaxLen };
> >>> +}
> >>> +
> >>> +static bool isUndef(ArrayRef<int> Mask) {
> >>> +  for (int Idx : Mask)
> >>> +    if (Idx != -1)
> >>> +      return false;
> >>> +  return true;
> >>> +}
> >>> +
> >>> +static bool isIdentity(ArrayRef<int> Mask) {
> >>> +  unsigned Size = Mask.size();
> >>> +  return findStrip(Mask, 1, Size) == std::make_pair(0, Size);
> >>> +}
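A sketch of `findStrip` and `isIdentity` over `std::vector`. This version also asserts `MaxLen > 0`, since the loop as written would run past the end for a zero length. Note that an undef (-1) entry ends a strip, so a mask such as {0, 1, -1, 3} is not treated as the identity.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Starting at A[0], find the longest prefix (up to MaxLen) whose consecutive
// elements differ by exactly Inc. Returns {first value, strip length}.
std::pair<int, unsigned> findStrip(const std::vector<int> &A, int Inc,
                                   unsigned MaxLen) {
  assert(MaxLen > 0 && A.size() >= MaxLen);
  int F = A[0], E = F;
  for (unsigned I = 1; I != MaxLen; ++I) {
    if (A[I] - E != Inc)
      return {F, I};
    E = A[I];
  }
  return {F, MaxLen};
}

// A mask is the identity iff it is one strip 0, 1, 2, ... spanning all of it.
bool isIdentity(const std::vector<int> &Mask) {
  unsigned Size = Mask.size();
  return findStrip(Mask, 1, Size) == std::make_pair(0, Size);
}
```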
> >>> +
> >>> +static bool isPermutation(ArrayRef<int> Mask) {
> >>> +  // Check by adding all numbers only works if there is no overflow.
> >>> +  assert(Mask.size() < 0x00007FFF && "Sanity failure");
> >>> +  int Sum = 0;
> >>> +  for (int Idx : Mask) {
> >>> +    if (Idx == -1)
> >>> +      return false;
> >>> +    Sum += Idx;
> >>> +  }
> >>> +  int N = Mask.size();
> >>> +  return 2*Sum == N*(N-1);
> >>> +}
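The sum test in `isPermutation` is a necessary condition, not a sufficient one: {0, 0, 3, 3} has no undefs and sums to 6 = 4*3/2, but it is not a permutation. Whether that matters depends on what the callers already guarantee, which this excerpt does not show. If an exact check is wanted, a seen-bitmap is a straightforward alternative. The sketch below contrasts the two; `isPermutationBySum` and `isPermutationExact` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// The patch's check: no undefs and 2*sum == N*(N-1).
bool isPermutationBySum(const std::vector<int> &Mask) {
  int Sum = 0;
  for (int Idx : Mask) {
    if (Idx == -1)
      return false;
    Sum += Idx;
  }
  int N = Mask.size();
  return 2 * Sum == N * (N - 1);
}

// Exact check: every index in [0, N) occurs exactly once.
bool isPermutationExact(const std::vector<int> &Mask) {
  std::vector<bool> Seen(Mask.size(), false);
  for (int Idx : Mask) {
    if (Idx < 0 || unsigned(Idx) >= Mask.size() || Seen[Idx])
      return false;
    Seen[Idx] = true;
  }
  return true;
}
```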
> >>> +
> >>> +bool HvxSelector::selectVectorConstants(SDNode *N) {
> >>> +  // Constant vectors are generated as loads from constant pools.
> >>> +  // Since they are generated during the selection process, the main
> >>> +  // selection algorithm is not aware of them. Select them directly
> >>> +  // here.
> >>> +  if (!N->isMachineOpcode() && N->getOpcode() == ISD::LOAD) {
> >>> +    SDValue Addr = cast<LoadSDNode>(N)->getBasePtr();
> >>> +    unsigned AddrOpc = Addr.getOpcode();
> >>> +    if (AddrOpc == HexagonISD::AT_PCREL || AddrOpc == HexagonISD::CP) {
> >>> +      if (Addr.getOperand(0).getOpcode() == ISD::TargetConstantPool) {
> >>> +        ISel.Select(N);
> >>> +        return true;
> >>> +      }
> >>> +    }
> >>> +  }
> >>> +
> >>> +  bool Selected = false;
> >>> +  for (unsigned I = 0, E = N->getNumOperands(); I != E; ++I)
> >>> +    Selected = selectVectorConstants(N->getOperand(I).getNode()) || Selected;
> >>> +  return Selected;
> >>> +}
> >>> +
> >>> +void HvxSelector::materialize(const ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {
> >>> +    dbgs() << "Materializing\n";
> >>> +    Results.print(dbgs(), DAG);
> >>> +  });
> >>> +  if (Results.empty())
> >>> +    return;
> >>> +  const SDLoc &dl(Results.InpNode);
> >>> +  std::vector<SDValue> Output;
> >>> +
> >>> +  for (unsigned I = 0, E = Results.size(); I != E; ++I) {
> >>> +    const NodeTemplate &Node = Results[I];
> >>> +    std::vector<SDValue> Ops;
> >>> +    for (const OpRef &R : Node.Ops) {
> >>> +      assert(R.isValid());
> >>> +      if (R.isValue()) {
> >>> +        Ops.push_back(R.OpV);
> >>> +        continue;
> >>> +      }
> >>> +      if (R.OpN & OpRef::Undef) {
> >>> +        MVT::SimpleValueType SVT = MVT::SimpleValueType(R.OpN & OpRef::Index);
> >>> +        Ops.push_back(ISel.selectUndef(dl, MVT(SVT)));
> >>> +        continue;
> >>> +      }
> >>> +      // R is an index of a result.
> >>> +      unsigned Part = R.OpN & OpRef::Whole;
> >>> +      int Idx = SignExtend32(R.OpN & OpRef::Index, OpRef::IndexBits);
> >>> +      if (Idx < 0)
> >>> +        Idx += I;
> >>> +      assert(Idx >= 0 && unsigned(Idx) < Output.size());
> >>> +      SDValue Op = Output[Idx];
> >>> +      MVT OpTy = Op.getValueType().getSimpleVT();
> >>> +      if (Part != OpRef::Whole) {
> >>> +        assert(Part == OpRef::LoHalf || Part == OpRef::HiHalf);
> >>> +        if (Op.getOpcode() == HexagonISD::VCOMBINE) {
> >>> +          Op = (Part == OpRef::HiHalf) ? Op.getOperand(0) : Op.getOperand(1);
> >>> +        } else {
> >>> +          MVT HalfTy = MVT::getVectorVT(OpTy.getVectorElementType(),
> >>> +                                        OpTy.getVectorNumElements()/2);
> >>> +          unsigned Sub = (Part == OpRef::LoHalf) ? Hexagon::vsub_lo
> >>> +                                                 : Hexagon::vsub_hi;
> >>> +          Op = DAG.getTargetExtractSubreg(Sub, dl, HalfTy, Op);
> >>> +        }
> >>> +      }
> >>> +      Ops.push_back(Op);
> >>> +    } // for (R : Node.Ops)
> >>> +
> >>> +    assert(Node.Ty != MVT::Other);
> >>> +    SDNode *ResN = (Node.Opc == TargetOpcode::COPY)
> >>> +                      ? Ops.front().getNode()
> >>> +                      : DAG.getMachineNode(Node.Opc, dl, Node.Ty, Ops);
> >>> +    Output.push_back(SDValue(ResN, 0));
> >>> +  }
> >>> +
> >>> +  SDNode *OutN = Output.back().getNode();
> >>> +  SDNode *InpN = Results.InpNode;
> >>> +  DEBUG_WITH_TYPE("isel", {
> >>> +    dbgs() << "Generated node:\n";
> >>> +    OutN->dumpr(&DAG);
> >>> +  });
> >>> +
> >>> +  ISel.ReplaceNode(InpN, OutN);
> >>> +  selectVectorConstants(OutN);
> >>> +  DAG.RemoveDeadNodes();
> >>> +}
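To make the reference resolution in `materialize` concrete, here is a toy model (not LLVM code; `Tmpl` and this `materialize` are hypothetical) in which each template refers to earlier outputs by index. As in the patch, a negative index is taken relative to the template being built, so -1 means "the previous result", and the last output is the one that would replace the input node.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A template: an opcode name plus references to earlier results.
struct Tmpl {
  std::string Opc;
  std::vector<int> Refs;
};

// Build a textual "node" for each template, resolving negative (relative)
// references against the current position, as materialize() does.
std::vector<std::string> materialize(const std::vector<Tmpl> &Results) {
  std::vector<std::string> Output;
  for (int I = 0, E = Results.size(); I != E; ++I) {
    std::string Node = Results[I].Opc + "(";
    for (size_t J = 0; J != Results[I].Refs.size(); ++J) {
      int Idx = Results[I].Refs[J];
      if (Idx < 0)
        Idx += I;
      assert(Idx >= 0 && Idx < int(Output.size()));
      Node += (J ? "," : "") + Output[Idx];
    }
    Output.push_back(Node + ")");
  }
  return Output;
}
```

With templates `a`, `b`, and `c` referring to `{-1, 0}`, the final output is `c(b(),a())`: -1 resolves to `b` and 0 to `a`.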
> >>> +
> >>> +OpRef HvxSelector::concat(OpRef Lo, OpRef Hi, ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  const SDLoc &dl(Results.InpNode);
> >>> +  Results.push(TargetOpcode::REG_SEQUENCE, getPairVT(MVT::i8), {
> >>> +    DAG.getTargetConstant(Hexagon::HvxWRRegClassID, dl, MVT::i32),
> >>> +    Lo, DAG.getTargetConstant(Hexagon::vsub_lo, dl, MVT::i32),
> >>> +    Hi, DAG.getTargetConstant(Hexagon::vsub_hi, dl, MVT::i32),
> >>> +  });
> >>> +  return OpRef::res(Results.top());
> >>> +}
> >>> +
> >>> +// Va, Vb are single vectors, SM can be arbitrarily long.
> >>> +OpRef HvxSelector::packs(ShuffleMask SM, OpRef Va, OpRef Vb,
> >>> +                         ResultStack &Results, MutableArrayRef<int> NewMask,
> >>> +                         unsigned Options) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  if (!Va.isValid() || !Vb.isValid())
> >>> +    return OpRef::fail();
> >>> +
> >>> +  int VecLen = SM.Mask.size();
> >>> +  MVT Ty = getSingleVT(MVT::i8);
> >>> +
> >>> +  if (SM.MaxSrc - SM.MinSrc < int(HwLen)) {
> >>> +    if (SM.MaxSrc < int(HwLen)) {
> >>> +      memcpy(NewMask.data(), SM.Mask.data(), sizeof(int)*VecLen);
> >>> +      return Va;
> >>> +    }
> >>> +    if (SM.MinSrc >= int(HwLen)) {
> >>> +      for (int I = 0; I != VecLen; ++I) {
> >>> +        int M = SM.Mask[I];
> >>> +        if (M != -1)
> >>> +          M -= HwLen;
> >>> +        NewMask[I] = M;
> >>> +      }
> >>> +      return Vb;
> >>> +    }
> >>> +    const SDLoc &dl(Results.InpNode);
> >>> +    SDValue S = DAG.getTargetConstant(SM.MinSrc, dl, MVT::i32);
> >>> +    if (isUInt<3>(SM.MinSrc)) {
> >>> +      Results.push(Hexagon::V6_valignbi, Ty, {Vb, Va, S});
> >>> +    } else {
> >>> +      Results.push(Hexagon::A2_tfrsi, MVT::i32, {S});
> >>> +      unsigned Top = Results.top();
> >>> +      Results.push(Hexagon::V6_valignb, Ty, {Vb, Va, OpRef::res(Top)});
> >>> +    }
> >>> +    for (int I = 0; I != VecLen; ++I) {
> >>> +      int M = SM.Mask[I];
> >>> +      if (M != -1)
> >>> +        M -= SM.MinSrc;
> >>> +      NewMask[I] = M;
> >>> +    }
> >>> +    return OpRef::res(Results.top());
> >>> +  }
> >>> +
> >>> +  if (Options & PackMux) {
> >>> +    // If elements picked from Va and Vb have all different (source) indexes
> >>> +    // (relative to the start of the argument), do a mux, and update the mask.
> >>> +    BitVector Picked(HwLen);
> >>> +    SmallVector<uint8_t,128> MuxBytes(HwLen);
> >>> +    bool CanMux = true;
> >>> +    for (int I = 0; I != VecLen; ++I) {
> >>> +      int M = SM.Mask[I];
> >>> +      if (M == -1)
> >>> +        continue;
> >>> +      if (M >= int(HwLen))
> >>> +        M -= HwLen;
> >>> +      else
> >>> +        MuxBytes[M] = 0xFF;
> >>> +      if (Picked[M]) {
> >>> +        CanMux = false;
> >>> +        break;
> >>> +      }
> >>> +      NewMask[I] = M;
> >>> +    }
> >>> +    if (CanMux)
> >>> +      return vmuxs(MuxBytes, Va, Vb, Results);
> >>> +  }
> >>> +
> >>> +  return OpRef::fail();
> >>> +}
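A model of the `valign` path in `packs`: when all referenced bytes fall within one `HwLen`-byte window of the concatenation, a single align by `MinSrc` gathers them into one vector, and the mask is rebased by `MinSrc`. This assumes `valign(Vb, Va, S)` yields bytes `[S, S+HwLen)` of the concatenation with `Va` in the low half. `valign` and `rebase` here are plain C++ stand-ins, not the Hexagon instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed semantics: result[i] == concat(Va, Vb)[i + S], Va in the low half.
std::vector<uint8_t> valign(const std::vector<uint8_t> &Va,
                            const std::vector<uint8_t> &Vb, unsigned S) {
  std::vector<uint8_t> Cat(Va);
  Cat.insert(Cat.end(), Vb.begin(), Vb.end());
  return std::vector<uint8_t>(Cat.begin() + S, Cat.begin() + S + Va.size());
}

// Rebase every defined mask entry so that MinSrc becomes index 0.
std::vector<int> rebase(const std::vector<int> &Mask, int MinSrc) {
  std::vector<int> NewMask;
  for (int M : Mask)
    NewMask.push_back(M == -1 ? -1 : M - MinSrc);
  return NewMask;
}
```

With `HwLen` = 4, the mask `{2, 5, 3, -1}` spans 2..5, so aligning by 2 gives `{12, 13, 20, 21}` from `Va = {10..13}`, `Vb = {20..23}`, and the rebased mask `{0, 3, 1, -1}` picks the same bytes as the original.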
> >>> +
> >>> +OpRef HvxSelector::packp(ShuffleMask SM, OpRef Va, OpRef Vb,
> >>> +                         ResultStack &Results, MutableArrayRef<int> NewMask) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  unsigned HalfMask = 0;
> >>> +  unsigned LogHw = Log2_32(HwLen);
> >>> +  for (int M : SM.Mask) {
> >>> +    if (M == -1)
> >>> +      continue;
> >>> +    HalfMask |= (1u << (M >> LogHw));
> >>> +  }
> >>> +
> >>> +  if (HalfMask == 0)
> >>> +    return OpRef::undef(getPairVT(MVT::i8));
> >>> +
> >>> +  // If more than two halves are used, bail.
> >>> +  // TODO: be more aggressive here?
> >>> +  if (countPopulation(HalfMask) > 2)
> >>> +    return OpRef::fail();
> >>> +
> >>> +  MVT HalfTy = getSingleVT(MVT::i8);
> >>> +
> >>> +  OpRef Inp[2] = { Va, Vb };
> >>> +  OpRef Out[2] = { OpRef::undef(HalfTy), OpRef::undef(HalfTy) };
> >>> +
> >>> +  uint8_t HalfIdx[4] = { 0xFF, 0xFF, 0xFF, 0xFF };
> >>> +  unsigned Idx = 0;
> >>> +  for (unsigned I = 0; I != 4; ++I) {
> >>> +    if ((HalfMask & (1u << I)) == 0)
> >>> +      continue;
> >>> +    assert(Idx < 2);
> >>> +    OpRef Op = Inp[I/2];
> >>> +    Out[Idx] = (I & 1) ? OpRef::hi(Op) : OpRef::lo(Op);
> >>> +    HalfIdx[I] = Idx++;
> >>> +  }
> >>> +
> >>> +  int VecLen = SM.Mask.size();
> >>> +  for (int I = 0; I != VecLen; ++I) {
> >>> +    int M = SM.Mask[I];
> >>> +    if (M >= 0) {
> >>> +      uint8_t Idx = HalfIdx[M >> LogHw];
> >>> +      assert(Idx == 0 || Idx == 1);
> >>> +      M = (M & (HwLen-1)) + HwLen*Idx;
> >>> +    }
> >>> +    NewMask[I] = M;
> >>> +  }
> >>> +
> >>> +  return concat(Out[0], Out[1], Results);
> >>> +}
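A sketch of the half-packing done by `packp`. It assumes the four halves of the two pair operands are numbered 0 to 3 (`lo(Va)`, `hi(Va)`, `lo(Vb)`, `hi(Vb)`), that `HwLen` is a power of two, and that all mask entries are below `4*HwLen`. `packHalves` is a hypothetical name; it reports the chosen halves in order along with the remapped mask.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// If at most two halves are referenced, pack them (in increasing half order)
// into one register pair and remap the mask into that pair.
// Returns false when more than two halves are used.
bool packHalves(const std::vector<int> &Mask, unsigned HwLen,
                std::vector<int> &Halves, std::vector<int> &NewMask) {
  unsigned LogHw = 0;
  while ((1u << LogHw) < HwLen)
    ++LogHw;
  unsigned HalfMask = 0;
  for (int M : Mask)
    if (M != -1)
      HalfMask |= 1u << (M >> LogHw);
  if (std::bitset<4>(HalfMask).count() > 2)
    return false;
  int HalfIdx[4] = {-1, -1, -1, -1};  // half number -> slot in the new pair
  Halves.clear();
  for (int I = 0; I != 4; ++I)
    if (HalfMask & (1u << I)) {
      HalfIdx[I] = Halves.size();
      Halves.push_back(I);
    }
  NewMask.clear();
  for (int M : Mask)
    NewMask.push_back(M < 0 ? M
                            : int(M & (HwLen - 1)) +
                                  int(HwLen) * HalfIdx[M >> LogHw]);
  return true;
}
```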
> >>> +
> >>> +OpRef HvxSelector::zerous(ShuffleMask SM, OpRef Va, ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +
> >>> +  int VecLen = SM.Mask.size();
> >>> +  SmallVector<uint8_t,128> UsedBytes(VecLen);
> >>> +  bool HasUnused = false;
> >>> +  for (int I = 0; I != VecLen; ++I) {
> >>> +    if (SM.Mask[I] != -1)
> >>> +      UsedBytes[I] = 0xFF;
> >>> +    else
> >>> +      HasUnused = true;
> >>> +  }
> >>> +  if (!HasUnused)
> >>> +    return Va;
> >>> +  SDValue B = getVectorConstant(UsedBytes, SDLoc(Results.InpNode));
> >>> +  Results.push(Hexagon::V6_vand, getSingleVT(MVT::i8), {Va, OpRef(B)});
> >>> +  return OpRef::res(Results.top());
> >>> +}
> >>> +
> >>> +OpRef HvxSelector::vmuxs(ArrayRef<uint8_t> Bytes, OpRef Va, OpRef Vb,
> >>> +                         ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  MVT ByteTy = getSingleVT(MVT::i8);
> >>> +  MVT BoolTy = MVT::getVectorVT(MVT::i1, 8*HwLen); // XXX
> >>> +  const SDLoc &dl(Results.InpNode);
> >>> +  SDValue B = getVectorConstant(Bytes, dl);
> >>> +  Results.push(Hexagon::V6_vd0, ByteTy, {});
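> >>> +  // V6_veqb sets the predicate where Bytes[i] == 0, i.e. where Vb is
> >>> +  // wanted, and V6_vmux takes its first data operand where it is set.
> >>> +  Results.push(Hexagon::V6_veqb, BoolTy, {OpRef(B), OpRef::res(-1)});
> >>> +  Results.push(Hexagon::V6_vmux, ByteTy, {OpRef::res(-1), Vb, Va});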
> >>> +  return OpRef::res(Results.top());
> >>> +}
> >>> +
> >>> +OpRef HvxSelector::vmuxp(ArrayRef<uint8_t> Bytes, OpRef Va, OpRef Vb,
> >>> +                         ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  size_t S = Bytes.size() / 2;
> >>> +  OpRef L = vmuxs({Bytes.data(),   S}, OpRef::lo(Va), OpRef::lo(Vb), Results);
> >>> +  OpRef H = vmuxs({Bytes.data()+S, S}, OpRef::hi(Va), OpRef::hi(Vb), Results);
> >>> +  return concat(L, H, Results);
> >>> +}
> >>> +
> >>> +OpRef HvxSelector::shuffs1(ShuffleMask SM, OpRef Va, ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  unsigned VecLen = SM.Mask.size();
> >>> +  assert(HwLen == VecLen);
> >>> +  assert(all_of(SM.Mask, [this](int M) { return M == -1 || M < int(HwLen); }));
> >>> +
> >>> +  if (isIdentity(SM.Mask))
> >>> +    return Va;
> >>> +  if (isUndef(SM.Mask))
> >>> +    return OpRef::undef(getSingleVT(MVT::i8));
> >>> +
> >>> +  return butterfly(SM, Va, Results);
> >>> +}
> >>> +
> >>> +OpRef HvxSelector::shuffs2(ShuffleMask SM, OpRef Va, OpRef Vb,
> >>> +                           ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  OpRef C = contracting(SM, Va, Vb, Results);
> >>> +  if (C.isValid())
> >>> +    return C;
> >>> +
> >>> +  int VecLen = SM.Mask.size();
> >>> +  SmallVector<int,128> NewMask(VecLen);
> >>> +  OpRef P = packs(SM, Va, Vb, Results, NewMask);
> >>> +  if (P.isValid())
> >>> +    return shuffs1(ShuffleMask(NewMask), P, Results);
> >>> +
> >>> +  SmallVector<int,128> MaskL(VecLen), MaskR(VecLen);
> >>> +  splitMask(SM.Mask, MaskL, MaskR);
> >>> +
> >>> +  OpRef L = shuffs1(ShuffleMask(MaskL), Va, Results);
> >>> +  OpRef R = shuffs1(ShuffleMask(MaskR), Vb, Results);
> >>> +  if (!L.isValid() || !R.isValid())
> >>> +    return OpRef::fail();
> >>> +
> >>> +  SmallVector<uint8_t,128> Bytes(VecLen);
> >>> +  for (int I = 0; I != VecLen; ++I) {
> >>> +    if (MaskL[I] != -1)
> >>> +      Bytes[I] = 0xFF;
> >>> +  }
> >>> +  return vmuxs(Bytes, L, R, Results);
> >>> +}
> >>> +
> >>> +OpRef HvxSelector::shuffp1(ShuffleMask SM, OpRef Va, ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  int VecLen = SM.Mask.size();
> >>> +
> >>> +  SmallVector<int,128> PackedMask(VecLen);
> >>> +  OpRef P = packs(SM, OpRef::lo(Va), OpRef::hi(Va), Results, PackedMask);
> >>> +  if (P.isValid()) {
> >>> +    ShuffleMask PM(PackedMask);
> >>> +    OpRef E = expanding(PM, P, Results);
> >>> +    if (E.isValid())
> >>> +      return E;
> >>> +
> >>> +    OpRef L = shuffs1(PM.lo(), P, Results);
> >>> +    OpRef H = shuffs1(PM.hi(), P, Results);
> >>> +    if (L.isValid() && H.isValid())
> >>> +      return concat(L, H, Results);
> >>> +  }
> >>> +
> >>> +  OpRef R = perfect(SM, Va, Results);
> >>> +  if (R.isValid())
> >>> +    return R;
> >>> +  // TODO commute the mask and try the opposite order of the halves.
> >>> +
> >>> +  OpRef L = shuffs2(SM.lo(), OpRef::lo(Va), OpRef::hi(Va), Results);
> >>> +  OpRef H = shuffs2(SM.hi(), OpRef::lo(Va), OpRef::hi(Va), Results);
> >>> +  if (L.isValid() && H.isValid())
> >>> +    return concat(L, H, Results);
> >>> +
> >>> +  return OpRef::fail();
> >>> +}
> >>> +
> >>> +OpRef HvxSelector::shuffp2(ShuffleMask SM, OpRef Va, OpRef Vb,
> >>> +                           ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  int VecLen = SM.Mask.size();
> >>> +
> >>> +  SmallVector<int,256> PackedMask(VecLen);
> >>> +  OpRef P = packp(SM, Va, Vb, Results, PackedMask);
> >>> +  if (P.isValid())
> >>> +    return shuffp1(ShuffleMask(PackedMask), P, Results);
> >>> +
> >>> +  SmallVector<int,256> MaskL(VecLen), MaskR(VecLen);
> >>> +  splitMask(SM.Mask, MaskL, MaskR);
> >>> +
> >>> +  OpRef L = shuffp1(ShuffleMask(MaskL), Va, Results);
> >>> +  OpRef R = shuffp1(ShuffleMask(MaskR), Vb, Results);
> >>> +  if (!L.isValid() || !R.isValid())
> >>> +    return OpRef::fail();
> >>> +
> >>> +  // Mux the results.
> >>> +  SmallVector<uint8_t,256> Bytes(VecLen);
> >>> +  for (int I = 0; I != VecLen; ++I) {
> >>> +    if (MaskL[I] != -1)
> >>> +      Bytes[I] = 0xFF;
> >>> +  }
> >>> +  return vmuxp(Bytes, L, R, Results);
> >>> +}
> >>> +
> >>> +bool HvxSelector::scalarizeShuffle(ArrayRef<int> Mask, const SDLoc &dl,
> >>> +                                   MVT ResTy, SDValue Va, SDValue Vb,
> >>> +                                   SDNode *N) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  MVT ElemTy = ResTy.getVectorElementType();
> >>> +  assert(ElemTy == MVT::i8);
> >>> +  unsigned VecLen = Mask.size();
> >>> +  bool HavePairs = (2*HwLen == VecLen);
> >>> +  MVT SingleTy = getSingleVT(MVT::i8);
> >>> +
> >>> +  SmallVector<SDValue,128> Ops;
> >>> +  for (int I : Mask) {
> >>> +    if (I < 0) {
> >>> +      Ops.push_back(ISel.selectUndef(dl, ElemTy));
> >>> +      continue;
> >>> +    }
> >>> +    SDValue Vec;
> >>> +    unsigned M = I;
> >>> +    if (M < VecLen) {
> >>> +      Vec = Va;
> >>> +    } else {
> >>> +      Vec = Vb;
> >>> +      M -= VecLen;
> >>> +    }
> >>> +    if (HavePairs) {
> >>> +      if (M < HwLen) {
> >>> +        Vec = DAG.getTargetExtractSubreg(Hexagon::vsub_lo, dl, SingleTy, Vec);
> >>> +      } else {
> >>> +        Vec = DAG.getTargetExtractSubreg(Hexagon::vsub_hi, dl, SingleTy, Vec);
> >>> +        M -= HwLen;
> >>> +      }
> >>> +    }
> >>> +    SDValue Idx = DAG.getConstant(M, dl, MVT::i32);
> >>> +    SDValue Ex = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, ElemTy, {Vec, Idx});
> >>> +    SDValue L = Lower.LowerOperation(Ex, DAG);
> >>> +    assert(L.getNode());
> >>> +    Ops.push_back(L);
> >>> +  }
> >>> +
> >>> +  SDValue LV;
> >>> +  if (2*HwLen == VecLen) {
> >>> +    SDValue B0 = DAG.getBuildVector(SingleTy, dl, {Ops.data(), HwLen});
> >>> +    SDValue L0 = Lower.LowerOperation(B0, DAG);
> >>> +    SDValue B1 = DAG.getBuildVector(SingleTy, dl, {Ops.data()+HwLen, HwLen});
> >>> +    SDValue L1 = Lower.LowerOperation(B1, DAG);
> >>> +    // XXX CONCAT_VECTORS is legal for HVX vectors. Legalizing (lowering)
> >>> +    // functions may expect to be called only for illegal operations, so
> >>> +    // make sure that they are not called for legal ones. Develop a better
> >>> +    // mechanism for dealing with this.
> >>> +    LV = DAG.getNode(ISD::CONCAT_VECTORS, dl, ResTy, {L0, L1});
> >>> +  } else {
> >>> +    SDValue BV = DAG.getBuildVector(ResTy, dl, Ops);
> >>> +    LV = Lower.LowerOperation(BV, DAG);
> >>> +  }
> >>> +
> >>> +  assert(!N->use_empty());
> >>> +  ISel.ReplaceNode(N, LV.getNode());
> >>> +  DAG.RemoveDeadNodes();
> >>> +
> >>> +  std::deque<SDNode*> SubNodes;
> >>> +  SubNodes.push_back(LV.getNode());
> >>> +  for (unsigned I = 0; I != SubNodes.size(); ++I) {
> >>> +    for (SDValue Op : SubNodes[I]->ops())
> >>> +      SubNodes.push_back(Op.getNode());
> >>> +  }
> >>> +  while (!SubNodes.empty()) {
> >>> +    SDNode *S = SubNodes.front();
> >>> +    SubNodes.pop_front();
> >>> +    if (S->use_empty())
> >>> +      continue;
> >>> +    // This isn't great, but users need to be selected before any nodes that
> >>> +    // they use. (The reason is to match larger patterns, and avoid nodes that
> >>> +    // cannot be matched on their own, e.g. ValueType, TokenFactor, etc.).
> >>> +    bool PendingUser = llvm::any_of(S->uses(), [&SubNodes](const SDNode *U) {
> >>> +                         return llvm::any_of(SubNodes, [U](const SDNode *T) {
> >>> +                           return T == U;
> >>> +                         });
> >>> +                       });
> >>> +    if (PendingUser)
> >>> +      SubNodes.push_back(S);
> >>> +    else
> >>> +      ISel.Select(S);
> >>> +  }
> >>> +
> >>> +  DAG.RemoveDeadNodes();
> >>> +  return true;
> >>> +}
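The requeue loop at the end of `scalarizeShuffle` effectively produces a "users before operands" order. Below is a toy version over integer node IDs (hypothetical, not SelectionDAG code). Unlike the patch, it deduplicates the worklist up front instead of skipping nodes that have become dead. Because the graph is acyclic, some pending node always has no pending user, so the loop terminates.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <vector>

// Ops[N] lists the operands of node N. Returns an order in which every node
// comes before all of its operands, requeueing a node while one of its
// users is still waiting in the worklist.
std::vector<int> usersFirstOrder(int Root,
                                 const std::map<int, std::vector<int>> &Ops) {
  // Collect Root and everything it transitively uses, without duplicates.
  std::vector<int> All{Root};
  std::set<int> Seen{Root};
  for (size_t I = 0; I != All.size(); ++I) {
    auto It = Ops.find(All[I]);
    if (It == Ops.end())
      continue;
    for (int Op : It->second)
      if (Seen.insert(Op).second)
        All.push_back(Op);
  }
  std::deque<int> Work(All.begin(), All.end());
  std::vector<int> Order;
  while (!Work.empty()) {
    int S = Work.front();
    Work.pop_front();
    // Is some node still in the worklist a user of S?
    bool PendingUser = std::any_of(Work.begin(), Work.end(), [&](int U) {
      auto It = Ops.find(U);
      return It != Ops.end() &&
             std::count(It->second.begin(), It->second.end(), S) > 0;
    });
    if (PendingUser)
      Work.push_back(S);
    else
      Order.push_back(S);
  }
  return Order;
}
```

For a root 1 using `{3, 2}` where 2 also uses 3, the collection order is 1, 3, 2, but node 3 is requeued behind 2, giving 1, 2, 3.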
> >>> +
> >>> +OpRef HvxSelector::contracting(ShuffleMask SM, OpRef Va, OpRef Vb,
> >>> +                               ResultStack &Results) {
> >>> +  DEBUG_WITH_TYPE("isel", {dbgs() << __func__ << '\n';});
> >>> +  if (!Va.isValid() || !Vb.isValid())
> >>> +    return OpRef::fail();
> >>> +
> >>> +  // Contracting shuffles, i.e. instructions that always discard some bytes
> >>> +  // from the operand vectors.
> >>> +  //
> >>> +  // V6_vshuff{e,o}b
> >>> +  // V6_vdealb4w
> >>> +  // V6_vpack{e,o}{b,h}
> >>> +
> >>> +  int VecLen = SM.Mask.size();
> >>> +  std::pair<int,unsigned> Strip = findStrip(SM.Mask, 1, VecLen);
> >>> +  MVT ResTy = getSingleVT(MVT::i8);
> >>> +
> >>> +  // The following shuffles only work for bytes and halfwords. This requires
> >>> +  // the strip length to be 1 or 2.
> >>> +  if (Strip.second != 1 && Strip.second != 2)
> >>> +    return OpRef::fail();
> >>> +
> >>> +  // The patterns for the shuffles, in terms of the starting offsets of the
> >>> +  // consecutive strips (L = length of the strip, N = VecLen):
> >>> +  //
> >>> +  // vpacke:    0, 2L, 4L ... N+0, N+2L, N+4L ...      L = 1 or 2
> >>> +  // vpacko:    L, 3L, 5L ... N+L, N+3L, N+5L ...      L = 1 or 2
> >>> +  //
> >>> +  // vshuffe:   0, N+0, 2L, N+2L, 4L ...               L = 1 or 2
> >>> +  // vshuffo:   L, N+L, 3L, N+3L, 5L ...               L = 1 or 2
> >>> +  //
> >>> +  // vdealb4w:  0, 4, 8 ... 2, 6, 10 ... N+0, N+4, N+8 ... N+2, N+6, N+10 ...
> >>> +
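To cross-check the pattern table above, here is a sketch that generates the masks it describes for `N`-byte vectors, with indices at or above `N` selecting from the second operand. The function names are hypothetical, and the masks follow the comment's description rather than any hardware documentation.

```cpp
#include <cassert>
#include <vector>

// vpacke (Odd = false) / vpacko (Odd = true): every other strip of length L,
// first from one operand (first half of the mask), then from the other.
std::vector<int> vpackMask(int N, int L, bool Odd) {
  std::vector<int> M(N);
  for (int I = 0; I != N; ++I) {
    int J = I % (N / 2);             // position within this half
    int Base = (I < N / 2) ? 0 : N;  // first or second operand
    M[I] = Base + 2 * L * (J / L) + J % L + (Odd ? L : 0);
  }
  return M;
}

// vshuffe / vshuffo: strips of length L alternate between the operands.
std::vector<int> vshuffMask(int N, int L, bool Odd) {
  std::vector<int> M(N);
  for (int I = 0; I != N; ++I) {
    int Strip = I / L;
    int Base = (Strip % 2 == 0) ? 0 : N;
    M[I] = Base + 2 * L * (Strip / 2) + I % L + (Odd ? L : 0);
  }
  return M;
}

// vdealb4w: 0, 4, 8 ... 2, 6, 10 ... N+0, N+4 ... N+2, N+6 ...
std::vector<int> vdealb4wMask(int N) {
  std::vector<int> M(N);
  for (int I = 0; I != N / 4; ++I) {
    M[I] = 4 * I;
    M[I + N / 4] = 2 + 4 * I;
    M[I + N / 2] = N + 4 * I;
    M[I + 3 * N / 4] = N + 2 + 4 * I;
  }
  return M;
}
```

For `N` = 8, `vpackMask(8, 1, false)` is `{0, 2, 4, 6, 8, 10, 12, 14}` and `vdealb4wMask(8)` is `{0, 4, 2, 6, 8, 12, 10, 14}`; the latter has 4 right after a one-element strip starting at 0, which is what the `NextInMask == 4` test below looks for.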
> >>> +  // The value of the element in the mask following the strip will decide
> >>> +  // what kind of a shuffle this can be.
> >>> +  int NextInMask = SM.Mask[Strip.second];
> >>> +
> >>> +  // Check if NextInMask could be 2L, 3L or 4, i.e. if it could be a mask
> >>> +  // for vpack or vdealb4w. VecLen > 4, so NextInMask for vdealb4w would
> >>> +  // satisfy this.
> >>> +  if (NextInMask < VecLen) {
> >>> +    // vpack{e,o} or vdealb4w
> >>> +    if (Strip.first == 0 && Strip.second == 1 && NextInMask == 4) {
> >>> +      int N = VecLen;
> >>> +      // Check if this is vdealb4w (L=1).
> >>> +      for (int I = 0; I != N/4; ++I)
> >>> +        if (SM.Mask[I] != 4*I)
> >>> +          return OpRef::fail();
> >>> +      for (int I = 0; I != N/4; ++I)
> >>> +        if (SM.Mask[I+N/4] != 2 + 4*I)
> >>> +          return OpRef::fail();
> >>> +      for (int I = 0; I != N/4; ++I)
> >>> +        if (SM.Mask[I+N/2] != N + 4*I)
> >>> +          return OpRef::fail();
> >>> +      for (int I = 0; I != N/4; ++I)
> >>> +        if (SM.Mask[I+3*N/4] != N+2 + 4*I)
> >>> +          return OpRef::fail();
> >>> +      // Matched mask for vdealb4w.
> >>> +      Results.push(Hexagon::V6_vdealb4w, ResTy, {Vb, Va});
> >>> +      return OpRef::res(Results.top());
> >>> +    }
> >>> +
> >>> +    // Check if this is vpack{e,o}.
> >>> +    int N = VecLen;
> >>> +    int L = Strip.second;
> >>> +    // Check if the first strip starts at 0 or at L.
> >>> +    if (Strip.first != 0 && Strip.first != L)
> >>> +      return OpRef::fail();
> >>> +    // Examine the rest of the mask.
> >>> +    for (int I = L; I < N/2; I += L) {
> >>> +      auto S = findStrip(subm(SM.Mask,I), 1, N-I);
> >>> +      // Check whether the mask element at the beginning of each strip
> >>> +      // increases by 2L each time.
> >>> +      if (S.first - Strip.first != 2*I)
> >>> +        return OpRef::fail();
> >>> +      // Check whether each strip is of the same length.
> >>> +      if (S.second != unsigned(L))
> >>> +        return OpRef::fail();
> >>> +    }
> >>> +
> >>> +    // Strip.first == 0  =>  vpacke
> >>> +    // Strip.first == L  =>  vpacko
> >>> +    assert(Strip.first == 0 || Strip.first == L);
> >>> +    using namespace Hexagon;
> >>> +    NodeTemplate Res;
> >>> +    Res.Opc = Strip.second == 1 // Number of bytes.
> >>> +                  ? (Strip.first == 0 ? V6_vpackeb : V6_vpackob)
> >>> +                  : (Strip.first == 0 ? V6_vpackeh : V6_vpackoh);
> >>> +    Res.Ty = ResTy;
> >>> +    Res.Ops = { Vb, Va };
> >>> +    Results.push(Res);
> >>> +    r