[LLVMdev] Long-Term ISel Design

Sun Mar 27 13:16:00 PDT 2011

Chris Lattner <clattner at apple.com> writes:

>> We would still keep the existing pre-table-driven-isel passes so we'd
>> still have a chance to do some cleanup before the main table-driven
>> isel.
>> 
>> Obviously a lot of details have to be worked out.
>
> I'm not seeing how this is useful for shuffles.  Since tblgen doesn't
> generate table based matching for *any* shuffles, all of the matching
> code would end up as C++ code in X86ISelDagToDag, which would give us
> all of the problems we had before by moving to X86ISD nodes.

Actually, it would be matching code in X86ISelLowering, in the form
of is.*Mask.  For example:

In X86ISelLowering.cpp:

bool X86ISelLowering::isSHUFPMask(...) {
  ...
}
unsigned X86::getShuffleSHUFImmediate(SDNode *N) {
  ...
}

In X86InstrFragmentsSIMD.td:

def SHUFFLE_get_shuf_imm : SDNodeXForm<vector_shuffle, [{
  return getI8Imm(X86::getShuffleSHUFImmediate(N));
}]>;
def shufp : PatFrag<(ops node:$lhs, node:$rhs),
                    (vector_shuffle node:$lhs, node:$rhs), [{
  return X86::isSHUFPMask(cast<ShuffleVectorSDNode>(N));
}], SHUFFLE_get_shuf_imm>;

Ok, so far this is exactly the same as today.

What we eliminate is this:

void X86TargetLowering::LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) {
  ...
  isShuffleMaskLegal(...)
}

bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int> &M,
                                           EVT VT) const {
  // FIXME: pshufb, blends, shifts.
  return (VT.getVectorNumElements() == 2 ||
          ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||
          isMOVLMask(M, VT) ||
          isSHUFPMask(M, VT) ||
          ...
}

We git rid of this call to isSHUFPMask, which currently happens during
legalize.  Instead of trying to see if shuffles are already legal, just
run the isntruction selector and see what it can find.  For everything
else, we know by definition it's not legal and we have to manually
lower it somehow, just like today.  The only difference is that we do
it after (or in conjunction with) isel instead of before it.  This will
eliminate a lot of confusing and redundant code in X86ISelLowering.

>> Well, it dopesn't _have_to_ form an X86ISD node.  I don't do that now.
>> But it's fragile in the sense that no one else should mess with that
>> piece of the DAG.

> I don't consider that an acceptable approach.  There is no way to
> prevent something from CSE'ing a load away and "breaking" the dag, or
> moving an add between the nodes etc.  You're violating a design
> principle of selection dags.

Fair enough.  The current proposal eliminates any need to do any of
this.  I did end up creating a special X86 node for VBROADCAST because
you're right, its too brittle to rely on something else not breaking it.

>> But the real point is that in forming the X86ISD node currently, I'm
>> doing exaclty what the tblgen-generated code already does.  If the
>> shuffle doesn't take a memory operand, then I have to lower it to
>> something else.  Where I do that (before or after table-driven isel)
>> doesn't matter.  I do the same work either way.  But by doing it after I
>> avoid writing duplicate DAG matching code in the case where the operand
>> is in memory.

> I don't agree, and I don't see why this is as bad as you're saying.
> The code creating these nodes is already target specific.  You seem to
> be objecting to this because it is easier to write .td files (which
> turn into generated isel code) than it is to write legalize code in
> C++.  If that is the problem you've identified, then why not make it
> possible to write legalize code in .td files?

I _am_ saying writing .td files is much easier, and I'm saying that half
the legalize code is already there, the part that checks whether stuff
is already legal.  I'm not trying to make the manual lowering case
easier.  I'm trying to avoid having to write manual code to discover
patterns that isel is already perfectly capable of discovering on its
own.

Take the broadcast case.  The .td is something like this (theoretical,
because I did end up creating an X86broadcast node) :

(set VR128:$dst, (v8f32 (splat_lo (v4f32 scalar_to_vector (f32 load addr:$src)),
                                  (undef))))

It's fairly straightforward.

The way things are now, in legalize I have to look for broadcast-like
operations and make sure they come from memory, something like:

if (...we have avx, this is a splat from element 0, etc....) {
  if (...operand 2 is undef...)
    if (...operand 1 is a scalar_to_vector...)
      if (...scalar_to_vector operand is a load...)
         if (...the load only has one use and is foldable...)
           ...transform this node to an X86braodcast node...
}

then write the .td pattern as:

(set VR128:$dst, (v8f32 (X86vbroadcast (f32 load addr:$src))))

In addition, I have to write the code to lower a splat_lo-type
thing that comes from a register (can't use VBROADCAST) but I
have to do that no matter what since it's not a legal DAG.

I had to write a whole bunch of manual matching code.  This is
error-prone is completely automatable by tblegen.  The current
difficulty is that the tblegen-generated matcher runs too late, after we
have a legal DAG guarantee.  If it ran earlier and instead of aborting
simply returned the DAGs (or fragments) it could not match, I could
avoid writing all this nasty code.

Using TableGen to generate legalize code (the other half that really has
to morph nodes) is an idea I've tossed around from time to time.  I ran
into this need very early on.  It would be nice to write patterns like:

def : Pat<(...some dag...),
          (...some other, non-target-specific dag...)>;

I think there's a baked-in assumption in TableGen that the pattern
matched to is a machine instruction dag.  The memory is fuzzy but I
remember trying to do something like this and failing utterly.  :)

> I think that this is just because the current code is in a half
> converted state.  Bruno can say more, but the ultimate goal is to make
> ISD::SHUFFLE completely illegal on X86, so you'd never have this sort
> of thing.

Erm...how would that work exactly?  Manual matching and lowering for all
shuffles?  That sounds like a huge step backward.  Shuffle is a key
operation on X86.  Anything that can automate its selection is a huge
win in terms of correctness.  After writing all of the shuffle matching
code for AVX (on its way into trunk), I don't ever, ever, ever want to
have to do anything like that ever again, ever.  :)

                                  -Dave