[LLVMdev] Long-Term ISel Design

Thu Mar 17 09:32:01 PDT 2011

Chris Lattner <clattner at apple.com> writes:

>> 1. We have special target-specific operators for certain shuffles in X86,
>>   such as X86unpckl.

> It also eliminates a lot of fragility.  Before doing this, X86
> legalize would have to be very careful to specifically form shuffles
> that it knew isel would turn into (e.g.) unpck operations.  Now
> instead of forming specific carefully constructed shuffle masks (not
> making sure other code doesn't violate them) it can just directly form
> the X86ISD node.

Right.  What I've presented would reverse this.  Rather than making
Legalize have to know about what table-driven isel can and cannot do,
have table-driven isel run first, see what it can do and then leave
the rest for manual selection.

We would still keep the existing pre-table-driven-isel passes so we'd
still have a chance to do some cleanup before the main table-driven
isel.

Obviously a lot of details have to be worked out.

>> 2. Sometimes DAGs are legal in some contexts but not others and it is a
>>   pain to deal with.  A good example is VBROADCAST, where a <0,0,0,0>
>>   shuffle is natively supported if the source vector is in memory.
>>   Otherwise it's not legal and manual lowering is required.  In this
>>   case the legality check is doing the DAG match by hand, replicating
>>   what TableGen-produced code already does.

> Yes, this isn't good.  Instead, the shuffle should be legalized to
> something that takes a pointer (memory operand).  That means that X86
> isel would form the *fully legal* X86ISD node, and nothing would be
> able to break it and it could never fail to match.

Well, it dopesn't _have_to_ form an X86ISD node.  I don't do that now.
But it's fragile in the sense that no one else should mess with that
piece of the DAG.

But the real point is that in forming the X86ISD node currently, I'm
doing exaclty what the tblgen-generated code already does.  If the
shuffle doesn't take a memory operand, then I have to lower it to
something else.  Where I do that (before or after table-driven isel)
doesn't matter.  I do the same work either way.  But by doing it after I
avoid writing duplicate DAG matching code in the case where the operand
is in memory.

>> These two examples are related: we're duplicating functionality manually
>> that's already available automatically.
>
> Not sure what you mean by this.

I mean that in legalize/lowering we're massaging the DAG to get it into
a state where tabel-driven isel can match it.  There is a lot of code
like this:

if (shuffle_is_MOVL)
  do_nothing_and_return

It's duplicating exactly the checks that the table-driven isel does
later.  In the VBROADCASTSS/D case, it's doing an entire DAG match
to check whether it's implementable with VBROADCASTSS/D.

Why not just let table-driven isel run first and take care of these
checks just once?  If something doesn't match, we then know it needs
manual lowering and selection.

>>                  legalize
>>                     |
>>                     V
>>        manual lowering (X86ISelLowering)
>>                     |
>>                     V
>>         manual isel (X86ISelDAGToDAG)
>>                     |
>>                     V
>>    table-driven isel (.td files/X86GenDAGISel)
>>                     |
>>                     V
>>      manual isel (some to-be-design piece)
>

> I'm not sure what you mean here.  Are you suggesting that these be
> completely separate passes over the dag?  Why do "manual isel" and
> "table driven isel" as separate passes?  If they are interlaced, then
> how is this different than what we already have?

No, not as separate passes.  Right now we have code like this in
X86ISelDAGToDAG:

X86DAGToDAGISel::Select(SDNode *Node) {
  ...
  switch (Opcode) {
    ...do a bunch of manual selection...
  }
  // If we get here we didn't select manually.
  SelectCode();  // Select via table-driven isel, abort if no match.
}

What I'm proposing is that we do this:

X86DAGToDAGISel::Select(SDNode *Node) {
  ...
  switch (Opcode) {
    ...do a bunch of manual selection, less than before...
  }

  // If we get here we didn't select manually.

  result = SelectCode();  // Select via table-driven isel.

  if (result_is_good()) {
    return;
  }

  switch (Opcode) {
    ...do a bunch of manual selection, some that used to be above and in
       legalize/lowering...
  }

  cannot_select_abort();
}

> You're saying that we do this on a node-by-node basis?

You mean on an SDNode basis?  I don't think so.  As I said, details have
to be worked out but I imagine we'd send the selection DAG to the
tblgen-generated code as we do now and any "leftover" bits would have
to be processed afterward by the manual selector.

Now, there is a phase ordering issue in that some of the
legalize/lowering code massages the tree so the tblgen stuff can make a
"good" match.  We probably still want to do that in some cases so we
keep that code where it is.  What I'm aiming at getting rid of is all of
the code that does:

if (this_is_already_matchable()) {
  return;
}

> The reason that codegen aborts on unselectable operations is that they
> are invalid and should not be formed.  Your example of vbroadcast is a
> great one: the X86ISD node for it *should not take a vector register
> input*.  If it does, then the X86ISD node is incorrectly defined.

What I'm saying is that there would be no X86ISD node.  If the pattern
is there, tblgen-produced code will match it.  If not, we have to lower
it manually and would just do what we'd have done in legalize/lowering
before, the difference being we'd now do it after table-driven isel
rather than before.

                                 -Dave