[LLVMdev] Long-Term ISel Design

Fri Apr 8 22:32:26 PDT 2011

On Mar 27, 2011, at 1:16 PM, David A. Greene wrote:

> Chris Lattner <clattner at apple.com> writes:
> 
>>> We would still keep the existing pre-table-driven-isel passes so we'd
>>> still have a chance to do some cleanup before the main table-driven
>>> isel.
>>> 
>>> Obviously a lot of details have to be worked out.
>> 
>> I'm not seeing how this is useful for shuffles.  Since tblgen doesn't
>> generate table based matching for *any* shuffles, all of the matching
>> code would end up as C++ code in X86ISelDagToDag, which would give us
>> all of the problems we had before by moving to X86ISD nodes.
> 
> bool X86TargetLowering::isShuffleMaskLegal(const SmallVectorImpl<int> &M,
>                                           EVT VT) const {
>  // FIXME: pshufb, blends, shifts.
>  return (VT.getVectorNumElements() == 2 ||
>          ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||
>          isMOVLMask(M, VT) ||
>          isSHUFPMask(M, VT) ||
>          ...
> }
> 
> We git rid of this call to isSHUFPMask, which currently happens during
> legalize.  Instead of trying to see if shuffles are already legal, just
> run the isntruction selector and see what it can find.  For everything
> else, we know by definition it's not legal and we have to manually
> lower it somehow, just like today.  The only difference is that we do
> it after (or in conjunction with) isel instead of before it.  This will
> eliminate a lot of confusing and redundant code in X86ISelLowering.

I don't really see where you're going with this.  I agree that there is confusing and fragile code for shuffle lowering, but this is what Bruno is working on fixing.  To me, the problem is that legalize of SHUFFLE_VECTOR eliminates shuffles that are not directly matchable into a single machine instruction, but preserves ones that do match.  This means that there is duplicated code in both legalize and isel that has to know that a shuffle is an "unpacklps" or whatever.  Other parts of the code that generate shuffles generally know exactly what shuffle they want, and have to synthesize a shuffle node with the right operands to get it to select into the right operation.

To me at least, the right solution to this problem is to make one X86ISD node for each legal shuffle.  This has several desirable properties:

1. Legalize is now responsible for eliminating ALL vector_shuffle nodes, and it is really obvious what shuffles it is generating when reading the dag dumps.
2. Other code that generates shuffles can just generate their exact node.
3. The duplicated isel code is gone, because there is a simple mapping between X86ISD nodes and machine instrs.
4. We push fewer shuffle masks through the compiler which is a marginal speedup.

To me, there are two missing pieces here:
1. The x86 backend is half way converted over to the new model.
2. We *really really* want a way to express shuffle masks in .td files, and tblgen should generate the Legalize code.  The "def X86Punpcklbw" should include a (per cpu) cost for the shuffle as well as the mask that it matches, or a predicate to run if it the mask isn't constant.

>> I think that this is just because the current code is in a half
>> converted state.  Bruno can say more, but the ultimate goal is to make
>> ISD::SHUFFLE completely illegal on X86, so you'd never have this sort
>> of thing.
> 
> Erm...how would that work exactly?  Manual matching and lowering for all
> shuffles?  That sounds like a huge step backward.

I'm not sure how it is a step backward.  Legalize already does matching to "know and ignore" a shuffle that will match a legal instruction.  It isn't hard to change that code to "know and transform" it to the instruction.  The net result of this is that the duplicate code in isel goes away.

> Shuffle is a key
> operation on X86.  Anything that can automate its selection is a huge
> win in terms of correctness.  After writing all of the shuffle matching
> code for AVX (on its way into trunk), I don't ever, ever, ever want to
> have to do anything like that ever again, ever.  :)

Yes, I certainly agree that tblgen should generate the shuffle matching code, I just think that the generated code should be executed in the LegalizeOp(shuffle vector) code.

-Chris