[PATCH] D21534: GlobalISel: first outline of legalization interface.

Thu Jul 7 19:12:12 PDT 2016

On Thu, Jul 7, 2016 at 6:18 PM, Quentin Colombet <qcolombet at apple.com>
wrote:

>
> On Jul 7, 2016, at 4:24 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>
> On Thu, Jul 7, 2016 at 10:39 AM, Quentin Colombet <qcolombet at apple.com>
> wrote:
>
>> Hi Eli,
>>
>> Thanks for your feedback.
>>
>> Answers inline.
>>
>> On Jun 22, 2016, at 4:21 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>
>> On Wed, Jun 22, 2016 at 2:39 PM, Tim Northover <t.p.northover at gmail.com>
>> wrote:
>>
>>> > I'm not sure if this is really going to work the way you want. On x86
>>> with
>>> > AVX (but not AVX2), is LOAD <8 x i32> legal?  I mean, you could
>>> declare that
>>> > it is... but you're going to end up with a bunch of vector shuffles
>>> trying
>>> > to legalize ADD <8 x i32>. You could clean it up afterwards with some
>>> sort
>>> > of optimization pass to split vectors where it's profitable... but it
>>> gets
>>> > complicated when you start dealing with values with multiple uses and
>>> PHI
>>> > nodes.
>>>
>>> This still seems to be something for RegBankSelect to me. It's going
>>> to see something like
>>>
>>>     %0(256) = G_LOAD <8 x i32> ...
>>>     %1(128) = G_EXTRACT <4 x i32> %0, 0
>>>     %2(128) = G_EXTRACT <4 x i32> %0, 1
>>>     %3(128) = G_ADD <4 x i32> %1, ...
>>>     %4(128) = G_ADD <4 x i32> %2, ...
>>>     %5(256) = G_SEQ <8 x i32> %3, %4
>>>
>>> and ought to have the cost model necessary to decide that (XMM, XMM)
>>> is the best register class (in whatever representation it has, an
>>> extension of the .td RegClasses with tuples) rather than YMM.
>>>
>>
>> We run RegBankSelect after legalization?  Then what happens?  Presumably,
>> if you have a load or arithmetic operation whose result ends up in (XMM,
>> XMM), you then want to split it so you have two operations which each end
>> up in one xmm register... then you have a bunch of new operations which
>> haven't been through legalization and register bank selection, so you need
>> to run legalization and RegBankSelect from the top again?
>>
>>
>> No, we do not run the full legalizer. RegBankSelect creates very specific
>> operations (glorified copies per se; this includes extract and
>> build_sequence) and the plan is to apply the legalizer helper on them.
>>
>
> That works to split a vector... not so much to split an integer.
>
>
> Ok, let me clarify, I think I see the misunderstanding.
> The generic code of RegBankSelect will only insert glorified versions of
> copies.
> The target specific code can do whatever it wants like splitting add and
> such. The caveat is that whatever the target does, it needs to be able to
> select it, so it must be legal (even if the target runs the legalizer
> helper when doing the remapping).
>

Oh, okay; so RegBankSelect basically only makes copies, but it has hooks if
the target wants to try to do something fancy.  That makes sense.
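
As a toy illustration of that division of labor, here is a rough Python
sketch. Everything in it is hypothetical: `Inst`, `LEGAL`, `remap`, and
`split_wide_add` are made-up names for this example, not LLVM APIs, and
operands are omitted entirely. It only models the two rules stated above:
the generic path adds nothing but a copy, and whatever a target hook
produces must already be legal.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Inst:
    opcode: str  # e.g. "G_ADD", "COPY"
    bits: int    # width of the result in bits


# Hypothetical legality table for a toy target with 128-bit vector ALUs
# and 256-bit loads (loosely modeled on AVX without AVX2).
LEGAL = {
    ("G_LOAD", 256),
    ("G_ADD", 128),
    ("G_EXTRACT", 128),
    ("G_SEQ", 256),
    ("COPY", 128),
    ("COPY", 256),
}

TargetHook = Callable[[Inst], Optional[List[Inst]]]


def remap(inst: Inst, hook: Optional[TargetHook] = None) -> List[Inst]:
    """Return the instructions that stand in for `inst` after remapping."""
    replacement = hook(inst) if hook is not None else None
    if replacement is None:
        # Generic path: keep the instruction and only add a copy of its result.
        return [inst, Inst("COPY", inst.bits)]
    # Target path: anything goes, as long as every new instruction is legal.
    illegal = [i for i in replacement if (i.opcode, i.bits) not in LEGAL]
    if illegal:
        raise ValueError(f"target hook produced illegal instructions: {illegal}")
    return replacement


def split_wide_add(inst: Inst) -> Optional[List[Inst]]:
    """Hypothetical target hook: split a 256-bit add into two 128-bit adds.

    The extracts of the input operands are left out to keep the sketch short.
    """
    if inst.opcode == "G_ADD" and inst.bits == 256:
        return [Inst("G_ADD", 128), Inst("G_ADD", 128), Inst("G_SEQ", 256)]
    return None  # fall back to the generic path
```

Calling `remap(Inst("G_ADD", 256))` takes the generic path, while
`remap(Inst("G_ADD", 256), split_wide_add)` takes the target path and is
rejected only if the hook emits something outside `LEGAL`.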

>
> For example, to split an i64 add, you end up with adde operations... which
> probably aren't legal.  Not sure if that will come up in practice.
>
>
>> Or do we have some sort of restricted post-RegBankSelect legalizer which
>> doesn't require a second pass?
>>
>>
>> No, see my previous answer.
>>
>>
>> If we're doing custom lowering before RegBankSelect, we could end up
>> being effectively forced to choose a bank during legalization, without the
>> benefit of a cost model.
>>
>>
>> Even custom lowered instructions can be remapped. The target can specify
>> alternative instruction mappings for every instruction, generic or not.
>>
>
> I assume by "remapped" you mean there's a target hook to transform the
> instruction (tables probably aren't enough in some cases).  And I guess it
> would be a requirement that all operations generated at this point are
> legal?
>
>
> That is correct.
>
> That can be kind of awkward in some cases.
>
>
> How so? (I guess your later example tries to convey that, but I did not get
> the problem)
>

The problem I was thinking of is the cost computation... but I guess if the
target is doing it, it can figure out the relevant costs itself.  Okay.

>
>
>
>>
>> For example, if you need to custom-lower an <8 x i32> shuffle on AVX, the
>> result could look substantially different depending on whether the result
>> needs to be in one YMM or two XMM registers.  Things become even more
>> awkward if you don't distinguish between integer and vector registers on
>> x86; for example, if I have an i64 add on x86-32, does it need to be
>> widened to <2 x i64> or split into two i32 ADDE operations?
>>
>>
>> I don’t understand the example. Also how is this different from what we
>> currently do?
>>
>
> For the <8 x i32> shuffle example, you have an illegal shuffle;
> legalization splits it into multiple shuffles (this would happen before
> RegBankSelect, right?). RegBankSelect sees the shuffles, and considers
> splitting them... but how does it figure out how many shuffles you end up
> with?
>
>
> That’s up to the target. It will say it maps the <8 x i32> on N definitions
> and materialize that target.
>
>   One way is to merge the shuffles, but that involves RegBankSelect
> special-casing shuffles.
>
>
> Yeah, merging is not something I wanted to consider in RegBankSelect, at
> least for now.
> One thing to keep in mind is that unlike SDISel, you can insert new target
> specific (or target independent) passes wherever you want in the pipeline.
> Therefore if we need a shuffle combiner of some sort, we do not have to do
> it in regbankselect or legalization.
>
> And we don't really want to end up with a bunch of special cases in
> RegBankSelect.  IIRC, this is an existing problem with SDISel to some
> extent because we consider <8 x i32> legal, so it's not really something
> new, but it's worth considering.
>
> A different example: suppose you have an `add <1 x i64>` and an `add i64`
> on x86-32.  (`<1 x i64>` comes up with code ported from MMX.)  Currently,
> ISel will put the former into a vector register, and the latter into
> integer registers.  (This isn't ideal, but it generally works out
> reasonably well.)  With GlobalISel, both are just an i64 add, and i64 isn't
> legal, so we have to decide: do we WidenVector or NarrowScalar?  Without
> any context, there isn't an obvious right answer.
>
>
> The options I see are:
> - Mark it legal, since you know how to select it, and let RegBankSelect
> decide. I actually don’t get why it is illegal in your example.
> - Mark it custom and look for context around.
>
> I would recommend the first approach.
>
> We have a few options:
>
> 1. always WidenVector, and end up with terrible code if we need to
> transfer the result to integer registers;
> 2. always NarrowScalar, and end up with terrible code if we need to
> transfer the result to xmm registers;
> 3. pretend i64 add is legal, and let RegBankSelect assign it to either a
> fake vector register or a fake integer register
>
>
> Why pretend? This is legal, right?
>
> To me legal means we know how to select it and with that definition add
> i64 seems legal to me.
>

This is x86-32, so the options are either a 32-bit GPR or a 128-bit XMM
register.  I mean, I guess you could consider it legal... but the same
reasoning would lead us to conclude that "add <4 x i8>" is legal, so we
would end up with a bunch of register classes for registers which don't
really exist.
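
For concreteness, the NarrowScalar route for the i64 add mentioned earlier
in the thread boils down to the arithmetic below. This is only a Python
sketch of the math (an add of the low halves followed by an add-with-carry
of the high halves); the function names are invented for this example and
it says nothing about how the legalizer would actually emit the operations.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry32(a: int, b: int, carry_in: int = 0):
    """32-bit add with carry-in; returns (32-bit result, carry-out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def narrow_add64(x: int, y: int) -> int:
    """64-bit wrapping add built from two 32-bit adds."""
    x_lo, x_hi = x & MASK32, (x >> 32) & MASK32
    y_lo, y_hi = y & MASK32, (y >> 32) & MASK32
    lo, carry = add_with_carry32(x_lo, y_lo)      # low half, produces the carry
    hi, _ = add_with_carry32(x_hi, y_hi, carry)   # high half, consumes the carry
    return (hi << 32) | lo
```

The result lives in two 32-bit values, which is exactly why moving it into
an XMM register afterwards costs extra.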

-Eli