[PATCH] D21534: GlobalISel: first outline of legalization interface.

Fri Jul 8 10:54:55 PDT 2016

On Fri, Jul 8, 2016 at 10:36 AM, Quentin Colombet <qcolombet at apple.com>
wrote:

>
> On Jul 7, 2016, at 7:12 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>
> On Thu, Jul 7, 2016 at 6:18 PM, Quentin Colombet <qcolombet at apple.com>
> wrote:
>
>>
>> On Jul 7, 2016, at 4:24 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>
>> On Thu, Jul 7, 2016 at 10:39 AM, Quentin Colombet <qcolombet at apple.com>
>> wrote:
>>
>>> Hi Eli,
>>>
>>> Thanks for your feedback.
>>>
>>> Answers inlined.
>>>
>>> On Jun 22, 2016, at 4:21 PM, Eli Friedman <eli.friedman at gmail.com>
>>> wrote:
>>>
>>> On Wed, Jun 22, 2016 at 2:39 PM, Tim Northover <t.p.northover at gmail.com>
>>>  wrote:
>>>
>>>> > I'm not sure if this is really going to work the way you want. On x86 with
>>>> > AVX (but not AVX2), is LOAD <8 x i32> legal?  I mean, you could declare that
>>>> > it is... but you're going to end up with a bunch of vector shuffles trying
>>>> > to legalize ADD <8 x i32>. You could clean it up afterwards with some sort
>>>> > of optimization pass to split vectors where it's profitable... but it gets
>>>> > complicated when you start dealing with values with multiple uses and PHI
>>>> > nodes.
>>>>
>>>> This still seems to be something for RegBankSelect to me. It's going
>>>> to see something like
>>>>
>>>>     %0(256) = G_LOAD <8 x i32> ...
>>>>     %1(128) = G_EXTRACT <4 x i32> %0, 0
>>>>     %2(128) = G_EXTRACT <4 x i32> %0, 1
>>>>     %3(128) = G_ADD <4 x i32> %1, ...
>>>>     %4(128) = G_ADD <4 x i32> %2, ...
>>>>     %5(256) = G_SEQ <8 x i32> %3 %4
>>>>
>>>> and ought to have the cost model necessary to decide that (XMM, XMM)
>>>> is the best register class (in whatever representation it has, an
>>>> extension of the .td RegClasses with tuples) rather than YMM.
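A minimal C++ sketch of the comparison Tim describes, under stated assumptions: AVX without AVX2 (so both G_ADDs are 128-bit operations in either mapping), unit cost per instruction, and the users of %5 ignored. `Mapping`, `candidates`, and `cheapest` are hypothetical names for illustration, not RegBankSelect's interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one way to assign banks to the 256-bit %0
// (and, by extension, %5). Costs are illustrative instruction counts.
struct Mapping {
  std::string name;
  unsigned loadCost;      // materializing %0
  unsigned extractCost;   // the two G_EXTRACTs producing %1 and %2
  unsigned sequenceCost;  // the G_SEQ producing %5
  unsigned total() const { return loadCost + extractCost + sequenceCost; }
};

// Assumption: AVX without AVX2, so the two G_ADDs are 128-bit operations
// in either mapping and cost the same; only the surrounding glue differs.
std::vector<Mapping> candidates() {
  return {
      // One YMM load. The low half is a free XMM subregister copy, but the
      // high half needs a cross-lane extract (e.g. VEXTRACTF128) and the
      // G_SEQ needs an insert (e.g. VINSERTF128).
      {"YMM", 1, 1, 1},
      // Two XMM loads. Each G_EXTRACT and the G_SEQ is just a copy between
      // tuple components, so it costs nothing.
      {"(XMM, XMM)", 2, 0, 0},
  };
}

// Pick the cheapest candidate, as a cost-driven RegBankSelect might.
Mapping cheapest(const std::vector<Mapping>& mappings) {
  return *std::min_element(mappings.begin(), mappings.end(),
                           [](const Mapping& a, const Mapping& b) {
                             return a.total() < b.total();
                           });
}
```

Under these made-up costs the XMM pair wins because every G_EXTRACT and the G_SEQ become plain subregister copies; with different users of %5 the answer could flip, which is what the cost model has to capture.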
>>>>
>>>
>>> We run RegBankSelect after legalization?  Then what happens?
>>> Presumably, if you have a load or arithmetic operation whose result ends up
>>> in (XMM, XMM), you then want to split it so you have two operations which
>>> each end up in one xmm register... then you have a bunch of new operations
>>> which haven't been through legalization and register bank selection, so you
>>> need to run legalization and RegBankSelect from the top again?
>>>
>>>
>>> No, we do not run the full legalizer. RegBankSelect creates very
>>> specific operations (glorified copies, per se; this includes extract and
>>> build_sequence) and the plan is to apply the legalizer helper to them.
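To illustrate why these glorified copies are easy to legalize mechanically: G_EXTRACT and G_SEQ only repartition bits between virtual registers, so splitting and rejoining is lossless. The C++ model below assumes the second operand of G_EXTRACT in Tim's snippet selects a 128-bit half (the thread does not say whether it is an index or a bit offset); `extractHalf` and `sequence` are hypothetical helpers, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8I32 = std::array<std::uint32_t, 8>;  // a 256-bit value (YMM-sized vreg)
using V4I32 = std::array<std::uint32_t, 4>;  // a 128-bit piece (XMM-sized vreg)

// Model of G_EXTRACT: copy out the 128-bit half selected by `half` (0 or 1).
V4I32 extractHalf(const V8I32& v, unsigned half) {
  V4I32 out{};
  for (unsigned i = 0; i < 4; ++i) out[i] = v[half * 4 + i];
  return out;
}

// Model of G_SEQ: concatenate two 128-bit pieces back into 256 bits,
// low piece first.
V8I32 sequence(const V4I32& lo, const V4I32& hi) {
  V8I32 out{};
  for (unsigned i = 0; i < 4; ++i) {
    out[i] = lo[i];
    out[4 + i] = hi[i];
  }
  return out;
}
```

Because extracting both halves and sequencing them reproduces the original value exactly, no cost model or context is needed to rewrite these operations.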
>>>
>>
>> That works to split a vector... not so much to split an integer.
>>
>>
>> Ok, let me clarify, I think I see the misunderstanding.
>> The generic code of RegBankSelect will only insert glorified versions of
>> copies.
>> The target-specific code can do whatever it wants, like splitting adds and
>> such. The caveat is that whatever the target does, it needs to be able to
>> select it, so it must be legal (even if the target runs the legalizer
>> helper when doing the remapping).
>>
>
> Oh, okay; so RegBankSelect basically only makes copies, but it has hooks
> if the target wants to try to do something fancy.
>
>
> Exactly.
>
> That makes sense.
>
>
>>
>> For example, to split an i64 add, you end up with adde operations...
>> which probably aren't legal.  Not sure if that will come up in practice.
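The dependency Eli points at is the carry: once an i64 add is split into 32-bit halves, the high half needs the carry out of the low half, which is what an ADDE-style operation provides. A C++ model of that NarrowScalar result, with `addWithCarry` and `narrowAdd64` as hypothetical helpers standing in for ADD/ADC on x86-32:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Model of a 32-bit add-with-carry (ADD when carryIn is false, ADC/ADDE
// otherwise): returns the 32-bit sum and the carry-out.
std::pair<std::uint32_t, bool> addWithCarry(std::uint32_t a, std::uint32_t b,
                                            bool carryIn) {
  std::uint64_t wide = static_cast<std::uint64_t>(a) + b + (carryIn ? 1 : 0);
  return {static_cast<std::uint32_t>(wide), (wide >> 32) != 0};
}

// NarrowScalar-style split of a 64-bit add into two 32-bit halves. The high
// half consumes the carry produced by the low half, so the split is only
// selectable if the carry-consuming operation is legal.
std::uint64_t narrowAdd64(std::uint64_t x, std::uint64_t y) {
  std::pair<std::uint32_t, bool> lo = addWithCarry(
      static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(y), false);
  std::pair<std::uint32_t, bool> hi =
      addWithCarry(static_cast<std::uint32_t>(x >> 32),
                   static_cast<std::uint32_t>(y >> 32), lo.second);
  // The carry out of the high half is dropped: i64 add wraps modulo 2^64.
  return (static_cast<std::uint64_t>(hi.first) << 32) | lo.first;
}
```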
>>
>>
>>> Or do we have some sort of restricted post-RegBankSelect legalizer which
>>> doesn't require a second pass?
>>>
>>>
>>> No, see my previous answer.
>>>
>>>
>>> If we're doing custom lowering before RegBankSelect, we could end up
>>> being effectively forced to choose a bank during legalization, without the
>>> benefit of a cost model.
>>>
>>>
>>> Even custom-lowered instructions can be remapped. The target can specify
>>> alternative instruction mappings for every instruction, generic or not.
>>>
>>
>> I assume by "remapped" you mean there's a target hook to transform the
>> instruction (tables probably aren't enough in some cases).  And I guess it
>> would be a requirement that all operations generated at this point are
>> legal?
>>
>>
>> That is correct.
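A sketch of the contract Quentin confirms: whatever a target's remapping hook emits has to be legal already, because the full legalizer does not run again after RegBankSelect. `Op`, `RemapFn`, `IsLegalFn`, and `remapIsSelectable` are hypothetical C++ stand-ins, and the opcode strings are illustrative only.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical generic operation: an opcode name plus a size in bits.
struct Op {
  std::string opcode;
  unsigned sizeInBits;
};

// Hypothetical target-supplied legality predicate.
using IsLegalFn = std::function<bool(const Op&)>;

// Hypothetical target remapping hook: rewrites one operation into the
// operations that will actually be selected for the chosen bank mapping.
using RemapFn = std::function<std::vector<Op>(const Op&)>;

// Apply the remapping and check the contract: every emitted operation
// must already be legal, since nothing legalizes it afterwards.
bool remapIsSelectable(const Op& op, const RemapFn& remap,
                       const IsLegalFn& isLegal) {
  for (const Op& newOp : remap(op))
    if (!isLegal(newOp)) return false;
  return true;
}
```

For example, on a target that only declares 32-bit ADD legal, a hook that splits a 64-bit add into ADD plus ADDE breaks the contract; that is the situation Eli raises above.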
>>
>> That can be kind of awkward in some cases.
>>
>>
>> How so? (I guess your later example tries to convey that, but I did not get
>> the problem.)
>>
>
> The problem I was thinking of is the cost computation... but I guess if
> the target is doing it, it can figure out the relevant costs itself.  Okay.
>
>
>>
>>
>>
>>>
>>> For example, if you need to custom-lower an <8 x i32> shuffle on AVX,
>>> the result could look substantially different depending on whether the
>>> result needs to be in one YMM register or two XMM registers.  Things
>>> become even more awkward if you don't distinguish between integer and
>>> vector registers on x86; for example, if I have an i64 add on x86-32, does
>>> it need to be widened to <2 x i64> or split into two i32 ADDE operations?
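For the x86-32 i64 add question, the widening alternative can be modeled in C++ as placing each operand in lane 0 of a <2 x i64> and doing a lane-wise add (the semantics of SSE2 PADDQ). `V2I64`, `addV2I64`, and `widenAdd64` are hypothetical names; the point is that widening and splitting into ADDE halves compute the same value and differ only in which register bank holds the operands and result.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A <2 x i64> value in an XMM register, modeled as two 64-bit lanes.
using V2I64 = std::array<std::uint64_t, 2>;

// Lane-wise 64-bit add, as PADDQ does.
V2I64 addV2I64(const V2I64& a, const V2I64& b) {
  return {a[0] + b[0], a[1] + b[1]};
}

// Widening strategy: put each i64 operand in lane 0, leave lane 1 as a
// don't-care (zero here), add lane-wise, and read the result from lane 0.
std::uint64_t widenAdd64(std::uint64_t x, std::uint64_t y) {
  V2I64 vx = {x, 0};
  V2I64 vy = {y, 0};
  return addV2I64(vx, vy)[0];
}
```

Because the results are identical, the choice depends only on where the surrounding values live, which is why it is hard to make without context.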
>>>
>>>
>>> I don’t understand the example. Also how is this different from what we
>>> currently do?
>>>
>>
>> For the <8 x i32> shuffle example, you have an illegal shuffle;
>> legalization splits it into multiple shuffles (this would happen before
>> RegBankSelect, right?). RegBankSelect sees the shuffles, and considers
>> splitting them... but how does it figure out how many shuffles you end up
>> with?
>>
>>
>> That’s up to the target. It will say it maps the <8 x i32> onto N
>> definitions and materializes that mapping.
>>
>>   One way is to merge the shuffles, but that involves RegBankSelect
>> special-casing shuffles.
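One rough way to answer "how many shuffles do you end up with" is to count, for each 128-bit half of the result, how many distinct 128-bit source halves it reads. The C++ sketch below does that under a simplifying assumption (one two-input 128-bit shuffle can combine two source halves, and each extra source half costs one more shuffle); real x86 lowering has many special cases it ignores, and `estimateXmmShuffles` is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <set>

// An <8 x i32> shuffle of sources A and B: mask[i] in [0, 8) picks A[mask[i]],
// mask[i] in [8, 16) picks B[mask[i] - 8], and -1 means undef.
using Mask8 = std::array<int, 8>;

// Rough upper-bound estimate of the 128-bit two-input shuffles needed once
// the shuffle is split into two <4 x i32> halves (the (XMM, XMM) mapping).
// An output half that happens to be an exact copy of one source half is
// still counted as one shuffle.
unsigned estimateXmmShuffles(const Mask8& mask) {
  unsigned total = 0;
  for (int half = 0; half < 2; ++half) {
    // Source halves read by this output half: 0=A.lo, 1=A.hi, 2=B.lo, 3=B.hi.
    std::set<int> sources;
    for (int lane = 0; lane < 4; ++lane) {
      int m = mask[half * 4 + lane];
      if (m >= 0) sources.insert(m / 4);
    }
    if (!sources.empty())
      total += sources.size() <= 2
                   ? 1
                   : static_cast<unsigned>(sources.size()) - 1;
  }
  return total;
}
```

A reversal or an interleave of the low halves needs one shuffle per output half under this model, while a mask whose halves each gather from all four source halves needs three per half.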
>>
>>
>> Yeah, merging is not something I wanted to consider in RegBankSelect, at
>> least for now.
>> One thing to keep in mind is that, unlike SDISel, you can insert new
>> target-specific (or target-independent) passes wherever you want in the
>> pipeline. Therefore, if we need a shuffle combiner of some sort, we do not
>> have to do it in RegBankSelect or legalization.
>>
>> And we don't really want to end up with a bunch of special cases in
>> RegBankSelect.  IIRC, this is an existing problem with SDISel to some
>> extent because we consider <8 x i32> legal, so it's not really something
>> new, but it's worth considering.
>>
>> A different example: suppose you have an `add <1 x i64>` and an `add i64`
>> on x86-32.  (`<1 x i64>` comes up with code ported from MMX.)  Currently,
>> ISel will put the former into a vector register, and the latter into
>> integer registers.  (This isn't ideal, but it generally works out
>> reasonably well.)  With GlobalISel, both are just an i64 add, and i64 isn't
>> legal, so we have to decide: do we WidenVector or NarrowScalar?  Without
>> any context, there isn't an obvious right answer.
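The ambiguity comes from the legalizer's view of types. The C++ sketch below assumes, as Eli says, that a one-element vector is represented exactly like the scalar, so an action table keyed by (opcode, type) has a single entry for both adds and cannot choose WidenVector for one and NarrowScalar for the other. `LowLevelType`, `Action`, `ActionTable`, and `getAction` are hypothetical, loosely named after the actions discussed in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical legalization actions, named after the ones in this thread.
enum class Action { Legal, NarrowScalar, WidenVector, Custom };

// Hypothetical low-level type: shape and size only, no IR-level distinction.
struct LowLevelType {
  bool isVector;
  unsigned numElements;
  unsigned elementBits;
  bool operator<(const LowLevelType& o) const {
    return std::tie(isVector, numElements, elementBits) <
           std::tie(o.isVector, o.numElements, o.elementBits);
  }
};

LowLevelType scalarType(unsigned bits) { return {false, 1, bits}; }

// Assumption: a one-element vector collapses to the plain scalar.
LowLevelType vectorType(unsigned n, unsigned eltBits) {
  return n == 1 ? scalarType(eltBits) : LowLevelType{true, n, eltBits};
}

// Hypothetical action table keyed by (opcode, type).
using ActionTable = std::map<std::pair<std::string, LowLevelType>, Action>;

Action getAction(const ActionTable& table, const std::string& opcode,
                 LowLevelType ty) {
  return table.at(std::make_pair(opcode, ty));  // throws if no entry
}
```

Whatever action is recorded for a 64-bit G_ADD applies equally to `add i64` and `add <1 x i64>`, so any context-sensitive choice has to come from somewhere other than this lookup.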
>>
>>
>> The options I see are:
>> - Mark it legal (you know how to select it) and let RegBankSelect
>> decide. I actually don’t get why it is illegal in your example.
>> - Mark it custom and look for context around it.
>>
>> I would recommend the first approach.
>>
>> We have a few options:
>>
>> 1. always WidenVector, and end up with terrible code if we need to
>> transfer the result to integer registers;
>> 2. always NarrowScalar, and end up with terrible code if we need to
>> transfer the result to xmm registers;
>> 3. pretend i64 add is legal, and let RegBankSelect assign it to either a
>> fake vector register or a fake integer register
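Option 3 amounts to deferring the decision to a cost comparison once the banks of the operands and users are known. A C++ sketch with made-up costs (2 for ADD/ADC on 32-bit GPR halves, 1 for PADDQ, 3 per cross-bank transfer); `Bank`, `costIn`, and `chooseBank` are hypothetical names, not RegBankSelect's API.

```cpp
#include <cassert>
#include <vector>

enum class Bank { GPR, Vector };

// Illustrative costs for an i64 add on x86-32 (assumptions, not measurements).
constexpr unsigned kAddCostGPR = 2;     // ADD + ADC on 32-bit halves
constexpr unsigned kAddCostVector = 1;  // one PADDQ
constexpr unsigned kCrossBankCopy = 3;  // moving an i64 between banks

// Cost of placing the add in bank `b`, given where its operands come from
// and where its users need the result.
unsigned costIn(Bank b, const std::vector<Bank>& operandBanks,
                const std::vector<Bank>& userBanks) {
  unsigned cost = b == Bank::GPR ? kAddCostGPR : kAddCostVector;
  for (Bank ob : operandBanks)
    if (ob != b) cost += kCrossBankCopy;
  for (Bank ub : userBanks)
    if (ub != b) cost += kCrossBankCopy;
  return cost;
}

// Pick the cheaper bank; ties go to GPR.
Bank chooseBank(const std::vector<Bank>& operandBanks,
                const std::vector<Bank>& userBanks) {
  return costIn(Bank::GPR, operandBanks, userBanks) <=
                 costIn(Bank::Vector, operandBanks, userBanks)
             ? Bank::GPR
             : Bank::Vector;
}
```

With these numbers an add whose operands and users all live in integer registers stays in GPRs, while one fed by and feeding vector values goes to XMM, the context-dependent behavior that options 1 and 2 cannot express.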
>>
>>
>> Why pretend? This is legal, right?
>>
>> To me, legal means we know how to select it, and by that definition add
>> i64 seems legal.
>>
>
> This is x86-32, so the options are either a 32-bit GPR or a 128-bit XMM
> register.  I mean, I guess you could consider it legal... but the same
> reasoning would lead us to conclude that "add <4 x i8>" is legal, so we
> would end up with a bunch of register classes for registers which don't
> really exist.
>
>
> You mean like register classes for 64-bit values that live in 128-bit
> registers?
> I thought we were talking about 64-bit values in 64-bit registers (MMX).
> For 64-bit values that live in 128-bit registers, yeah, I believe we would
> need to use widening or NarrowScalar. But yeah, modeling the cost of such
> interactions was not in the scope of the legalizer (at least not for the
> generic framework). However, this is something we should think about.
>
> Anyhow, thanks for bringing up that problem!
>
>
LLVM only uses MMX for intrinsics which take x86_mmx as inputs because MMX
registers are too weird to mix with normal code.  (Among other things, emms
is ridiculously expensive.)

-Eli