[PATCH] D21534: GlobalISel: first outline of legalization interface.

Thu Jul 7 16:24:52 PDT 2016

On Thu, Jul 7, 2016 at 10:39 AM, Quentin Colombet <qcolombet at apple.com>
wrote:

> Hi Eli,
>
> Thanks for your feedback.
>
> Answers inline.
>
> On Jun 22, 2016, at 4:21 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>
> On Wed, Jun 22, 2016 at 2:39 PM, Tim Northover <t.p.northover at gmail.com>
> wrote:
>
>> > I'm not sure if this is really going to work the way you want. On x86 with
>> > AVX (but not AVX2), is LOAD <8 x i32> legal?  I mean, you could declare that
>> > it is... but you're going to end up with a bunch of vector shuffles trying
>> > to legalize ADD <8 x i32>. You could clean it up afterwards with some sort
>> > of optimization pass to split vectors where it's profitable... but it gets
>> > complicated when you start dealing with values with multiple uses and PHI
>> > nodes.
>>
>> This still seems to be something for RegBankSelect to me. It's going
>> to see something like
>>
>>     %0(256) = G_LOAD <8 x i32> ...
>>     %1(128) = G_EXTRACT <4 x i32> %0, 0
>>     %2(128) = G_EXTRACT <4 x i32> %0, 1
>>     %3(128) = G_ADD <4 x i32> %1, ...
>>     %4(128) = G_ADD <4 x i32> %2, ...
>>     %5(256) = G_SEQ <8 x i32> %3, %4
>>
>> and ought to have the cost model necessary to decide that (XMM, XMM)
>> is the best register class (in whatever representation it has, an
>> extension of the .td RegClasses with tuples) rather than YMM.
>>
>
> We run RegBankSelect after legalization?  Then what happens?  Presumably,
> if you have a load or arithmetic operation whose result ends up in (XMM,
> XMM), you then want to split it so you have two operations which each end
> up in one xmm register... then you have a bunch of new operations which
> haven't been through legalization and register bank selection, so you need
> to run legalization and RegBankSelect from the top again?
>
>
> No, we do not run the full legalizer. RegBankSelect creates very specific
> operations (glorified copies per se; this includes extract and
> build_sequence), and the plan is to apply the legalizer helper to them.
>

That works to split a vector... not so much to split an integer.  For
example, to split an i64 add, you end up with adde operations... which
probably aren't legal.  Not sure if that will come up in practice.
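
To make the integer case concrete, here is a minimal sketch (plain Python, not
LLVM code) of what splitting an i64 add into 32-bit pieces involves. The names
`narrow_add_i64` and `merge` are made up for illustration. The point is that
the high half is not an ordinary add: it consumes a carry produced by the low
half, which is the adde-style operation that may not be legal.

```python
MASK32 = 0xFFFFFFFF

def narrow_add_i64(a, b):
    """Add two 64-bit values using only 32-bit pieces; returns (lo, hi)."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    # Low half: a 32-bit add that also produces a carry-out.
    lo_full = a_lo + b_lo
    lo, carry = lo_full & MASK32, lo_full >> 32
    # High half: a 32-bit add that consumes the carry (the adde-like step).
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi

def merge(lo, hi):
    """Reassemble a 64-bit value from its two 32-bit halves."""
    return (hi << 32) | lo
```

For example, `merge(*narrow_add_i64(0xFFFFFFFF, 1))` gives `0x100000000`: the
carry out of the low half is what makes the high half nonzero.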

> Or do we have some sort of restricted post-RegBankSelect legalizer which
> doesn't require a second pass?
>
>
> No, see my previous answer.
>
>
> If we're doing custom lowering before RegBankSelect, we could end up being
> effectively forced to choose a bank during legalization, without the
> benefit of a cost model.
>
>
> Even custom-lowered instructions can be remapped. The target can specify
> alternative instruction mappings for every instruction, generic or not.
>

I assume by "remapped" you mean there's a target hook to transform the
instruction (tables probably aren't enough in some cases).  And I guess it
would be a requirement that all operations generated at this point are
legal?  That can be kind of awkward in some cases.

>
> For example, if you need to custom-lower an <8 x i32> shuffle on AVX, the
> result could look substantially different depending on whether the result
> needs to be in one YMM or two XMM registers.  Things become even more
> awkward if you don't distinguish between integer and vector registers on
> x86; for example, if I have an i64 add on x86-32, does it need to be
> widened to <2 x i64> or split into two i32 ADDE operations?
>
>
> I don’t understand the example. Also how is this different from what we
> currently do?
>

For the <8 x i32> shuffle example, you have an illegal shuffle;
legalization splits it into multiple shuffles (this would happen before
RegBankSelect, right?). RegBankSelect sees the shuffles, and considers
splitting them... but how does it figure out how many shuffles you end up
with?  One way is to merge the shuffles, but that involves RegBankSelect
special-casing shuffles.  And we don't really want to end up with a bunch
of special cases in RegBankSelect.  IIRC, this is an existing problem with
SDISel to some extent because we consider <8 x i32> legal, so it's not
really something new, but it's worth considering.
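
As a rough illustration of why the shuffle count depends on the mask, here is a
small Python sketch (not LLVM code; all names are hypothetical). It splits an
8-lane, two-input shuffle mask into two 4-lane output halves and reports which
input halves each one reads. The count uses a naive assumption: each output
half is built by chaining two-input shuffles, so n distinct sources cost
max(1, n - 1) shuffles. A real lowering could do better or worse.

```python
def split_shuffle_mask(mask, lanes=8):
    """Split a two-input shuffle mask into two half-width output pieces.

    Mask entries index the concatenation of inputs A and B (0 .. 2*lanes-1).
    Returns, for each output half, the set of input halves it reads from.
    """
    half = lanes // 2
    names = ["A.lo", "A.hi", "B.lo", "B.hi"]
    result = []
    for start in (0, half):
        sources = {names[idx // half] for idx in mask[start:start + half]}
        result.append(sources)
    return result

def naive_shuffle_count(mask, lanes=8):
    """Estimate half-width shuffles needed, assuming a naive chained lowering."""
    # One two-input shuffle covers up to two sources; each extra source adds one.
    return sum(max(1, len(s) - 1) for s in split_shuffle_mask(mask, lanes))
```

Under this estimate an interleave of the low halves, `[0, 8, 1, 9, 2, 10, 3, 11]`,
needs 2 half-width shuffles, while `[0, 4, 8, 12, 1, 5, 9, 13]` reads all four
input halves in each output half and needs 6.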

A different example: suppose you have an `add <1 x i64>` and an `add i64`
on x86-32.  (`<1 x i64>` comes up with code ported from MMX.)  Currently,
ISel will put the former into a vector register, and the latter into
integer registers.  (This isn't ideal, but it generally works out
reasonably well.)  With GlobalISel, both are just an i64 add, and i64 isn't
legal, so we have to decide: do we WidenVector or NarrowScalar?  Without
any context, there isn't an obvious right answer.  We have a few options:

1. always WidenVector, and end up with terrible code if we need to transfer
the result to integer registers;
2. always NarrowScalar, and end up with terrible code if we need to
transfer the result to xmm registers;
3. pretend i64 add is legal, and let RegBankSelect assign it to either a
fake vector register or a fake integer register;
4. "cheat", and use the IR type as an input to legalization;
5. add a bunch of code separate from legalization and RegBankSelect to try
and handle this.  (For example, x86 could have a special pass before
legalization which does some sort of cost analysis, then uses legalization
helpers to force legalization to happen in a specific manner.)
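
To give a feel for the kind of cost analysis option 5 would need, here is a toy
Python sketch (not LLVM code). The unit costs, the bank names "vec" and "gpr",
and the function `choose_action` are all assumptions invented for illustration;
nothing here comes from a real target description.

```python
# Hypothetical unit costs, chosen only to make the trade-off visible.
CROSS_BANK_COPY = 4   # moving an i64 between xmm and integer registers
VECTOR_ADD = 1        # one add in an xmm register
SCALAR_ADD_PAIR = 2   # add plus add-with-carry on two i32 halves

def choose_action(operand_banks, user_banks):
    """Pick "WidenVector" or "NarrowScalar" for an i64 add on a 32-bit target.

    Each bank is "vec" or "gpr" and says where a neighbouring value
    (an operand's producer or a user of the result) wants the data.
    Returns (action, estimated_cost); ties go to NarrowScalar.
    """
    neighbours = list(operand_banks) + list(user_banks)
    # Widening pays a copy for every neighbour living in integer registers.
    widen = VECTOR_ADD + CROSS_BANK_COPY * sum(b == "gpr" for b in neighbours)
    # Narrowing pays a copy for every neighbour living in vector registers.
    narrow = SCALAR_ADD_PAIR + CROSS_BANK_COPY * sum(b == "vec" for b in neighbours)
    if widen < narrow:
        return ("WidenVector", widen)
    return ("NarrowScalar", narrow)
```

With these made-up numbers, an add whose operands and user are all in vector
registers comes out as WidenVector, and one surrounded by integer code comes
out as NarrowScalar. That is exactly the context a type-only legalizer does not
have.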

-Eli