[PATCH] D21534: GlobalISel: first outline of legalization interface.
Eli Friedman via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 7 19:12:12 PDT 2016
On Thu, Jul 7, 2016 at 6:18 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> On Jul 7, 2016, at 4:24 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Thu, Jul 7, 2016 at 10:39 AM, Quentin Colombet <qcolombet at apple.com> wrote:
>> Hi Eli,
>> Thanks for your feedback.
>> Answers inlined.
>> On Jun 22, 2016, at 4:21 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> On Wed, Jun 22, 2016 at 2:39 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>>> > I'm not sure if this is really going to work the way you want. On x86
>>> > AVX (but not AVX2), is LOAD <8 x i32> legal? I mean, you could
>>> declare that
>>> > it is... but you're going to end up with a bunch of vector shuffles
>>> > to legalize ADD <8 x i32>. You could clean it up afterwards with some
>>> > of optimization pass to split vectors where it's profitable... but it
>>> > complicated when you start dealing with values with multiple uses and
>>> > nodes.
>>> This still seems to be something for RegBankSelect to me. It's going
>>> to see something like
>>> %0(256) = G_LOAD <4 x i32> ...
>>> %1(128) = G_EXTRACT <2 x i32> %0, 0
>>> %2(128) = G_EXTRACT <2 x i32> %0, 1
>>> %3(128) = G_ADD <2 x i32> %1, ...
>>> %4(128) = G_ADD <2 x i32> %2, ...
>>> %5(256) = G_SEQ <4 x i32> %3 %4
>>> and ought to have the cost model necessary to decide that (XMM, XMM)
>>> is the best register class (in whatever representation it has, an
>>> extension of the .td RegClasses with tuples) rather than YMM.
>> We run RegBankSelect after legalization? Then what happens? Presumably,
>> if you have a load or arithmetic operation whose result ends up in (XMM,
>> XMM), you then want to split it so you have two operations which each end
>> up in one xmm register... then you have a bunch of new operations which
>> haven't been through legalization and register bank selection, so you need
>> to run legalization and RegBankSelect from the top again?
>> No, we do not run the full legalizer. RegBankSelect creates very specific
>> operations (glorified copies per se, this includes extract and
>> build_sequence) and the plan is to apply the legalizer helper on them.
> That works to split a vector... not so much to split an integer.
> Ok, let me clarify, I think I see the misunderstanding.
> The generic code of RegBankSelect will only insert glorified version of
> The target specific code can do whatever it wants like splitting add and
> such. The caveat is that whatever the target does, it needs to be able to
> select it, so it must be legal (even if the target runs the legalizer
> helper when doing the remapping).
Oh, okay; so RegBankSelect basically only makes copies, but it has hooks if
the target wants to try to do something fancy. That makes sense.
> For example, to split an i64 add, you end up with adde operations... which
> probably aren't legal. Not sure if that will come up in practice.
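To make the i64 split concrete, here is a minimal sketch of the arithmetic in plain C++. It is not GlobalISel code: `Halves` and `addNarrowed` are hypothetical names used only for illustration. The low halves are added with a plain add, and the high halves need an add that also consumes the carry, which is the ADDE-style operation mentioned above.

```cpp
#include <cstdint>

// Hypothetical illustration only: a 64-bit value held as two 32-bit halves.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Add two 64-bit values using only 32-bit operations.
inline Halves addNarrowed(Halves a, Halves b) {
  std::uint32_t lo = a.lo + b.lo;             // plain add, wraps modulo 2^32
  std::uint32_t carry = (lo < a.lo) ? 1u : 0u; // carry out of the low half
  std::uint32_t hi = a.hi + b.hi + carry;     // add-with-carry on the high half
  return {lo, hi};
}
```

The high-half add is the part that has no carry-free equivalent, so a split done after legalization has to produce an operation the target can already select.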
>> Or do we have some sort of restricted post-RegBankSelect legalizer which
>> doesn't require a second pass?
>> No, see my previous answer.
>> If we're doing custom lowering before RegBankSelect, we could end up
>> being effectively forced to choose a bank during legalization, without the
>> benefit of a cost model.
>> Even custom lowered instructions can be remapped. The target can specify
>> alternative instructions mapping for every instruction, generic or not.
> I assume by "remapped" you mean there's a target hook to transform the
> instruction (tables probably aren't enough in some cases). And I guess it
> would be a requirement that all operations generated at this point are
> That is correct.
> That can be kind of awkward in some cases.
> How so? (I guess your later example tries to convey that, but I did not get
> the problem)
The problem I was thinking of is the cost computation... but I guess if the
target is doing it, it can figure out the relevant costs itself. Okay.
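As a rough sketch of what "the target figures out the costs itself" could look like, assuming a model where the target lists candidate mappings and the cheapest one wins: `Mapping` and `pickCheapest` below are hypothetical names, not the RegBankSelect API, and the cost fields are placeholders the target would have to fill in.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not the GlobalISel API: one candidate way to map an
// instruction onto register banks, e.g. "YMM" or "(XMM, XMM)".
struct Mapping {
  std::string name;
  unsigned opCost;   // cost of the operation(s) under this mapping
  unsigned copyCost; // cost of moving operands/results between banks
};

// Return the candidate with the lowest total cost; ties keep the earliest.
inline const Mapping &pickCheapest(const std::vector<Mapping> &candidates) {
  assert(!candidates.empty() && "target must offer at least one mapping");
  const Mapping *best = &candidates.front();
  for (const Mapping &m : candidates)
    if (m.opCost + m.copyCost < best->opCost + best->copyCost)
      best = &m;
  return *best;
}
```

The point of the sketch is only that the comparison needs both the operation cost and the cross-bank copy cost, and the target is the one that knows both.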
>> For example, if you need to custom-lower an <8 x i32> shuffle on AVX, the
>> result could look substantially different depending on whether the result
>> needs to be in one YMM or two XMM registers. Things become even more
>> awkward if you don't distinguish between integer and vector registers on
>> x86; for example, if I have an i64 add on x86-32, does it need to be
>> widened to <2 x i64> or split into two i32 ADDE operations?
>> I don’t understand the example. Also how is this different from what we
>> currently do?
> For the <8 x i32> shuffle example, you have an illegal shuffle;
> legalization splits it into multiple shuffles (this would happen before
> RegBankSelect, right?). RegBankSelect sees the shuffles, and considers
> splitting them... but how does it figure out how many shuffles you end up
> That’s up to the target. It will say it maps the <8 x i32> on N definition
> and materialize that target.
> One way is to merge the shuffles, but that involves RegBankSelect
> special-casing shuffles.
> Yeah, merging is not something I wanted to consider in RegBankSelect, at
> least for now.
> One thing to keep in mind is that unlike SDISel, you can insert new target
> specific (or target independent) passes wherever you want in the pipeline.
> Therefore if we need a shuffle combiner of some sort, we do not have to do
> it in regbankselect or legalization.
> And we don't really want to end up with a bunch of special cases in
> RegBankSelect. IIRC, this is an existing problem with SDISel to some
> extent because we consider <8 x i32> legal, so it's not really something
> new, but it's worth considering.
> A different example: suppose you have an `add <1 x i64>` and an `add i64`
> on x86-32. (`<1 x i64>` comes up with code ported from MMX.) Currently,
> ISel will put the former into a vector register, and the latter into
> integer registers. (This isn't ideal, but it generally works out
> reasonably well.) With GlobalISel, both are just an i64 add, and i64 isn't
> legal, so we have to decide: do we WidenVector or NarrowScalar? Without
> any context, there isn't an obvious right answer.
> The options I see are:
> - Mark it legal, you know how to select it and let the regbankselect
> decide. I actually don’t get why it is illegal in your example.
> - Mark it custom and look for context around.
> I would recommend the first approach.
> We have a few options:
> 1. always WidenVector, and end up with terrible code if we need to
> transfer the result to integer registers;
> 2. always NarrowScalar, and end up with terrible code if we need to
> transfer the result to xmm registers;
> 3. pretend i64 add is legal, and let RegBankSelect assign it to either a
> fake vector register or a fake integer register
> Why pretend, this is legal, right?
> To me legal means we know how to select it and with that definition add
> i64 seems legal to me.
This is x86-32, so the options are either a 32-bit GPR or a 128-bit XMM
register. I mean, I guess you could consider it legal... but the same
reasoning would lead us to conclude that "add <4 x i8>" is legal, so we
would end up with a bunch of register classes for registers which don't