[PATCH] D21534: GlobalISel: first outline of legalization interface.

Wed Jun 22 16:21:54 PDT 2016

On Wed, Jun 22, 2016 at 2:39 PM, Tim Northover <t.p.northover at gmail.com>
wrote:

> > I'm not sure if this is really going to work the way you want. On x86
> > with AVX (but not AVX2), is LOAD <8 x i32> legal?  I mean, you could
> > declare that it is... but you're going to end up with a bunch of vector
> > shuffles trying to legalize ADD <8 x i32>. You could clean it up
> > afterwards with some sort of optimization pass to split vectors where
> > it's profitable... but it gets complicated when you start dealing with
> > values with multiple uses and PHI nodes.
>
> This still seems to be something for RegBankSelect to me. It's going
> to see something like
>
>     %0(256) = G_LOAD <4 x i32> ...
>     %1(128) = G_EXTRACT <2 x i32> %0, 0
>     %2(128) = G_EXTRACT <2 x i32> %0, 1
>     %3(128) = G_ADD <2 x i32> %1, ...
>     %4(128) = G_ADD <2 x i32> %2, ...
>     %5(256) = G_SEQ <4 x i32> %3 %4
>
> and ought to have the cost model necessary to decide that (XMM, XMM)
> is the best register class (in whatever representation it has, an
> extension of the .td RegClasses with tuples) rather than YMM.
>

We run RegBankSelect after legalization?  Then what happens?  Presumably,
if you have a load or arithmetic operation whose result ends up in (XMM,
XMM), you then want to split it so you have two operations which each end
up in one xmm register... then you have a bunch of new operations which
haven't been through legalization and register bank selection, so you need
to run legalization and RegBankSelect from the top again?  Or do we have
some sort of restricted post-RegBankSelect legalizer which doesn't require
a second pass?
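
To make the "from the top again" question concrete, here is a toy model of
the alternative: a worklist that only revisits the operations created by a
split. This is a sketch in Python, not LLVM code. The (name, bits) op
representation, MAX_LEGAL_BITS, split() and legalize() are all made up for
illustration, and the "legal if it fits in 128 bits" rule is an assumption.

```python
from collections import deque

# Hypothetical rule: an op is legal once it fits in one 128-bit register.
MAX_LEGAL_BITS = 128


def split(op):
    """Split one (name, bits) op into two half-width ops."""
    name, bits = op
    return [(name, bits // 2), (name, bits // 2)]


def legalize(ops):
    """Worklist legalizer: ops created by a split are re-queued in place,
    so already-legal ops are never visited a second time."""
    worklist = deque(ops)
    done = []
    while worklist:
        op = worklist.popleft()
        if op[1] <= MAX_LEGAL_BITS:
            done.append(op)
        else:
            # Push the halves back at the front, preserving program order.
            worklist.extendleft(reversed(split(op)))
    return done
```

In this model a 256-bit load is visited once, split, and only its two halves
are looked at again. Whether a post-RegBankSelect step can be restricted
this way is exactly the open question.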

If we're doing custom lowering before RegBankSelect, we could end up being
effectively forced to choose a bank during legalization, without the
benefit of a cost model.  For example, if you need to custom-lower an <8 x
i32> shuffle on AVX, the result could look substantially different
depending on whether the result needs to be in one YMM or two XMM
registers.  Things become even more awkward if you don't distinguish
between integer and vector registers on x86; for example, if I have an i64
add on x86-32, does it need to be widened to <2 x i64> or split into two
i32 ADDE operations?
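
For reference, the "split" option amounts to the following arithmetic: add
the low halves, then add the high halves plus the carry out of the low add.
This is a plain Python sketch of that arithmetic only; MASK32 and
add_i64_via_i32() are illustrative names, not anything from LLVM.

```python
MASK32 = 0xFFFFFFFF


def add_i64_via_i32(a: int, b: int) -> int:
    """Add two 64-bit values using only 32-bit adds and a carry bit."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo      # low-half add
    carry = lo_sum >> 32      # carry out of the low half (0 or 1)
    lo = lo_sum & MASK32

    # High-half add consumes the carry; wraps modulo 2**32 like hardware.
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo
```

The widening option would instead do a single vector add and ignore the
unused lane, which is why the choice really depends on which registers the
operands already live in.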

-Eli