[PATCH] D53594: [GlobalISel] Introduce G_BUILD_VECTOR and G_CONCAT_VECTOR opcodes

Fri Nov 2 10:20:44 PDT 2018

dsanders added a comment.

In https://reviews.llvm.org/D53594#1283016, @qcolombet wrote:

> > I'm not sure how this is any better. If we had an opcode G_BUILD_VECTOR_TRUNC, it would have a superset of the functionality of the G_BUILD_VECTOR. You'd still have this powerful opcode, except now you have potential code duplication in handling the two, as well as a hindered G_BUILD_VECTOR.
> > 
> > Unless you're say that G_BUILD_VECTOR _TRUNC can *only* deal with oversize scalars, in which case it's an odd opcode for a corner case (and I would suggest here that we just deal with the inconvenient types, but I'm not married to the idea).
>
> I was suggesting the later.
>
> Generally speaking, I am not a fan of implicit stuff, thus the proposal (and that's easy to have a isBuildVectorLike or something). That said, I'd like to go back as why we need to do that.
>
> So the problem was that we wanted to represent:
>  a(<4 x s32>) = G_BUILD_VECTOR p(s32), q, r, s
>  b(<4 x s8>) = G_TRUNC a
>
> into
>  b(< 4 x s8>) = G_BUILD_VECTOR p(s32), q, r, s
>
> Why can't we keep the "verbose" representation?

Hi Quentin,

The verbose representation may not be legal for the given target. Let's take Mips MSA as a concrete example. MSA has <16 x s8> vectors and <4 x s32> vectors (among others but they're not relevant) but doesn't have <16 x s32> (its 512-bits and the registers are 128-bit) or <4 x s8> (although we can potentially emulate it with <16 x s8> but let's stick to the real types the ISA can handle for now). For an example similar to yours,  the verbose representation might be:

  %0(<16 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
  %z(<16 x s8>) = G_TRUNC %0

but <16 x s8> G_TRUNC <16 x s32> isn't legal so we start by expanding that:

  %0(<16 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
  %r0(<4 x s32>), %y1, %y2, %y3 = G_UNMERGE_VALUES %0
  %z0(<4 x s8>) = G_TRUNC %r0
  %z1(<4 x s8>) = G_TRUNC %r1
  %z2(<4 x s8>) = G_TRUNC %r2
  %z3(<4 x s8>) = G_TRUNC %r3
  %z(16 x s8) = G_MERGE_VALUES %z0, %z1, %z2, %z3

the new <4 x s8> G_TRUNC <4 x s32> isn't legal either so we expand again to:

  %0(<16 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
  %r0(<4 x s32>), %r1, %r2, %r3 = G_UNMERGE_VALUES %0
  %t0(s32), %t1, %t2, %t3 = G_UNMERGE_VALUES %r0
  %t4(s32), %t5, %t6, %t7 = G_UNMERGE_VALUES %r1
  %t8(s32), %t9, %t10, %t11 = G_UNMERGE_VALUES %r2
  %t12(s32), %t13, %t14, %t15 = G_UNMERGE_VALUES %r3
  %y0(s8) = G_TRUNC %t0
  %y1(s8) = G_TRUNC %t1
  %y2(s8) = G_TRUNC %t2
  %y3(s8) = G_TRUNC %t3
  %y4(s8) = G_TRUNC %t4
  %y5(s8) = G_TRUNC %t5
  %y6(s8) = G_TRUNC %t6
  %y7(s8) = G_TRUNC %t7
  %y8(s8) = G_TRUNC %t8
  %y9(s8) = G_TRUNC %t9
  %y10(s8) = G_TRUNC %t10
  %y11(s8) = G_TRUNC %t11
  %y12(s8) = G_TRUNC %t12
  %y13(s8) = G_TRUNC %t13
  %y14(s8) = G_TRUNC %t14
  %y15(s8) = G_TRUNC %t15
  %z(<16 x s8>) = G_BUILD_VECTOR %y0, %y1, %y2, %y3, %y4, %y5, %y6, %y7, %y8, %y9, %y10, %y11, %y12, %y13, %y14, %y15

Next, <16 x 32> G_BUILD_VECTOR isn't legal so we expand to:

  %v0(<4 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d
  %v1(<4 x s32>) = G_BUILD_VECTOR %e, %f, %g, %h
  %v2(<4 x s32>) = G_BUILD_VECTOR %i, %j, %k, %l
  %v3(<4 x s32>) = G_BUILD_VECTOR %m, %n, %o, %p
  %u(<16 x s32>) = G_MERGE_VALUES %v0, %v1, %v2, %v3
  %r0(<4 x s32>), %r1, %r2, %r3 = G_UNMERGE_VALUES %u
  %t0(s32), %t1, %t2, %t3 = G_UNMERGE_VALUES %r0
  %t4(s32), %t5, %t6, %t7 = G_UNMERGE_VALUES %r0
  %t8(s32), %t9, %t10, %t11 = G_UNMERGE_VALUES %r0
  %t12(s32), %t13, %t14, %t15 = G_UNMERGE_VALUES %r0
  %y0(s8) = G_TRUNC %t0
  %y1(s8) = G_TRUNC %t1
  %y2(s8) = G_TRUNC %t2
  %y3(s8) = G_TRUNC %t3
  %y4(s8) = G_TRUNC %t4
  %y5(s8) = G_TRUNC %t5
  %y6(s8) = G_TRUNC %t6
  %y7(s8) = G_TRUNC %t7
  %y8(s8) = G_TRUNC %t8
  %y9(s8) = G_TRUNC %t9
  %y10(s8) = G_TRUNC %t10
  %y11(s8) = G_TRUNC %t11
  %y12(s8) = G_TRUNC %t12
  %y13(s8) = G_TRUNC %t13
  %y14(s8) = G_TRUNC %t14
  %y15(s8) = G_TRUNC %t15
  %z(<16 x s8>) = G_BUILD_VECTOR %y0, %y1, %y2, %y3, %y4, %y5, %y6, %y7, %y8, %y9, %y10, %y11, %y12, %y13, %y14, %y15

and finally, we clean up the artefacts:

  %v0(<4 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d
  %v1(<4 x s32>) = G_BUILD_VECTOR %e, %f, %g, %h
  %v2(<4 x s32>) = G_BUILD_VECTOR %i, %j, %k, %l
  %v3(<4 x s32>) = G_BUILD_VECTOR %m, %n, %o, %p
  %t0(s32), %t1, %t2, %t3 = G_UNMERGE_VALUES %v0
  %t4(s32), %t5, %t6, %t7 = G_UNMERGE_VALUES %v1
  %t8(s32), %t9, %t10, %t11 = G_UNMERGE_VALUES %v2
  %t12(s32), %t13, %t14, %t15 = G_UNMERGE_VALUES %v3
  %y0(s8) = G_TRUNC %t0
  %y1(s8) = G_TRUNC %t1
  %y2(s8) = G_TRUNC %t2
  %y3(s8) = G_TRUNC %t3
  %y4(s8) = G_TRUNC %t4
  %y5(s8) = G_TRUNC %t5
  %y6(s8) = G_TRUNC %t6
  %y7(s8) = G_TRUNC %t7
  %y8(s8) = G_TRUNC %t8
  %y9(s8) = G_TRUNC %t9
  %y10(s8) = G_TRUNC %t10
  %y11(s8) = G_TRUNC %t11
  %y12(s8) = G_TRUNC %t12
  %y13(s8) = G_TRUNC %t13
  %y14(s8) = G_TRUNC %t14
  %y15(s8) = G_TRUNC %t15
  %z(<16 x s8>) = G_BUILD_VECTOR %y0, %y1, %y2, %y3, %y4, %y5, %y6, %y7, %y8, %y9, %y10, %y11, %y12, %y13, %y14, %y15

Meanwhile, the compact representation would be:

  %z(<16 x s8>) = G_BUILD_VECTOR %a(s32), %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p

and is legal from the start.

> I would expect it would be easy to handle that in ISel.
> 
> Also, just throwing that here, I haven't really thought about it: should we rethink our legality model to check patterns instead of one instruction at a time?
>  I.e., maybe
> 
>   a(s8) = ...
> 
> 
> is not legal but
> 
>   a(s8) = ...
>   b(<4 x s8>) = build_vector a(s8)
> 
> 
> is.

The main problem with this is optimization. If anything breaks that pattern apart, then it ceases to be legal. That was the reason we moved away from G_SEXT+G_LOAD and to G_SEXTLOAD. We had problems with optimizations on G_SEXT making related G_LOAD's illegal.

Another issue there is that you can't look at an instruction and decide if it's legal. You may need to walk the uses looking for a pattern that covers it.

> This actually ties back to one question that I don't think we answered. Do we legalize on the type (i.e., s8 to s32) or the size and type can be different (i.e., s8(8b) to s8(32b))?
>  It seems to me we are trying to write <4 x s8> = s8(32b), ...., but maybe we want <4 x s8> = s32, ....

Those look the same to me, it's just that one is implicitly converting s32 to s8.

> Anyhow, to be concrete, I am fine with the implicit truncate, we know this is not too bad based on SDISel experience, I want to make sure we consider all our options here.
> 
> Cheers,
> -Quentin

Repository:
  rL LLVM

https://reviews.llvm.org/D53594