[PATCH] D53594: [GlobalISel] Introduce G_BUILD_VECTOR and G_CONCAT_VECTOR opcodes
Daniel Sanders via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 2 10:20:44 PDT 2018
dsanders added a comment.
In https://reviews.llvm.org/D53594#1283016, @qcolombet wrote:
> > I'm not sure how this is any better. If we had an opcode G_BUILD_VECTOR_TRUNC, it would have a superset of the functionality of the G_BUILD_VECTOR. You'd still have this powerful opcode, except now you have potential code duplication in handling the two, as well as a hindered G_BUILD_VECTOR.
> >
> > Unless you're say that G_BUILD_VECTOR _TRUNC can *only* deal with oversize scalars, in which case it's an odd opcode for a corner case (and I would suggest here that we just deal with the inconvenient types, but I'm not married to the idea).
>
> I was suggesting the later.
>
> Generally speaking, I am not a fan of implicit stuff, thus the proposal (and that's easy to have a isBuildVectorLike or something). That said, I'd like to go back as why we need to do that.
>
> So the problem was that we wanted to represent:
> a(<4 x s32>) = G_BUILD_VECTOR p(s32), q, r, s
> b(<4 x s8>) = G_TRUNC a
>
> into
> b(< 4 x s8>) = G_BUILD_VECTOR p(s32), q, r, s
>
> Why can't we keep the "verbose" representation?
Hi Quentin,
The verbose representation may not be legal for the given target. Let's take Mips MSA as a concrete example. MSA has <16 x s8> vectors and <4 x s32> vectors (among others but they're not relevant) but doesn't have <16 x s32> (its 512-bits and the registers are 128-bit) or <4 x s8> (although we can potentially emulate it with <16 x s8> but let's stick to the real types the ISA can handle for now). For an example similar to yours, the verbose representation might be:
%0(<16 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
%z(<16 x s8>) = G_TRUNC %0
but <16 x s8> G_TRUNC <16 x s32> isn't legal so we start by expanding that:
%0(<16 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
%r0(<4 x s32>), %y1, %y2, %y3 = G_UNMERGE_VALUES %0
%z0(<4 x s8>) = G_TRUNC %r0
%z1(<4 x s8>) = G_TRUNC %r1
%z2(<4 x s8>) = G_TRUNC %r2
%z3(<4 x s8>) = G_TRUNC %r3
%z(16 x s8) = G_MERGE_VALUES %z0, %z1, %z2, %z3
the new <4 x s8> G_TRUNC <4 x s32> isn't legal either so we expand again to:
%0(<16 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
%r0(<4 x s32>), %r1, %r2, %r3 = G_UNMERGE_VALUES %0
%t0(s32), %t1, %t2, %t3 = G_UNMERGE_VALUES %r0
%t4(s32), %t5, %t6, %t7 = G_UNMERGE_VALUES %r1
%t8(s32), %t9, %t10, %t11 = G_UNMERGE_VALUES %r2
%t12(s32), %t13, %t14, %t15 = G_UNMERGE_VALUES %r3
%y0(s8) = G_TRUNC %t0
%y1(s8) = G_TRUNC %t1
%y2(s8) = G_TRUNC %t2
%y3(s8) = G_TRUNC %t3
%y4(s8) = G_TRUNC %t4
%y5(s8) = G_TRUNC %t5
%y6(s8) = G_TRUNC %t6
%y7(s8) = G_TRUNC %t7
%y8(s8) = G_TRUNC %t8
%y9(s8) = G_TRUNC %t9
%y10(s8) = G_TRUNC %t10
%y11(s8) = G_TRUNC %t11
%y12(s8) = G_TRUNC %t12
%y13(s8) = G_TRUNC %t13
%y14(s8) = G_TRUNC %t14
%y15(s8) = G_TRUNC %t15
%z(<16 x s8>) = G_BUILD_VECTOR %y0, %y1, %y2, %y3, %y4, %y5, %y6, %y7, %y8, %y9, %y10, %y11, %y12, %y13, %y14, %y15
Next, <16 x 32> G_BUILD_VECTOR isn't legal so we expand to:
%v0(<4 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d
%v1(<4 x s32>) = G_BUILD_VECTOR %e, %f, %g, %h
%v2(<4 x s32>) = G_BUILD_VECTOR %i, %j, %k, %l
%v3(<4 x s32>) = G_BUILD_VECTOR %m, %n, %o, %p
%u(<16 x s32>) = G_MERGE_VALUES %v0, %v1, %v2, %v3
%r0(<4 x s32>), %r1, %r2, %r3 = G_UNMERGE_VALUES %u
%t0(s32), %t1, %t2, %t3 = G_UNMERGE_VALUES %r0
%t4(s32), %t5, %t6, %t7 = G_UNMERGE_VALUES %r0
%t8(s32), %t9, %t10, %t11 = G_UNMERGE_VALUES %r0
%t12(s32), %t13, %t14, %t15 = G_UNMERGE_VALUES %r0
%y0(s8) = G_TRUNC %t0
%y1(s8) = G_TRUNC %t1
%y2(s8) = G_TRUNC %t2
%y3(s8) = G_TRUNC %t3
%y4(s8) = G_TRUNC %t4
%y5(s8) = G_TRUNC %t5
%y6(s8) = G_TRUNC %t6
%y7(s8) = G_TRUNC %t7
%y8(s8) = G_TRUNC %t8
%y9(s8) = G_TRUNC %t9
%y10(s8) = G_TRUNC %t10
%y11(s8) = G_TRUNC %t11
%y12(s8) = G_TRUNC %t12
%y13(s8) = G_TRUNC %t13
%y14(s8) = G_TRUNC %t14
%y15(s8) = G_TRUNC %t15
%z(<16 x s8>) = G_BUILD_VECTOR %y0, %y1, %y2, %y3, %y4, %y5, %y6, %y7, %y8, %y9, %y10, %y11, %y12, %y13, %y14, %y15
and finally, we clean up the artefacts:
%v0(<4 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d
%v1(<4 x s32>) = G_BUILD_VECTOR %e, %f, %g, %h
%v2(<4 x s32>) = G_BUILD_VECTOR %i, %j, %k, %l
%v3(<4 x s32>) = G_BUILD_VECTOR %m, %n, %o, %p
%t0(s32), %t1, %t2, %t3 = G_UNMERGE_VALUES %v0
%t4(s32), %t5, %t6, %t7 = G_UNMERGE_VALUES %v1
%t8(s32), %t9, %t10, %t11 = G_UNMERGE_VALUES %v2
%t12(s32), %t13, %t14, %t15 = G_UNMERGE_VALUES %v3
%y0(s8) = G_TRUNC %t0
%y1(s8) = G_TRUNC %t1
%y2(s8) = G_TRUNC %t2
%y3(s8) = G_TRUNC %t3
%y4(s8) = G_TRUNC %t4
%y5(s8) = G_TRUNC %t5
%y6(s8) = G_TRUNC %t6
%y7(s8) = G_TRUNC %t7
%y8(s8) = G_TRUNC %t8
%y9(s8) = G_TRUNC %t9
%y10(s8) = G_TRUNC %t10
%y11(s8) = G_TRUNC %t11
%y12(s8) = G_TRUNC %t12
%y13(s8) = G_TRUNC %t13
%y14(s8) = G_TRUNC %t14
%y15(s8) = G_TRUNC %t15
%z(<16 x s8>) = G_BUILD_VECTOR %y0, %y1, %y2, %y3, %y4, %y5, %y6, %y7, %y8, %y9, %y10, %y11, %y12, %y13, %y14, %y15
Meanwhile, the compact representation would be:
%z(<16 x s8>) = G_BUILD_VECTOR %a(s32), %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
and is legal from the start.
> I would expect it would be easy to handle that in ISel.
>
> Also, just throwing that here, I haven't really thought about it: should we rethink our legality model to check patterns instead of one instruction at a time?
> I.e., maybe
>
> a(s8) = ...
>
>
> is not legal but
>
> a(s8) = ...
> b(<4 x s8>) = build_vector a(s8)
>
>
> is.
The main problem with this is optimization. If anything breaks that pattern apart, then it ceases to be legal. That was the reason we moved away from G_SEXT+G_LOAD and to G_SEXTLOAD. We had problems with optimizations on G_SEXT making related G_LOAD's illegal.
Another issue there is that you can't look at an instruction and decide if it's legal. You may need to walk the uses looking for a pattern that covers it.
> This actually ties back to one question that I don't think we answered. Do we legalize on the type (i.e., s8 to s32) or the size and type can be different (i.e., s8(8b) to s8(32b))?
> It seems to me we are trying to write <4 x s8> = s8(32b), ...., but maybe we want <4 x s8> = s32, ....
Those look the same to me, it's just that one is implicitly converting s32 to s8.
> Anyhow, to be concrete, I am fine with the implicit truncate, we know this is not too bad based on SDISel experience, I want to make sure we consider all our options here.
>
> Cheers,
> -Quentin
Repository:
rL LLVM
https://reviews.llvm.org/D53594
More information about the llvm-commits
mailing list