[PATCH] D105119: [SVE] Fix incorrect codegen when inserting vector elements into widened scalable vectors

Thu Jul 1 05:57:00 PDT 2021

sdesmalen added a comment.

In D105119#2850584 <https://reviews.llvm.org/D105119#2850584>, @efriedma wrote:

> In D105119#2849757 <https://reviews.llvm.org/D105119#2849757>, @sdesmalen wrote:
>
>> In D105119#2848107 <https://reviews.llvm.org/D105119#2848107>, @efriedma wrote:
>>
>>> For non-scalable vectors, widening means we pad the end with undef elements (i.e. INSERT_SUBVECTOR into an undef vector).  Do you want it to mean something different for scalable vectors?  Is this documented somewhere?
>>
>> The difference is that scalable vectors are unpacked, so for `<vscale x 1 x i64>` <=> `<v0 | v1 | .... | vn-1>`, needs widening to `<vscale x 2 x i64>` <=> `<v0, _ | v1, _, | ... | vn-1, _>`, which means that the index in which to insert the element needs to be multiplied by 2. For fixed-width vectors, we'd indeed pad the number of elements in the vector to go from `<2 x i32> <v0, v1>` to `<4 x i32> <v0, v1, _, _>`, so the index stays the same. I'm not really sure where this should be documented as it's mostly a characteristic of scalable vectors, but I guess an extra comment describing it here wouldn't hurt. Note that the widening only works for `vscale x 1` because it is a power of 2. I'm not sure if it would even be possible to widen `<vscale x 3` to `<vscale x 4`.
>
> This sounds very SVE-specific.
>
> I'm not sure I understand why we can't pack the elements tightly.  It might be a little more verbose to widen certain SVE operations, but not impossible, and injecting SVE-specific behavior into target-independent code is messy.

This was a design decision that was made quite early on. It's not really specific to SVE, and various code in SelectionDAG already builds on it.
It seems the design was never documented anywhere. I think it would still be a good idea to do so (suggestions on where to put this are welcome).

For `<vscale x 4 x i16>`, the unpacked layout is (assuming a native vector width of vscale x 128 bits for this example):

  <h0, _, h1, _, h2, _, h3, _> (vscale = 1)
  <h0, _, h1, _, h2, _, h3, _ | h4, _, h5, _, h6, _, h7, _> (vscale = 2)
  <h0, _, h1, _, h2, _, h3, _ | h4, _, h5, _, h6, _, h7, _ | h8, _, h9, _, h10, _, h11, _> (vscale = 3)
  <h0, _, h1, _, h2, _, h3, _ | h4, _, h5, _, h6, _, h7, _ | h8, _, h9, _, h10, _, h11, _ | h12, _, h13, _, h14, _, h15, _> (vscale = 4)
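
To make the pattern concrete, here is a small Python model of the unpacked layout. It is only an illustration, not LLVM code: the helper names (`unpacked_layout`, `lane_index`, `show`) and the 128-bit granule constant are assumptions made up for this sketch.

```python
GRANULE_BITS = 128  # assumed native vector width per unit of vscale


def lane_index(i, min_elts, elt_bits):
    """Lane (in units of elt_bits) that holds element i in the unpacked layout."""
    lanes_per_granule = GRANULE_BITS // elt_bits
    stride = lanes_per_granule // min_elts  # gap between consecutive elements
    return i * stride


def unpacked_layout(min_elts, elt_bits, vscale):
    """Return the register contents as a list of lane labels ('_' = unused)."""
    lanes_per_granule = GRANULE_BITS // elt_bits
    lanes = ["_"] * (lanes_per_granule * vscale)
    for i in range(min_elts * vscale):
        lanes[lane_index(i, min_elts, elt_bits)] = f"h{i}"
    return lanes


def show(lanes, lanes_per_granule):
    """Format the lanes with '|' between 128-bit granules."""
    groups = [
        ", ".join(lanes[g:g + lanes_per_granule])
        for g in range(0, len(lanes), lanes_per_granule)
    ]
    return "<" + " | ".join(groups) + ">"


if __name__ == "__main__":
    for vscale in (1, 2, 3, 4):
        print(show(unpacked_layout(4, 16, vscale), 8), f"(vscale = {vscale})")
```

In this model the stride is fixed by the type alone, so the lane of element `i` does not depend on vscale. The same stride gives the factor of 2 for `<vscale x 1 x i64>` in 64-bit lanes.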

If the elements were packed, then this would be laid out as:

  <h0, h1, h2, h3, _, _, _, _> (vscale = 1)
  <h0, h1, h2, h3, h4, h5, h6, h7 | _, _, _, _, _, _, _, _> (vscale = 2)
  <h0, h1, h2, h3, h4, h5, h6, h7 | h8, h9, h10, h11, _, _, _, _ | _, _, _, _, _, _, _, _> (vscale = 3)
  <h0, h1, h2, h3, h4, h5, h6, h7 | h8, h9, h10, h11, h12, h13, h14, h15 | _, _, _, _, _, _, _, _ | _, _, _, _, _, _, _, _> (vscale = 4)

There is no efficient way to halve the vector now, from `<vscale x 4 x i16>` to `<vscale x 2 x i16>`, because the point at which to split depends on the runtime value of vscale and may be anywhere in the vector (and its index may no longer be a power of two). For the unpacked format, common unpack/zip operations can be used. Concatenating vectors would be even more awkward in the packed format, whereas for the unpacked format a 'concat' is just a matter of concatenating each even element from both vectors.
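
As a rough illustration of where the split point lands, the Python sketch below computes the lane at which the high half of `<vscale x 4 x i16>` starts in each layout. It is a simplified model, not LLVM code: the function names and the 128-bit granule constant are assumptions for this example, and it says nothing about which instructions a target would actually select.

```python
GRANULE_BITS = 128  # assumed native vector width per unit of vscale


def register_lanes(elt_bits, vscale):
    """Total number of elt_bits-wide lanes in one register."""
    return (GRANULE_BITS // elt_bits) * vscale


def lane_of(i, min_elts, elt_bits, packed):
    """Lane holding element i: stride 1 if packed, else spread over the granule."""
    stride = 1 if packed else (GRANULE_BITS // elt_bits) // min_elts
    return i * stride


def high_half_start(min_elts, elt_bits, vscale, packed):
    """Lane of the first element that belongs to the high half of the vector."""
    first_high_elt = (min_elts * vscale) // 2
    return lane_of(first_high_elt, min_elts, elt_bits, packed)


if __name__ == "__main__":
    for vscale in (1, 2, 3, 4):
        total = register_lanes(16, vscale)
        packed = high_half_start(4, 16, vscale, packed=True)
        unpacked = high_half_start(4, 16, vscale, packed=False)
        print(f"vscale={vscale}: lanes={total} packed split={packed} "
              f"unpacked split={unpacked} register midpoint={total // 2}")
```

In this model the unpacked split always coincides with the register midpoint, which is the boundary that lo/hi unpack-style operations work on. The packed split sits at lane `2 * vscale`, for example lane 6 when vscale is 3.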

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105119/new/

https://reviews.llvm.org/D105119