[PATCH] D94444: [RFC][Scalable] Add scalable shuffle intrinsic to extract evens from a pair of vectors

Fri Jan 22 08:18:19 PST 2021

cameron.mcinally added a comment.

> In D94444#2497697 <https://reviews.llvm.org/D94444#2497697>, @paulwalker-arm wrote:
> <A x Elt> llvm.experimental.vector.extract.elements(<B x Elt> %invec, i32 index, i32 stride)

Sorry for the slow reply. I'm just getting back to looking at this and now notice it is a unary shuffle. I'd like to see this as a binary shuffle. E.g.:

  #include <complex>

  void foo(double res[16], double x[16], std::complex<double> vec[16]) {
    for (int i = 0; i < 16; i++)
      res[i] = x[i] + vec[i].real();
  }

In the general vectorization case, we want to keep the vectors as full as possible on each iteration. I think the complex part of the loop body should look like:

  %lo = load %vec, 0
  %hi = load %vec, 64
  %reals = extract_elements(%lo, %hi, 0, 2)
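
To make the intended semantics concrete, here is a minimal scalar sketch of the binary form in C++. It is only a model under my own assumptions, not the proposed intrinsic: the name `extract_elements`, the `std::vector` representation, and the rule that the result has the same width as each input are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a binary extract_elements: treat lo and hi as
// one concatenated vector and return lo.size() elements taken from positions
// index, index + stride, index + 2 * stride, ...
template <typename T>
std::vector<T> extract_elements(const std::vector<T> &lo,
                                const std::vector<T> &hi,
                                std::size_t index, std::size_t stride) {
  assert(lo.size() == hi.size() && "both inputs must have the same width");
  const std::size_t n = lo.size();
  // Assumption: every selected position must fall inside the concatenation.
  assert(n == 0 || index + (n - 1) * stride < 2 * n);

  std::vector<T> result;
  result.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t src = index + i * stride;
    // Positions [0, n) come from lo, positions [n, 2n) come from hi.
    result.push_back(src < n ? lo[src] : hi[src - n]);
  }
  return result;
}
```

With interleaved (real, imag) data in `lo` and `hi`, `extract_elements(lo, hi, 0, 2)` yields a full-width vector of reals and `extract_elements(lo, hi, 1, 2)` the imaginary parts.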

And not like splicing together two half-width vectors:

  %lo = load %vec, 0
  %reals_lo = extract_elements(%lo, 0, 2)
  %hi = load %vec, 64
  %reals_hi = extract_elements(%hi, 0, 2)
  %reals = concat(%reals_lo, %reals_hi)
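
For comparison, a scalar sketch of the unary form plus the concat it forces. Again this is only a model: `extract_elements_unary` and `concat` are hypothetical names, and I'm assuming the unary result simply takes every selected element that fits in the input, so a stride of 2 gives a half-width result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of the unary form: pick elements index,
// index + stride, ... from a single input. With stride 2 the result is only
// half as wide as the input.
template <typename T>
std::vector<T> extract_elements_unary(const std::vector<T> &in,
                                      std::size_t index, std::size_t stride) {
  assert(stride > 0 && "stride must be positive");
  std::vector<T> result;
  for (std::size_t src = index; src < in.size(); src += stride)
    result.push_back(in[src]);
  return result;
}

// Hypothetical helper: glue two partial results back into one vector.
template <typename T>
std::vector<T> concat(const std::vector<T> &a, const std::vector<T> &b) {
  std::vector<T> result(a);
  result.insert(result.end(), b.begin(), b.end());
  return result;
}
```

The final vector is the same as in the binary case, but it takes two extracts and a concat to build it.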

And also not like running 2x the loop trips on half-width vectors:

  %ld = load %vec, 0
  %reals = extract_elements(%ld, 0, 2)
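
A back-of-the-envelope sketch of the trip counts, assuming a fixed vector width of `width` doubles (even, at least 2) and ignoring everything else in the loop. The helper names are made up for illustration.

```cpp
#include <cstddef>

// Binary form: each iteration produces a full vector of `width` reals.
constexpr std::size_t trips_binary(std::size_t n, std::size_t width) {
  return (n + width - 1) / width;
}

// Unary form without a concat: each iteration produces only width / 2 reals.
// Assumes width is even and at least 2.
constexpr std::size_t trips_unary(std::size_t n, std::size_t width) {
  return (n + width / 2 - 1) / (width / 2);
}
```

For the 16-element loop above and 8 doubles per vector, that is 2 trips for the binary form versus 4 for the unary one.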

I'm hand-waving over some other obvious optimizations, but I think this illustrates the `unary shuffle` problem pretty well. Thoughts?

https://reviews.llvm.org/D94444