[PATCH] D141924: [IR] Add new intrinsics interleave and deinterleave vectors

Wed Jan 18 06:03:12 PST 2023

sdesmalen added a comment.

Thanks for your feedback @reames

> We could even land the current definition and do this generalization in a follow on if desired.

Great! That was our reasoning for supporting only a stride of 2: keep it simple at first, and extend it to other strides if the need arises.

> For the interleave case, we can simply allow an arbitrary number of vector arguments with matching vector types. If the input type is <vscale x N x ty> then the result is <vscale x A*N x ty> where A is the (compile time constant) number of arguments.
>
> For the deinterleave case, it's a bit trickier. I'd like to avoid specific odd/even versions. One option I see is to add two integer constant arguments to the intrinsic.

That is one of the things we experimented with, and one reason the ISD nodes are designed this way: they are easily extended to higher strides.
Legalisation for non-power-of-2 strides (like 3 or 5) gets a bit awkward though, as it requires lots of insert/extract_subvector operations, some of which SVE does not yet support (we need to put in a bit more work to support nxv1* types). We could, however, have a cost-model to avoid choosing such strides in the LV.

One thing to keep in mind is that all targets must be able to lower these intrinsics even if they don't have dedicated instructions for such interleaves (e.g. when they can't be merged with load/store instructions). I think we can probably fall back to using gather/scatter to implement lowering of these operations for //any// stride.
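To make the gather/scatter fallback concrete, here is a rough Python model of the index arithmetic for an arbitrary stride. It is only a sketch: plain lists stand in for vectors at one fixed vscale, and the helper names (`gather_indices`, `deinterleave_via_gather`, `interleave_via_scatter`) are hypothetical and not part of the patch.

```python
def gather_indices(stride, offset, out_len):
    """Element indices a gather would read to extract one lane of the interleaving."""
    return [i * stride + offset for i in range(out_len)]


def deinterleave_via_gather(vec, stride, offset):
    """Model of deinterleave lowered as a gather (hypothetical helper)."""
    if stride <= 0 or len(vec) % stride != 0:
        raise ValueError("input length must be a multiple of the stride")
    if not 0 <= offset < stride:
        raise ValueError("offset must satisfy 0 <= offset < stride")
    out_len = len(vec) // stride
    return [vec[j] for j in gather_indices(stride, offset, out_len)]


def interleave_via_scatter(*vecs):
    """Model of interleave lowered as one scatter per operand (hypothetical helper)."""
    stride = len(vecs)
    if stride == 0:
        raise ValueError("need at least one operand")
    n = len(vecs[0])
    if any(len(v) != n for v in vecs):
        raise ValueError("all operands must have the same length")
    out = [None] * (stride * n)
    for k, v in enumerate(vecs):
        for i, x in enumerate(v):
            # Operand k writes to every stride-th slot, starting at slot k.
            out[i * stride + k] = x
    return out
```

Nothing in the index computation depends on the stride being a power of 2, which is why this kind of lowering could serve as a generic fallback.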

> The first would be the stride, the second would be the remainder. So, your "even" variant becomes deinterleave(vec, 2, 0). One piece that I'm not sure works here is that our result type needs to be a function of the type of the vector argument and the first integer argument. That may require some custom verification rules.

We experimented with just passing the 'offset'. The stride could be deduced from the types (e.g. if the output is <vscale x 2 x i64> and the input is <vscale x 8 x i64>, then the stride is 4), and the offset would say at which element to start deinterleaving (a value with 0 <= offset < stride).
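As an illustration of the offset-only form, the following Python sketch deduces the stride from the minimum element counts of the input and output types and checks the offset range. The function names and the `vscale` parameter are assumptions made for the model; this is not the verifier or intrinsic from the patch.

```python
def deduce_stride(in_min_elts, out_min_elts):
    """Stride implied by <vscale x in_min_elts x ty> -> <vscale x out_min_elts x ty>."""
    if out_min_elts <= 0 or in_min_elts % out_min_elts != 0:
        raise ValueError("input element count must be a multiple of the output's")
    return in_min_elts // out_min_elts


def deinterleave(vec, out_min_elts, offset, vscale=1):
    """Model of an offset-only deinterleave (hypothetical signature)."""
    if vscale <= 0 or len(vec) % vscale != 0:
        raise ValueError("vector length must be a multiple of vscale")
    stride = deduce_stride(len(vec) // vscale, out_min_elts)
    if not 0 <= offset < stride:
        raise ValueError("offset must satisfy 0 <= offset < stride")
    # Take every stride-th element, starting at the given offset.
    return vec[offset::stride]
```

For the example above, `deduce_stride(8, 2)` gives 4, and the only extra verification needed is the range check on the offset.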

================
Comment at: llvm/include/llvm/CodeGen/ISDOpcodes.h:574

+  /// VECTOR_DEINTERLEAVE(VEC1, VEC2, IDX) - Returns a deinterleaved subvector
+  /// from VEC1 and VEC2. The vector is deinterleaved with a stride of 2
----------------
CarolineConcatto wrote:
> reames wrote:
> > The choice here to represent the longer vector type as two vectors which are implicitly concatenated is interesting.  Can you explain why you made that choice?  Is it important for legalization?
> It is more complicated when we have different sizes of input and output to legalise, keeping all inputs and outputs in the same size makes legalisation simpler.
That's right, legalisation becomes simpler when all the types are the same. For example, when the input vector is illegal (too wide) but the vector output is legal, we'd need to split the operation into two.

  // Assuming that nxv4i32 is legal, and nxv8i32 needs splitting
  nxv4i32 deinterleave(nxv8i32, 0)
  ->
  // Now the input vector is legal, but the output type of deinterleave (nxv2i32)
  // is illegal and the operation needs further promotion or widening
  nxv4i32 concat(nxv2i32 deinterleave(nxv4i32 extract_lo(nxv8i32), 0),
                 nxv2i32 deinterleave(nxv4i32 extract_hi(nxv8i32), 0))

Whereas, if we split the vector such that we have `nxv4i32 deinterleave(nxv4i32, nxv4i32, 0)`, then all the types are legal (or illegal, when using a different example) at the same time.
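The difference between the two forms can be modelled in a few lines of Python. This is a sketch only: lists of 4 elements stand in for nxv4i32 at vscale = 1, and `deinterleave2` is a hypothetical stand-in for the stride-2 `VECTOR_DEINTERLEAVE(VEC1, VEC2, IDX)` node, not its actual implementation.

```python
def deinterleave2(vec1, vec2, idx):
    """Stride-2 deinterleave of the implicit concatenation vec1 ++ vec2.

    idx = 0 selects the even elements, idx = 1 the odd ones. The result has
    the same length as each operand, so all types stay the same size.
    """
    if len(vec1) != len(vec2):
        raise ValueError("operands must have the same length")
    if idx not in (0, 1):
        raise ValueError("idx must be 0 or 1 for a stride of 2")
    return (vec1 + vec2)[idx::2]


wide = list(range(8))          # stands in for the illegal nxv8i32 value
lo, hi = wide[:4], wide[4:]    # extract_lo / extract_hi, each nxv4i32-sized

# Two-operand form: operands and result are all 4 elements wide.
two_operand = deinterleave2(lo, hi, 0)

# Single-operand form after splitting: each half yields a 2-element
# (nxv2i32-sized) intermediate that must be concatenated again.
split_halves = [lo[0::2], hi[0::2]]
split_result = split_halves[0] + split_halves[1]
```

Both routes compute the same elements; the two-operand form simply never creates the narrower intermediate type.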

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D141924