[PATCH] D141924: [IR] Add new intrinsics interleave and deinterleave vectors

Tue Jan 17 05:41:47 PST 2023

CarolineConcatto created this revision.
Herald added subscribers: jdoerfert, hiraditya.
Herald added a project: All.
CarolineConcatto requested review of this revision.
Herald added subscribers: llvm-commits, pcwang-thead, alextsao1999.
Herald added a project: LLVM.

This patch adds 3 new intrinsics:

  ; Interleave two vectors into a wider vector
  <vscale x 4 x i64> @llvm.vector.interleave.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)

  ; Deinterleave the odd/even lanes from a wider vector
  <vscale x 2 x i64> @llvm.vector.deinterleave.even.nxv2i64(<vscale x 4 x i64> %vec)
  <vscale x 2 x i64> @llvm.vector.deinterleave.odd.nxv2i64(<vscale x 4 x i64> %vec)
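
To make the intended lane ordering concrete, here is a scalar reference model
in C++. It is only a sketch: the helper names are hypothetical and are not part
of LLVM, and it assumes (going by the operand names %even and %odd) that lane
2*i of the interleaved result comes from the first operand and lane 2*i+1 from
the second.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference model only; these helpers are hypothetical and are not
// part of LLVM. Assumption: lane 2*i of the result comes from `even` and
// lane 2*i+1 comes from `odd`.
template <typename T>
std::vector<T> interleave(const std::vector<T>& even, const std::vector<T>& odd) {
    assert(even.size() == odd.size() && "operands must have the same length");
    std::vector<T> wide;
    wide.reserve(even.size() * 2);
    for (std::size_t i = 0; i < even.size(); ++i) {
        wide.push_back(even[i]);
        wide.push_back(odd[i]);
    }
    return wide;
}

// Returns lanes first, first+2, first+4, ... of `wide` (first is 0 or 1).
template <typename T>
std::vector<T> deinterleave(const std::vector<T>& wide, std::size_t first) {
    assert(wide.size() % 2 == 0 && first < 2);
    std::vector<T> half;
    half.reserve(wide.size() / 2);
    for (std::size_t i = first; i < wide.size(); i += 2)
        half.push_back(wide[i]);
    return half;
}

template <typename T>
std::vector<T> deinterleave_even(const std::vector<T>& wide) {
    return deinterleave(wide, 0);
}

template <typename T>
std::vector<T> deinterleave_odd(const std::vector<T>& wide) {
    return deinterleave(wide, 1);
}
```

Under this model, deinterleaving the result of an interleave gives back the two
original operands.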

The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.

The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types, which have only a real and an imaginary component.

The format of the intrinsics matches how `shufflevector` is used in
LoopVectorize. For example:

  #include <complex>

  using cf = std::complex<float>;

  void foo(cf * dst, int N) {
      for (int i=0; i<N; ++i)
          dst[i] += cf(1.f, 2.f);
  }

For this loop, LoopVectorize:

  (1) Loads a wide vector (e.g. <8 x float>)
  (2) Extracts odd lanes using shufflevector (leading to <4 x float>)
  (3) Extracts even lanes using shufflevector (leading to <4 x float>)
  (4) Performs the addition
  (5) Interleaves the two <4 x float> vectors into a single <8 x float> using
      shufflevector
  (6) Stores the wide vector.

In this example, we can replace the shufflevector instructions in (2) and (3)
one-for-one with the deinterleave intrinsics, and replace the shufflevector in
(5) with the interleave intrinsic.
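
The six steps can be sketched in plain C++ as follows. This is an illustration
of the data movement only, not what LoopVectorize emits: the function name
foo_vectorized_sketch and the fixed factor of 4 complex values per iteration
are assumptions chosen to match the <8 x float> example, and the scalar
remainder loop is added so the sketch handles any N.

```cpp
#include <array>
#include <cassert>
#include <complex>

using cf = std::complex<float>;

// Hypothetical hand-written equivalent of the vectorized loop; each inner
// loop stands in for one vector operation on <8 x float> or <4 x float>.
void foo_vectorized_sketch(cf* dst, int N) {
    assert(N >= 0);
    constexpr int VF = 4;  // complex values per iteration (assumed)
    // std::complex<float> may be accessed as an array of two floats.
    float* p = reinterpret_cast<float*>(dst);

    int i = 0;
    for (; i + VF <= N; i += VF) {
        // (1) Load a wide vector of 8 floats: re0, im0, re1, im1, ...
        std::array<float, 2 * VF> wide;
        for (int k = 0; k < 2 * VF; ++k)
            wide[k] = p[2 * i + k];

        // (2), (3) Split into odd lanes (imaginary) and even lanes (real).
        // This is where the deinterleave intrinsics would be used.
        std::array<float, VF> re, im;
        for (int k = 0; k < VF; ++k) {
            im[k] = wide[2 * k + 1];
            re[k] = wide[2 * k];
        }

        // (4) Perform the addition on each half.
        for (int k = 0; k < VF; ++k) {
            re[k] += 1.f;
            im[k] += 2.f;
        }

        // (5) Interleave the halves back into one wide vector.
        // This is where the interleave intrinsic would be used.
        for (int k = 0; k < VF; ++k) {
            wide[2 * k] = re[k];
            wide[2 * k + 1] = im[k];
        }

        // (6) Store the wide vector.
        for (int k = 0; k < 2 * VF; ++k)
            p[2 * i + k] = wide[k];
    }

    // Scalar remainder for the last N % VF elements.
    for (; i < N; ++i)
        dst[i] += cf(1.f, 2.f);
}
```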

The SelectionDAG nodes might also be extended to support higher strides
(3, 4, etc.) in the future.

Similar to what was done for vector.splice and vector.reverse, the intrinsics
are lowered to a shufflevector when the type is fixed width, so as to benefit
from existing code that was written to recognize/optimize shufflevector patterns.
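
For fixed-width types, the equivalent shufflevector masks are easy to write
down. The sketch below computes them in C++ and applies them with a small
model of shufflevector. All three function names are hypothetical, and whether
the patch builds exactly these masks is an assumption; the code only shows what
a stride-2 interleave or deinterleave looks like as a shuffle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mask for shufflevector(a, b, mask) with two <n x T> operands:
// <0, n, 1, n+1, ..., n-1, 2n-1>.
std::vector<int> interleave2_mask(int n) {
    assert(n > 0);
    std::vector<int> mask;
    mask.reserve(2 * static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        mask.push_back(i);      // lane i of the first operand
        mask.push_back(n + i);  // lane i of the second operand
    }
    return mask;
}

// Mask selecting n lanes from a <2n x T> vector with a stride of 2:
// start == 0 gives the even lanes, start == 1 gives the odd lanes.
std::vector<int> deinterleave2_mask(int n, int start) {
    assert(n > 0 && (start == 0 || start == 1));
    std::vector<int> mask;
    mask.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        mask.push_back(start + 2 * i);
    return mask;
}

// Minimal model of shufflevector: mask indices select from the
// concatenation of `a` and `b`.
std::vector<int> shuffle(const std::vector<int>& a, const std::vector<int>& b,
                         const std::vector<int>& mask) {
    std::vector<int> out;
    out.reserve(mask.size());
    for (int m : mask) {
        std::size_t idx = static_cast<std::size_t>(m);
        assert(m >= 0 && idx < a.size() + b.size());
        out.push_back(idx < a.size() ? a[idx] : b[idx - a.size()]);
    }
    return out;
}
```

For n = 4 the interleave mask is <0, 4, 1, 5, 2, 6, 3, 7>, and the two
deinterleave masks are <0, 2, 4, 6> and <1, 3, 5, 7>.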

Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D141924

Files:
  llvm/docs/LangRef.rst
  llvm/include/llvm/CodeGen/ISDOpcodes.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/Target/TargetSelectionDAG.td
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.h
  llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll
  llvm/test/CodeGen/AArch64/fixed-vector-interleave.ll
  llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
  llvm/test/CodeGen/AArch64/sve-vector-interleave.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D141924.489784.patch
Type: text/x-patch
Size: 54130 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230117/463683e8/attachment.bin>