[all-commits] [llvm/llvm-project] d515ec: [IR] Add new intrinsics interleave and deinterleav...

Mon Feb 20 04:44:23 PST 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: d515ecca683422fbb46e23d30f0016efcbc84f01
      https://github.com/llvm/llvm-project/commit/d515ecca683422fbb46e23d30f0016efcbc84f01
  Author: Caroline Concatto <caroline.concatto at arm.com>
  Date:   2023-02-20 (Mon, 20 Feb 2023)

  Changed paths:
    M llvm/docs/LangRef.rst
    M llvm/include/llvm/CodeGen/ISDOpcodes.h
    M llvm/include/llvm/IR/Intrinsics.td
    M llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    M llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
    M llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
    M llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
    M llvm/lib/Target/AArch64/AArch64ISelLowering.h
    A llvm/test/CodeGen/AArch64/fixed-vector-deinterleave.ll
    A llvm/test/CodeGen/AArch64/fixed-vector-interleave.ll
    A llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
    A llvm/test/CodeGen/AArch64/sve-vector-interleave.ll

  Log Message:
  -----------
  [IR] Add new intrinsics interleave and deinterleave vectors

This patch adds 2 new intrinsics:

  ; Interleave two vectors into a wider vector
  <vscale x 4 x i64> @llvm.vector.interleave2.nxv2i64(<vscale x 2 x i64> %even, <vscale x 2 x i64> %odd)

  ; Deinterleave the odd and even lanes from a wider vector
  {<vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave2.nxv2i64(<vscale x 4 x i64> %vec)

The main motivator for adding these intrinsics is to support vectorization of
complex types using scalable vectors.

The intrinsics are kept simple by only supporting a stride of 2, which makes
them easy to lower and type-legalize. A stride of 2 is sufficient to handle
complex types which only have a real/imaginary component.

The format of the intrinsics matches how `shufflevector` is used in
LoopVectorize. For example:

  using cf = std::complex<float>;

  void foo(cf * dst, int N) {
      for (int i=0; i<N; ++i)
          dst[i] += cf(1.f, 2.f);
  }

For this loop, LoopVectorize:
  (1) Loads a wide vector (e.g. <8 x float>)
  (2) Extracts odd lanes using shufflevector (leading to <4 x float>)
  (3) Extracts even lanes using shufflevector (leading to <4 x float>)
  (4) Performs the addition
  (5) Interleaves the two <4 x float> vectors into a single <8 x float> using
      shufflevector
  (6) Stores the wide vector.

In this example, we can 1-1 replace shufflevector in (2) and (3) with the
deinterleave intrinsic, and replace the shufflevector in (5) with the
interleave intrinsic.

The SelectionDAG nodes might be extended to support higher strides (3, 4, etc)
as well in the future.

Similar to what was done for vector.splice and vector.reverse, the intrinsic
is lowered to a shufflevector when the type is fixed width, so to benefit from
existing code that was written to recognize/optimize shufflevector patterns.

Note that this approach does not prevent us from adding new intrinsics for other
strides, or adding a more generic shuffle intrinsic in the future. It just solves
the immediate problem of being able to vectorize loops with complex math.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D141924