[PATCH] D71469: [AArch64] Add sq(r)dmulh_lane(q) LLVM IR intrinsics

Fri Dec 13 08:06:15 PST 2019

sanwou01 created this revision.
Herald added subscribers: llvm-commits, cfe-commits, jdoerfert, hiraditya, kristof.beyls.
Herald added projects: clang, LLVM.
sanwou01 added reviewers: SjoerdMeijer, dmgreen, t.p.northover.

Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h),
are represented in LLVM IR as a by vector sqdmulh and a vector of (repeated)
indices, like so:

  %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
  %vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)

When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.

This makes it impossible to do a neat trick when using NEON intrinsics: one can
load a number of constants using a single vector load, which are then repeatedly
used to multiply whole vectors by one of the constants. This trick is used for a
nice performance upside (2.5% to 4% on one microbenchmark) in libjpeg-turbo.

This patch adds four LLVM IR intrinsics to the AArch64 backend:

- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq.

This prevents the constant propagation when it is not wanted.

In order to represent the type of these intrinsics, the patch adds
LLVMNarrowType and LLVMWideType to the IntrinsicEmitter.  The second parameter
of the 'lane' variants is a vector with the same element type, restricted to a
total of 64 bits.  Similarly, the 'laneq' variants' second parameter is widened
to a total of 128 bits.

The 'lane' variants also need an additional register class.  The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.

Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.

This patch also changes clang to emit the new LLVM IR intrinsics for the
corresponding NEON intrinsics (AArch64 only).

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D71469

Files:
  clang/include/clang/Basic/arm_neon.td
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGen/aarch64-neon-2velem.c
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/IR/Function.cpp
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/lib/Target/AArch64/AArch64InstrInfo.td
  llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp
  llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
  llvm/lib/Target/AArch64/AArch64RegisterInfo.td
  llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
  llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll
  llvm/utils/TableGen/IntrinsicEmitter.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D71469.233807.patch
Type: text/x-patch
Size: 76669 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20191213/f64476df/attachment.bin>