[PATCH] D131047: [AArch64] Add a tablegen pattern to transform duplane(scalar_to_vector(x),0) to dup(x), and vectorize scalar operands for aarch64.neon.pmull64 intrinsic

Sun Aug 14 12:50:09 PDT 2022

mingmingl updated this revision to Diff 452557.
mingmingl retitled this revision from "[AArch64] Add a tablegen pattern for aarch64.neon.pmull64" to "[AArch64] Add a tablegen pattern to transform duplane(scalar_to_vector(x),0) to dup(x), and vectorize scalar operands for aarch64.neon.pmull64 intrinsic".
mingmingl edited the summary of this revision.
mingmingl added a comment.

Decided to go with a tablegen pattern for dup, and vectorization of scalar operands for the aarch64.neon.pmull64 intrinsic.

- The alternative is a tablegen pattern for the intrinsic [1]. This alternative is sub-optimal since it creates new nodes (to dup from GPR) at the final instruction selection stage, and so misses chances for combination in the DAG combiners.
  - For example, if the i64 operand is already in the lower half of a SIMD register, we want to dup from the lane directly, rather than generating an fmov (from SIMD lane 0 to GPR) followed by a dup (from GPR to all lanes of SIMD). `test2` in `aarch64-pmull2.ll` is the corresponding test for this.

[1]

  def : Pat<(int_aarch64_neon_pmull64 (extractelt (v2i64 V128:$Rn), (i64 1)),
                                      GPR64:$Rm),
            (PMULLv2i64 V128:$Rn, (v2i64 (DUPv2i64gpr GPR64:$Rm)))>;
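
For illustration only, here is a toy Python model of the rewrite named in the title, duplane(scalar_to_vector(x), 0) to dup(x). It is not LLVM code and not the pattern in this patch: the `Node` class, the op-name strings and `fold_duplane_of_scalar` are all hypothetical names made up for this sketch. It only shows the shape of the fold on a small expression tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical expression-tree node; not an LLVM SDNode."""
    op: str
    operands: tuple = ()


def fold_duplane_of_scalar(node):
    """Rewrite duplane(scalar_to_vector(x), 0) to dup(x), bottom-up."""
    if not isinstance(node, Node):
        return node  # leaf: a value name or a lane index
    ops = tuple(fold_duplane_of_scalar(o) for o in node.operands)
    if (
        node.op == "duplane"
        and len(ops) == 2
        and ops[1] == 0  # only lane 0 is defined after scalar_to_vector
        and isinstance(ops[0], Node)
        and ops[0].op == "scalar_to_vector"
        and len(ops[0].operands) == 1
    ):
        return Node("dup", (ops[0].operands[0],))
    return Node(node.op, ops)


# A pmull-like node whose first operand is a lane-0 splat of scalar "x".
before = Node("pmull", (Node("duplane", (Node("scalar_to_vector", ("x",)), 0)), "v"))
after = fold_duplane_of_scalar(before)
```

In this model the fold fires only for lane 0; a duplane of any other lane is left untouched.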

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131047/new/

https://reviews.llvm.org/D131047

Files:
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/lib/Target/AArch64/AArch64InstrInfo.td
  llvm/test/CodeGen/AArch64/aarch64-pmull2.ll
  llvm/test/CodeGen/AArch64/pmull-ldr-merge.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D131047.452557.patch
Type: text/x-patch
Size: 10854 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220814/2f54143a/attachment.bin>