[PATCH] D123379: [AArch64] Cost all perfect shuffles entries as cost 1

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 8 04:14:34 PDT 2022


dmgreen created this revision.
dmgreen added reviewers: SjoerdMeijer, labrinea, samtebbs, jaykang10.
Herald added subscribers: hiraditya, kristof.beyls.
Herald added a project: All.
dmgreen requested review of this revision.
Herald added a project: LLVM.

A brief introduction to perfect shuffles: AArch64 NEON has a number of shuffle operations (dup, zip, ext, mov, etc.) that rearrange the lanes of a vector in some way. Given a 4-element shuffle with 2 inputs, some masks can easily be codegen'd to a single instruction. A `<0,0,1,1>` mask, for example, is a `zip1 LHS, LHS`. This is great, but some masks are not so simple, like `<0,0,1,2>`. It turns out we can generate that with `zip1 LHS, <0,2,0,2>`, and then generate `<0,2,0,2>` with `uzp1 LHS, LHS`, producing the result in 2 instructions.

It is not obvious from a given mask how to get there, though, so we have a simple program (PerfectShuffle.cpp in llvm/utils/PerfectShuffle) that scans through every 4-element, 2-input shuffle mask and generates the "perfect" combination of operations needed for each one (for some definition of perfect). The results live in a table that is queried when generating shuffle instructions. Because the table could get quite big, it is limited to 4-element vectors.

In the perfect shuffle tables, zip, uzp and trn shuffles were being costed as 2, which is higher than needed and skews the tables towards inefficient combinations. This patch sets them to 1 and regenerates the tables. The codegen will usually be better and the costs should be more precise, although it can get less second-order reuse of values across multiple shuffles; those cases should be fixed up in subsequent patches.


https://reviews.llvm.org/D123379

Files:
  llvm/lib/Target/AArch64/AArch64PerfectShuffle.h
  llvm/test/CodeGen/AArch64/aarch64-wide-shuffle.ll
  llvm/test/CodeGen/AArch64/arm64-dup.ll
  llvm/test/CodeGen/AArch64/arm64-rev.ll
  llvm/test/CodeGen/AArch64/build-vector-extract.ll
  llvm/test/CodeGen/AArch64/insert-extend.ll
  llvm/test/CodeGen/AArch64/neon-wide-splat.ll
  llvm/test/CodeGen/AArch64/select-shuffle.ll
  llvm/test/CodeGen/AArch64/shuffle-tbl34.ll
  llvm/test/CodeGen/AArch64/shuffles.ll
  llvm/test/CodeGen/AArch64/sinksplat.ll
  llvm/utils/PerfectShuffle/PerfectShuffle.cpp


