[PATCH] D94230: [AArch64][SVE] Add SVE IR pass to coalesce ptrue instrinsic calls

Mon Jan 11 02:14:37 PST 2021

joechrisellis added a comment.

Hi @bsmith,

The poor codegen in that example is happening because we're factoring out a ptrue which is immediately converted to a 'sparse' predicate via a sequence of SVE reinterpret intrinsics. E.g.:

  %1 = call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
  // <1, 1, 1, 1>
  %2 = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %1)
  // <1, 0, 0, 0, 1, 0 ,0, 0, 1, 0, 0, 0, 1, 0, 0, 0>
  %3 = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %2)
  // <1, 0, 1, 0, 1, 0, 1, 0> ('sparse' predicate)

In these specific circumstances it doesn't make sense to eliminate the ptrue because we're just going to create an even longer chain which cannot be reduced by the SVEIntrinsicOpts.cpp pass (also see D94074 <https://reviews.llvm.org/D94074>, which extends this pass to reduce long conversion chains). I've modified this pass to account for these situations, and the codegen that we get now is identical to before. I've added this case into the tests, too.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94230/new/

https://reviews.llvm.org/D94230