[PATCH] D94230: [AArch64][SVE] Add SVE IR pass to coalesce ptrue instrinsic calls

Thu Jan 7 07:01:37 PST 2021

bsmith added a comment.

After much discussion, it turns out I was incorrect in this assertion: I mistakenly thought that the ptrues were being passed straight into the load rather than through the existing svbool conversions. That said, this case (with %4, %5 and %7 made non-redundant) does now produce worse codegen with this pass:

Currently:

  ptrue   p0.s
  ptrue   p1.h
  ld1w    { z0.s }, p0/z, [x0]
  ld1h    { z1.h }, p0/z, [x1]
  ld1h    { z8.h }, p1/z, [x1]
  ...

With patch:

  ptrue   p0.h
  ptrue   p1.s
  ptrue   p2.b
  and     p1.b, p2/z, p0.b, p1.b
  ld1w    { z0.s }, p0/z, [x0]
  ld1h    { z1.h }, p1/z, [x1]
  ld1h    { z8.h }, p0/z, [x1]
  ...
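
As a sanity check that the two sequences above differ only in cost and not in behaviour, the predicates can be modelled as bit lists with one bit per vector byte. This is only an illustrative Python sketch, not LLVM or hardware code: the 128-bit vector length and all helper names are assumptions made for the example.

```python
VL_BYTES = 16  # assumed 128-bit vector length; real SVE hardware may be wider

def ptrue(esize):
    """Model `ptrue pN.<T>` (ALL pattern): set the low predicate bit of each esize-byte element."""
    return [1 if i % esize == 0 else 0 for i in range(VL_BYTES)]

def pred_and(pg, pn, pm):
    """Model `and pd.b, pg/z, pn.b, pm.b`: AND of pn and pm, zeroed where pg is inactive."""
    return [g & a & b for g, a, b in zip(pg, pn, pm)]

def active_lanes(pred, esize):
    """Lanes that an instruction operating on esize-byte elements treats as active."""
    return [i // esize for i in range(0, VL_BYTES, esize) if pred[i]]

# Current codegen: two ptrues, no predicate arithmetic.
cur_p0 = ptrue(4)                      # ptrue p0.s
cur_p1 = ptrue(2)                      # ptrue p1.h
current = [
    active_lanes(cur_p0, 4),           # ld1w { z0.s }, p0/z
    active_lanes(cur_p0, 2),           # ld1h { z1.h }, p0/z
    active_lanes(cur_p1, 2),           # ld1h { z8.h }, p1/z
]

# With patch: three ptrues plus an `and`.
new_p0 = ptrue(2)                      # ptrue p0.h
new_p1 = ptrue(4)                      # ptrue p1.s
new_p2 = ptrue(1)                      # ptrue p2.b
new_p1 = pred_and(new_p2, new_p0, new_p1)  # and p1.b, p2/z, p0.b, p1.b
patched = [
    active_lanes(new_p0, 4),           # ld1w { z0.s }, p0/z
    active_lanes(new_p1, 2),           # ld1h { z1.h }, p1/z
    active_lanes(new_p0, 2),           # ld1h { z8.h }, p0/z
]

print(current == patched)
```

Under this model, both sequences activate the same lanes for each load, so the extra `ptrue` and `and` in the patched output are pure overhead.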

I do wonder whether this should be an MIR pass rather than an IR one?

In D94230#2484195 <https://reviews.llvm.org/D94230#2484195>, @bsmith wrote:

> I'm not sure this patch is correct, as it doesn't take into account how the predicates are used. For example, in the following case your patch replaces the `ptrue_b32()` predicate of the `%5` 8 x i16 load with a `ptrue_b16()`, which changes the behaviour.
>
>   declare <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 immarg)
>   declare <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 immarg)
>   
>   declare <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1>, i32*)
>   declare <vscale x 8 x i16> @llvm.aarch64.sve.ld1.nxv8i16(<vscale x 8 x i1>, i16*)
>   
>   declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1>)
>   declare <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1>)
>   
>   define <vscale x 8 x i16> @coalesce_test_basic(i32* %addr1, i16* %addr2) {
>     %1 = call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
>     %2 = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %1)
>     %3 = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %2)
>   
>     %4 = call <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1> %1, i32* %addr1)
>     %5 = call <vscale x 8 x i16> @llvm.aarch64.sve.ld1.nxv8i16(<vscale x 8 x i1> %3, i16* %addr2)
>   
>     %6 = call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
>     %7 = call <vscale x 8 x i16> @llvm.aarch64.sve.ld1.nxv8i16(<vscale x 8 x i1> %6, i16* %addr2)
>   
>     ret <vscale x 8 x i16> %7
>   }
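
The behaviour change raised in the quoted comment can be illustrated by modelling the IR-level predicate values lane by lane. The following is a rough Python sketch under stated assumptions, not how LLVM represents these values: `VSCALE = 1` and the helper names `ptrue_all`, `to_svbool` and `from_svbool` are hypothetical, chosen only to mirror the intrinsics in the example.

```python
VSCALE = 1  # assumed; any positive integer gives the same pattern

def ptrue_all(n):
    """Model llvm.aarch64.sve.ptrue.nxv<n>i1(i32 31): every lane of <vscale x n x i1> is true."""
    return [1] * (n * VSCALE)

def to_svbool(pred, n):
    """Model convert.to.svbool: widen <vscale x n x i1> to <vscale x 16 x i1>, zeroing the new lanes."""
    stride = 16 // n
    out = []
    for bit in pred:
        out.append(bit)
        out.extend([0] * (stride - 1))
    return out

def from_svbool(svbool, n):
    """Model convert.from.svbool: keep every (16 / n)-th lane of <vscale x 16 x i1>."""
    stride = 16 // n
    return svbool[::stride]

p1 = ptrue_all(4)                 # %1: ptrue_b32
p2 = to_svbool(p1, 4)             # %2
p3 = from_svbool(p2, 8)           # %3: the predicate actually used by the %5 load
p6 = ptrue_all(8)                 # %6: ptrue_b16

print(p3)
print(p6)
```

In this model `%3` enables only every other 16-bit lane, while `%6` enables all of them, so substituting `%6` for `%3` in the `%5` load would change which elements are loaded.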

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D94230