[PATCH] D94230: [AArch64][SVE] Add SVE IR pass to coalesce ptrue instrinsic calls
Bradley Smith via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 7 07:01:37 PST 2021
bsmith added a comment.
After much discussion, it turns out I was incorrect in this assertion: I mistakenly thought that the ptrues were being passed straight into the load rather than through the existing svbool conversions. That said, this case (with %4, %5 and %7 made non-redundant) does now produce worse codegen with this pass:
Currently:
ptrue p0.s
ptrue p1.h
ld1w { z0.s }, p0/z, [x0]
ld1h { z1.h }, p0/z, [x1]
ld1h { z8.h }, p1/z, [x1]
...
With patch:
ptrue p0.h
ptrue p1.s
ptrue p2.b
and p1.b, p2/z, p0.b, p1.b
ld1w { z0.s }, p0/z, [x0]
ld1h { z1.h }, p1/z, [x1]
ld1h { z8.h }, p0/z, [x1]
...
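To make the comparison concrete, here is a rough bit-level model of the predicates in the "With patch" sequence. It is only a sketch: it assumes a 128-bit vector length (so 16 predicate bits, one per byte), and the helper names `ptrue`, `and_z` and `active_lanes` are made up for illustration. Under those assumptions, the `and` reproduces the same bits as a plain `ptrue .s`, so the first `ld1h` appears to see the same every-other-halfword predicate as before, just at the cost of two extra instructions.

```python
VL_BYTES = 16  # assumption: 128-bit vectors, so a predicate register has 16 bits


def ptrue(elem_bytes):
    """Model `ptrue pN.<T>`: set the lowest bit of each predicate element."""
    return [i % elem_bytes == 0 for i in range(VL_BYTES)]


def and_z(pg, pn, pm):
    """Model `and pd.b, pg/z, pn.b, pm.b`: AND of pn and pm, zeroed where pg is false."""
    return [g and n and m for g, n, m in zip(pg, pn, pm)]


def active_lanes(pred, elem_bytes):
    """Read the predicate bits an instruction with this element size would look at."""
    return [pred[i] for i in range(0, VL_BYTES, elem_bytes)]


p0 = ptrue(2)             # ptrue p0.h
p1 = ptrue(4)             # ptrue p1.s
p2 = ptrue(1)             # ptrue p2.b
p1 = and_z(p2, p0, p1)    # and p1.b, p2/z, p0.b, p1.b

print(active_lanes(p1, 4))  # lanes seen by a .s instruction governed by p1
print(active_lanes(p1, 2))  # lanes seen by ld1h { z1.h }, p1/z
print(active_lanes(p0, 2))  # lanes seen by ld1h { z8.h }, p0/z
```
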
I do wonder whether this should be an MIR pass rather than an IR one?
In D94230#2484195 <https://reviews.llvm.org/D94230#2484195>, @bsmith wrote:
> I'm not sure this patch is correct, as it doesn't take into account how the predicates are used. For example, in the following case your patch replaces the `ptrue_b32()` predicate of the `%5` 8 x i16 load with a `ptrue_b16()`, which changes the behaviour.
>
> declare <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 immarg)
> declare <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 immarg)
>
> declare <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1>, i32*)
> declare <vscale x 8 x i16> @llvm.aarch64.sve.ld1.nxv8i16(<vscale x 8 x i1>, i16*)
>
> declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1>)
> declare <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1>)
>
> define <vscale x 8 x i16> @coalesce_test_basic(i32* %addr1, i16* %addr2) {
> %1 = call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
> %2 = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %1)
> %3 = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %2)
>
> %4 = call <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1> %1, i32* %addr1)
> %5 = call <vscale x 8 x i16> @llvm.aarch64.sve.ld1.nxv8i16(<vscale x 8 x i1> %3, i16* %addr2)
>
> %6 = call <vscale x 8 x i1> @llvm.aarch64.sve.ptrue.nxv8i1(i32 31)
> %7 = call <vscale x 8 x i16> @llvm.aarch64.sve.ld1.nxv8i16(<vscale x 8 x i1> %6, i16* %addr2)
>
> ret <vscale x 8 x i16> %7
> }
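The concern in the quoted comment comes down to what a `ptrue_b32()` looks like once it has been round-tripped through svbool and read back as an 8-lane predicate. The sketch below models that in lane terms. It assumes vscale = 1 (a 16-bit svbool), and `to_svbool` / `from_svbool` are hypothetical stand-ins for the convert intrinsics, not their actual lowering. Under that model the predicate feeding `%5` has only every other halfword lane active, so swapping in a `ptrue_b16()` directly would enable extra lanes.

```python
SVBOOL_BITS = 16  # assumption: vscale = 1, so svbool has 16 bits


def to_svbool(lanes):
    """Widen an N-lane predicate to svbool: lane i lands on bit i * (16 / N)."""
    stride = SVBOOL_BITS // len(lanes)
    bits = [False] * SVBOOL_BITS
    for i, active in enumerate(lanes):
        bits[i * stride] = active
    return bits


def from_svbool(bits, num_lanes):
    """Narrow svbool to an N-lane predicate: lane j reads bit j * (16 / N)."""
    stride = SVBOOL_BITS // num_lanes
    return [bits[j * stride] for j in range(num_lanes)]


ptrue_b32 = [True] * 4   # %1: all four 32-bit lanes active
ptrue_b16 = [True] * 8   # %6: all eight 16-bit lanes active

# %3: the b32 ptrue reinterpreted as an 8-lane predicate via svbool
pred_3 = from_svbool(to_svbool(ptrue_b32), 8)

print(pred_3)               # predicate actually used by the %5 load
print(pred_3 == ptrue_b16)  # whether a direct ptrue_b16 would be equivalent
```
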
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94230/new/
https://reviews.llvm.org/D94230