[PATCH] D120953: [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads

Tue Mar 8 07:05:34 PST 2022

Allen marked an inline comment as done.
Allen added a comment.

ping ?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1238

+    setLoadExtAction(ISD::SEXTLOAD, MVT::nxv4i64, MVT::nxv4i32, Expand);
+
----------------
paulwalker-arm wrote:
> david-arm wrote:
> > So this is really just an optimisation where you're trying to reduce the number of loads we perform, in favour of more unpklo/hi instructions. It seems to make sense and looks like an improvement, but this change is very specific to one set of types and one extension type. What about ISD::ZEXTLOAD and other extensions from legal types, i.e.
> > 
> >   %wide.masked.load = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0nxv8i16(<vscale x 8 x i16>* %base, i32 2, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)
> >   %res = sext <vscale x 8 x i16> %wide.masked.load to <vscale x 8 x i64>
> > 
> > I think if we go down this route it's worth adding all other extension types from legal inputs too.
> It seems we're currently marking all the scalable floating point types as Expand, so we should do the same for all the scalable integer types and then selectively enable the ones that make sense.  Although I'd actually prefer the common default to be Expand rather than forcing all targets to do the initialisation, but I guess that's outside the scope of this patch.
Thanks very much, and also update comment.
1、ISD::ZEXTLOAD added
2、types vscale x 16 x i8, vscale x 8 x i16 and vscale x 4 x i32 are considered, please let me know if more types need be added.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1238

+    setLoadExtAction(ISD::SEXTLOAD, MVT::nxv4i64, MVT::nxv4i32, Expand);
+
----------------
Allen wrote:
> paulwalker-arm wrote:
> > david-arm wrote:
> > > So this is really just an optimisation where you're trying to reduce the number of loads we perform, in favour of more unpklo/hi instructions. It seems to make sense and looks like an improvement, but this change is very specific to one set of types and one extension type. What about ISD::ZEXTLOAD and other extensions from legal types, i.e.
> > > 
> > >   %wide.masked.load = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0nxv8i16(<vscale x 8 x i16>* %base, i32 2, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)
> > >   %res = sext <vscale x 8 x i16> %wide.masked.load to <vscale x 8 x i64>
> > > 
> > > I think if we go down this route it's worth adding all other extension types from legal inputs too.
> > It seems we're currently marking all the scalable floating point types as Expand, so we should do the same for all the scalable integer types and then selectively enable the ones that make sense.  Although I'd actually prefer the common default to be Expand rather than forcing all targets to do the initialisation, but I guess that's outside the scope of this patch.
> Thanks very much, and also update comment.
> 1、ISD::ZEXTLOAD added
> 2、types vscale x 16 x i8, vscale x 8 x i16 and vscale x 4 x i32 are considered, please let me know if more types need be added.
Thanks very much.

As not all of the scalable integer types will be benifit from the disable of extending loads, so I only choose part of them, does it right ?

for example, now **ld1h	{ z0.s }, p0/z, [x0] **is already fine, and there is extra **and	z0.s, z0.s, #0xffff** after disable the extending loads.
```
%wide.masked.load = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>* %a, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i16> undef)
  %res = zext <vscale x 4 x i16> %wide.masked.load to <vscale x 4 x i32>
```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120953/new/

https://reviews.llvm.org/D120953