[PATCH] D70542: [AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 25 12:18:42 PST 2019


efriedma added inline comments.


================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-offset.ll:15
+; CHECK-NEXT: ret
+  %load = call <vscale x 2 x i8> @llvm.aarch64.sve.ld1.gather.nxv2i8(<vscale x 2 x i1> %pg,
+                                                                     i8* %base,
----------------
andwar wrote:
> efriedma wrote:
> > This doesn't match the way the corresponding C intrinsics are defined in the ACLE spec.  Are you intentionally diverging here?
> Thank you for taking a look @efriedma! Yes, this is intentional. Well spotted!
> 
> If we used `<nxv2i64>` as the return type here then we wouldn't need `zext`. However, we'd need some other way to differentiate between `ld1b` and `ld1sb` later. That would basically  double the number of intrinsics. We felt that leaving the `sign/zero` here (to be folded using a simple DAGCombine) is a good compromise. I will be upstreaming that code shortly.
> 
> I should also point out the we have more intrinsics like this to upstream - this patch covers only 2 addressing modes.
My only concern with this is that it means we're committing to a specific layout for these types: specifically, that converting from `<vscale x 2 x i8>` to `<vscale x 2 x i64>` using an ANY_EXTEND is a no-op.  (Otherwise, we would be forced to generate some very nasty code to lower a load that isn't immediately extended.)

For other targets, we've found that doing legalization by promoting the integer type isn't the best approach.  For example, on x86, we used to legalize `<2 x i16>` by promoting it to `<2 x i64>`.  But that was changed to widen it to `<8 x i16>`, where the padding is all at the end, because the operations were  generally cheaper.

Maybe we could implement some hybrid approach to legalization, though, that widens `<vscale 2 x i8>` to `<vscale x 16 x i8>`, but interleaves the padding so the conversion to `<vscale 2 x i64>` is just a bitcast.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70542/new/

https://reviews.llvm.org/D70542





More information about the llvm-commits mailing list