[PATCH] D75674: [AArch64][SVE] Implement vector tuple intrinsics
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 22 10:43:27 PDT 2020
efriedma accepted this revision.
efriedma added a comment.
This revision is now accepted and ready to land.
LGTM
================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7038
+ assert(Src1.getOpcode() == ISD::CONCAT_VECTORS &&
+ "Unexpected operand for sve_tuple_get");
+
----------------
c-rhodes wrote:
> efriedma wrote:
> > c-rhodes wrote:
> > > efriedma wrote:
> > > > This assertion seems ambitious; I'm not sure how you plan to ensure that the sve.tuple.get intrinsic is in the same basic block as the intrinsic that produces the value.
> > > >
> > > > Is there some reason you're putting this handling into the target-independent SelectionDAGBuilder, as opposed to target-specific code?
> > > > This assertion seems ambitious; I'm not sure how you plan to ensure that the sve.tuple.get intrinsic is in the same basic block as the intrinsic that produces the value.
> > >
> > > Sander's patch D80139 fixes copying between BBs and I've updated this patch to use `EXTRACT_SUBVECTOR` to get the vector from the tuple type, rather than looking through the `CONCAT_VECTOR`, so this assert has been removed.
> > >
> > > > Is there some reason you're putting this handling into the target-independent SelectionDAGBuilder, as opposed to target-specific code?
> > >
> > > Tuple types for structured loads/stores were originally implemented with sizeless structs downstream but adding sizeless structs to the C standard isn't going to happen, so we chose to represent them as multiples of 128-bit vectors, e.g. `svint32x2_t` -> `<n x 8 x i32>`. For vector tuples containing 2 and 4 vectors (LD2/LD4, ST2/ST4) this was fine as the types are powers of 2 and can be broken down by the existing mechanisms, but for LD3/ST3 the types are odd sizes:
> > >
> > > - `svint8x3_t` -> `<vscale x 48 x i8>`
> > > - `svint16x3_t` -> `<vscale x 24 x i16>`
> > > - `svint32x3_t` -> `<vscale x 12 x i32>`
> > > - `svint64x2_t` -> `<vscale x 6 x i64>`
> > >
> > > Which are problematic for legalisation when it comes to widening/splitting. To do this lowering in target-specific AArch64ISelLowering MVTs would have to be defined for these types which aren't legal. By lowering here we avoid these issues as these types don't reach type legalization.
> > If you need to run before legalization, you can put code in AArch64TargetLowering::PerformDAGCombine.
> >
> > Alternatively, additional MVTs wouldn't be that terrible.
> > If you need to run before legalization, you can put code in AArch64TargetLowering::PerformDAGCombine.
>
> That's a much better idea :) I've implemented your suggestion, cheers!
>
> > Alternatively, additional MVTs wouldn't be that terrible.
> I'm a little confused about this, there's a comment at the top of `MachineValueType.h`:
>
> ```// This file defines the set of machine-level target independent types which
> // legal values in the code generator use.```
>
> which to me implies MVTs should only be implemented for legal types which tuple vectors aren't, although I've also seen `v3i32` which seems to be used in the AMDGPU backend but I don't know if the ISA specifies a register for that. I don't think I want to implements MVTs in this patch but I'm curious if it's ok to define them for illegal types?
We treat the "legal values in the code generator" bit a little loosely; really, it's any type that useful to define as an MVT for some backend.
And yes, if I recall that discussion correctly, I think AMDGPU actually does have operations on v3i32 registers.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13529
+ if (!DCI.isBeforeLegalizeOps())
+ return SDValue();
+
----------------
No point to explicitly checking isBeforeLegalizeOps(); the intrinsics should be gone after that.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D75674/new/
https://reviews.llvm.org/D75674
More information about the llvm-commits
mailing list