[PATCH] D75674: [AArch64][SVE] Implement vector tuple intrinsics
Cullen Rhodes via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 22 10:10:47 PDT 2020
c-rhodes added inline comments.
================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7038
+ assert(Src1.getOpcode() == ISD::CONCAT_VECTORS &&
+ "Unexpected operand for sve_tuple_get");
+
----------------
efriedma wrote:
> c-rhodes wrote:
> > efriedma wrote:
> > > This assertion seems ambitious; I'm not sure how you plan to ensure that the sve.tuple.get intrinsic is in the same basic block as the intrinsic that produces the value.
> > >
> > > Is there some reason you're putting this handling into the target-independent SelectionDAGBuilder, as opposed to target-specific code?
> > > This assertion seems ambitious; I'm not sure how you plan to ensure that the sve.tuple.get intrinsic is in the same basic block as the intrinsic that produces the value.
> >
> > Sander's patch D80139 fixes copying between BBs and I've updated this patch to use `EXTRACT_SUBVECTOR` to get the vector from the tuple type, rather than looking through the `CONCAT_VECTOR`, so this assert has been removed.
> >
> > > Is there some reason you're putting this handling into the target-independent SelectionDAGBuilder, as opposed to target-specific code?
> >
> > Tuple types for structured loads/stores were originally implemented with sizeless structs downstream but adding sizeless structs to the C standard isn't going to happen, so we chose to represent them as multiples of 128-bit vectors, e.g. `svint32x2_t` -> `<n x 8 x i32>`. For vector tuples containing 2 and 4 vectors (LD2/LD4, ST2/ST4) this was fine as the types are powers of 2 and can be broken down by the existing mechanisms, but for LD3/ST3 the types are odd sizes:
> >
> > - `svint8x3_t` -> `<vscale x 48 x i8>`
> > - `svint16x3_t` -> `<vscale x 24 x i16>`
> > - `svint32x3_t` -> `<vscale x 12 x i32>`
> > - `svint64x2_t` -> `<vscale x 6 x i64>`
> >
> > Which are problematic for legalisation when it comes to widening/splitting. To do this lowering in target-specific AArch64ISelLowering MVTs would have to be defined for these types which aren't legal. By lowering here we avoid these issues as these types don't reach type legalization.
> If you need to run before legalization, you can put code in AArch64TargetLowering::PerformDAGCombine.
>
> Alternatively, additional MVTs wouldn't be that terrible.
> If you need to run before legalization, you can put code in AArch64TargetLowering::PerformDAGCombine.
That's a much better idea :) I've implemented your suggestion, cheers!
> Alternatively, additional MVTs wouldn't be that terrible.
I'm a little confused about this, there's a comment at the top of `MachineValueType.h`:
```// This file defines the set of machine-level target independent types which
// legal values in the code generator use.```
which to me implies MVTs should only be implemented for legal types which tuple vectors aren't, although I've also seen `v3i32` which seems to be used in the AMDGPU backend but I don't know if the ISA specifies a register for that. I don't think I want to implements MVTs in this patch but I'm curious if it's ok to define them for illegal types?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D75674/new/
https://reviews.llvm.org/D75674
More information about the llvm-commits
mailing list