[PATCH] D82910: [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 1 12:27:10 PDT 2020
efriedma accepted this revision.
efriedma added a comment.
This revision is now accepted and ready to land.
LGTM
================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:19238
if ((NVT.getVectorNumElements() % DestSrcRatio) == 0) {
- unsigned NewExtNumElts = NVT.getVectorNumElements() / DestSrcRatio;
+ ElementCount NewExtEC = NVT.getVectorElementCount() / DestSrcRatio;
EVT ScalarVT = SrcVT.getScalarType();
----------------
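For what it's worth, the difference this line makes is easiest to see in isolation: dividing an `ElementCount` keeps the scalable flag, whereas the old `unsigned` division could only ever produce a fixed count. Below is a minimal sketch of that behaviour; the `EC` struct is a simplified stand-in for illustration only, not the real `llvm::ElementCount` class.

```
// Simplified stand-in for llvm::ElementCount, only to show why dividing an
// element count (rather than a plain unsigned) preserves the scalable flag.
// Not the real LLVM class.
#include <cassert>
#include <cstdio>

struct EC {
  unsigned Min;   // known-minimum number of elements
  bool Scalable;  // true for <vscale x Min x Ty>, false for <Min x Ty>

  EC operator/(unsigned RHS) const {
    assert(Min % RHS == 0 && "non-integral element-count division");
    // Only the known minimum shrinks; the Scalable flag is carried through.
    return {Min / RHS, Scalable};
  }
};

int main() {
  EC NVT = {16, true};             // e.g. nxv16i8
  unsigned DestSrcRatio = 8;       // i64 is 8x wider than i8
  EC NewExtEC = NVT / DestSrcRatio;
  std::printf("%s%u elements\n", NewExtEC.Scalable ? "vscale x " : "",
              NewExtEC.Min);       // prints "vscale x 2 elements"
  // With the old code, `unsigned NewExtNumElts = 16 / 8;` would have built a
  // fixed 2-element type here, silently dropping the scalable flag.
  return 0;
}
```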
efriedma wrote:
> sdesmalen wrote:
> > efriedma wrote:
> > > Does this math work correctly if we're extracting a fixed vector from a scalable vector?
> > Yes. To be sure, I tested this with an intrinsic that maps to EXTRACT_SUBVECTOR:
> > ```
> > define <2 x i64> @extract_2i64_nxv16i8(<vscale x 16 x i8> %z0) {
> > %z0_bc = bitcast <vscale x 16 x i8> %z0 to <vscale x 2 x i64>
> > %ext = call <2 x i64> @llvm.experimental.extractsubvec.v2i64.nxv2i64(<vscale x 2 x i64> %z0_bc, i32 2)
> > ret <2 x i64> %ext
> > }
> > =>
> > Optimized lowered selection DAG: %bb.0 'extract_2i64_nxv16i8:'
> > SelectionDAG has 9 nodes:
> > t0: ch = EntryToken
> > t2: nxv16i8,ch = CopyFromReg t0, Register:nxv16i8 %0
> > t10: v16i8 = extract_subvector t2, Constant:i64<16>
> > t11: v2i64 = bitcast t10
> > t7: ch,glue = CopyToReg t0, Register:v2i64 $q0, t11
> > t8: ch = AArch64ISD::RET_FLAG t7, Register:v2i64 $q0, t7:1
> >
> > ```
> > and for the other:
> > ```
> > define <16 x i8> @extract_16i8_nxv2i64(<vscale x 2 x i64> %z0) {
> > %z0_bc = bitcast <vscale x 2 x i64> %z0 to <vscale x 16 x i8>
> > %ext = call <16 x i8> @llvm.experimental.extractsubvec.v16i8.nxv16i8(<vscale x 16 x i8> %z0_bc, i32 16)
> > ret <16 x i8> %ext
> > }
> > =>
> > Optimized lowered selection DAG: %bb.0 'extract_16i8_nxv2i64:'
> > SelectionDAG has 9 nodes:
> > t0: ch = EntryToken
> > t2: nxv2i64,ch = CopyFromReg t0, Register:nxv2i64 %0
> > t10: v2i64 = extract_subvector t2, Constant:i64<2>
> > t11: v16i8 = bitcast t10
> > t7: ch,glue = CopyToReg t0, Register:v16i8 $q0, t11
> > t8: ch = AArch64ISD::RET_FLAG t7, Register:v16i8 $q0, t7:1
> > ```
> >
> > I wasn't planning to add an intrinsic as part of this patch to test the behaviour.
> I'm specifically concerned about cases where the number of lanes in the output fixed vector is greater than the number of lanes in the input scalable vector.
Actually, hmm, I think that's fine; the math doesn't care about the total number of elements in the output.
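To make that concrete: all of the scaling is done on known-minimum lane counts and the extraction index, so a fixed result with more lanes than the scalable source's known minimum still divides out cleanly. A rough sketch of the arithmetic for the second example above, in plain C++ that only mirrors the quoted snippet (it is not the actual DAGCombiner code):

```
// Index/type scaling for: extract <16 x i8> at index 16 from
// (bitcast <vscale x 2 x i64> to <vscale x 16 x i8>).
#include <cstdio>

int main() {
  // Known-minimum lane counts; vscale never enters the arithmetic.
  unsigned DestNumElts = 16; // extracted type v16i8 (NVT)
  unsigned SrcNumElts = 2;   // bitcast source nxv2i64, known minimum
  unsigned ExtIdx = 16;      // extraction index in v16i8 lanes

  if (DestNumElts % SrcNumElts == 0) {
    unsigned DestSrcRatio = DestNumElts / SrcNumElts;      // 8
    if (DestNumElts % DestSrcRatio == 0 && ExtIdx % DestSrcRatio == 0) {
      unsigned NewExtNumElts = DestNumElts / DestSrcRatio; // 2 -> v2i64
      unsigned NewExtIdx = ExtIdx / DestSrcRatio;          // 16 / 8 = 2
      std::printf("extract_subvector nxv2i64 -> v%ui64 at index %u\n",
                  NewExtNumElts, NewExtIdx);
    }
  }
  return 0;
}
```

This reproduces the `t10: v2i64 = extract_subvector t2, Constant:i64<2>` node in the second DAG dump above.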
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82910/new/
https://reviews.llvm.org/D82910