[PATCH] D82910: [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 1 12:27:10 PDT 2020
efriedma accepted this revision.
efriedma added a comment.
This revision is now accepted and ready to land.
LGTM
================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:19238
if ((NVT.getVectorNumElements() % DestSrcRatio) == 0) {
- unsigned NewExtNumElts = NVT.getVectorNumElements() / DestSrcRatio;
+ ElementCount NewExtEC = NVT.getVectorElementCount() / DestSrcRatio;
EVT ScalarVT = SrcVT.getScalarType();
----------------
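For what it's worth, the difference this line makes is easiest to see in isolation: dividing an `ElementCount` keeps the scalable flag, whereas the old `unsigned` division could only ever produce a fixed count. Below is a minimal sketch of that behaviour; the `EC` struct is a simplified stand-in for illustration only, not the real `llvm::ElementCount` class.

```
// Simplified stand-in for llvm::ElementCount, only to show why dividing an
// element count (rather than a plain unsigned) preserves the scalable flag.
// Not the real LLVM class.
#include <cassert>
#include <cstdio>

struct EC {
  unsigned Min;   // known-minimum number of elements
  bool Scalable;  // true for <vscale x Min x Ty>, false for <Min x Ty>

  EC operator/(unsigned RHS) const {
    assert(Min % RHS == 0 && "non-integral element-count division");
    // Only the known minimum shrinks; the Scalable flag is carried through.
    return {Min / RHS, Scalable};
  }
};

int main() {
  EC NVT = {16, true};             // e.g. nxv16i8
  unsigned DestSrcRatio = 8;       // i64 is 8x wider than i8
  EC NewExtEC = NVT / DestSrcRatio;
  std::printf("%s%u elements\n", NewExtEC.Scalable ? "vscale x " : "",
              NewExtEC.Min);       // prints "vscale x 2 elements"
  // With the old code, `unsigned NewExtNumElts = 16 / 8;` would have built a
  // fixed 2-element type here, silently dropping the scalable flag.
  return 0;
}
```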
efriedma wrote:
> sdesmalen wrote:
> > efriedma wrote:
> > > Does this math work correctly if we're extracting a fixed vector from a scalable vector?
> > Yes. To be sure, I tested this with an intrinsic that maps to EXTRACT_SUBVECTOR:
> > ```
> > define <2 x i64> @extract_2i64_nxv16i8(<vscale x 16 x i8> %z0) {
> > %z0_bc = bitcast <vscale x 16 x i8> %z0 to <vscale x 2 x i64>
> > %ext = call <2 x i64> @llvm.experimental.extractsubvec.v2i64.nxv2i64(<vscale x 2 x i64> %z0_bc, i32 2)
> > ret <2 x i64> %ext
> > }
> > =>
> > Optimized lowered selection DAG: %bb.0 'extract_2i64_nxv16i8:'
> > SelectionDAG has 9 nodes:
> > t0: ch = EntryToken
> > t2: nxv16i8,ch = CopyFromReg t0, Register:nxv16i8 %0
> > t10: v16i8 = extract_subvector t2, Constant:i64<16>
> > t11: v2i64 = bitcast t10
> > t7: ch,glue = CopyToReg t0, Register:v2i64 $q0, t11
> > t8: ch = AArch64ISD::RET_FLAG t7, Register:v2i64 $q0, t7:1
> >
> > ```
> > and for the other:
> > ```
> > define <16 x i8> @extract_16i8_nxv2i64(<vscale x 2 x i64> %z0) {
> > %z0_bc = bitcast <vscale x 2 x i64> %z0 to <vscale x 16 x i8>
> > %ext = call <16 x i8> @llvm.experimental.extractsubvec.v16i8.nxv16i8(<vscale x 16 x i8> %z0_bc, i32 16)
> > ret <16 x i8> %ext
> > }
> > =>
> > Optimized lowered selection DAG: %bb.0 'extract_16i8_nxv2i64:'
> > SelectionDAG has 9 nodes:
> > t0: ch = EntryToken
> > t2: nxv2i64,ch = CopyFromReg t0, Register:nxv2i64 %0
> > t10: v2i64 = extract_subvector t2, Constant:i64<2>
> > t11: v16i8 = bitcast t10
> > t7: ch,glue = CopyToReg t0, Register:v16i8 $q0, t11
> > t8: ch = AArch64ISD::RET_FLAG t7, Register:v16i8 $q0, t7:1
> > ```
> >
> > I wasn't planning to add an intrinsic as part of this patch to test the behaviour.
> I'm specifically concerned about cases where the number of lanes in the output fixed vector is greater than the number of lanes in the input scalable vector.
Actually, hmm, I think that's fine; the math doesn't care about the total number of elements in the output.
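To make that concrete: all of the scaling is done on known-minimum lane counts and the extraction index, so a fixed result with more lanes than the scalable source's known minimum still divides out cleanly. A rough sketch of the arithmetic for the second example above, in plain C++ that only mirrors the quoted snippet (it is not the actual DAGCombiner code):

```
// Index/type scaling for: extract <16 x i8> at index 16 from
// (bitcast <vscale x 2 x i64> to <vscale x 16 x i8>).
#include <cstdio>

int main() {
  // Known-minimum lane counts; vscale never enters the arithmetic.
  unsigned DestNumElts = 16; // extracted type v16i8 (NVT)
  unsigned SrcNumElts = 2;   // bitcast source nxv2i64, known minimum
  unsigned ExtIdx = 16;      // extraction index in v16i8 lanes

  if (DestNumElts % SrcNumElts == 0) {
    unsigned DestSrcRatio = DestNumElts / SrcNumElts;      // 8
    if (DestNumElts % DestSrcRatio == 0 && ExtIdx % DestSrcRatio == 0) {
      unsigned NewExtNumElts = DestNumElts / DestSrcRatio; // 2 -> v2i64
      unsigned NewExtIdx = ExtIdx / DestSrcRatio;          // 16 / 8 = 2
      std::printf("extract_subvector nxv2i64 -> v%ui64 at index %u\n",
                  NewExtNumElts, NewExtIdx);
    }
  }
  return 0;
}
```

This reproduces the `t10: v2i64 = extract_subvector t2, Constant:i64<2>` node in the second DAG dump above.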
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D82910/new/
https://reviews.llvm.org/D82910