[PATCH] D82910: [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR

Wed Jul 1 12:27:08 PDT 2020

efriedma added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:19238
       if ((NVT.getVectorNumElements() % DestSrcRatio) == 0) {
-        unsigned NewExtNumElts = NVT.getVectorNumElements() / DestSrcRatio;
+        ElementCount NewExtEC = NVT.getVectorElementCount() / DestSrcRatio;
         EVT ScalarVT = SrcVT.getScalarType();
----------------
sdesmalen wrote:
> efriedma wrote:
> > Does this math work correctly if we're extracting a fixed vector from a scalable vector?
> Yes. To be sure, I tested this with some intrinsic that maps to EXTRACT_SUBVECTOR:
> ```
> define <2 x i64> @extract_2i64_nxv16i8(<vscale x 16 x i8> %z0) {
>   %z0_bc = bitcast <vscale x 16 x i8> %z0 to <vscale x 2 x i64>
>   %ext = call <2 x i64> @llvm.experimental.extractsubvec.v2i64.nxv2i64(<vscale x 2 x i64> %z0_bc, i32 2)
>   ret <2 x i64> %ext
> }
> =>
> Optimized lowered selection DAG: %bb.0 'extract_2i64_nxv16i8:'
> SelectionDAG has 9 nodes:
>   t0: ch = EntryToken
>         t2: nxv16i8,ch = CopyFromReg t0, Register:nxv16i8 %0
>       t10: v16i8 = extract_subvector t2, Constant:i64<16>
>     t11: v2i64 = bitcast t10
>   t7: ch,glue = CopyToReg t0, Register:v2i64 $q0, t11
>   t8: ch = AArch64ISD::RET_FLAG t7, Register:v2i64 $q0, t7:1
> ```
> and for the other:
> ```
> define <16 x i8> @extract_16i8_nxv2i64(<vscale x 2 x i64> %z0) {
>   %z0_bc = bitcast <vscale x 2 x i64> %z0 to <vscale x 16 x i8>
>   %ext = call <16 x i8> @llvm.experimental.extractsubvec.v16i8.nxv16i8(<vscale x 16 x i8> %z0_bc, i32 16)
>   ret <16 x i8> %ext
> }
> =>
> Optimized lowered selection DAG: %bb.0 'extract_16i8_nxv2i64:'
> SelectionDAG has 9 nodes:
>   t0: ch = EntryToken
>         t2: nxv2i64,ch = CopyFromReg t0, Register:nxv2i64 %0
>       t10: v2i64 = extract_subvector t2, Constant:i64<2>
>     t11: v16i8 = bitcast t10
>   t7: ch,glue = CopyToReg t0, Register:v16i8 $q0, t11
>   t8: ch = AArch64ISD::RET_FLAG t7, Register:v16i8 $q0, t7:1
> ```
> 
> I wasn't planning to add the intrinsic as part of this patch just to test this behaviour.
I'm specifically concerned about cases where the number of lanes in the output fixed vector is greater than the number of lanes in the input scalable vector.
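
To make the arithmetic in question concrete, here is a minimal standalone sketch of the element-count math, not the DAGCombiner code itself. `ElemCount`, `divide`, and `newExtractCount` are hypothetical stand-ins invented for illustration, the lane counts are the known-minimum counts (the value before multiplying by vscale), and the extract index, which would also need scaling, is not modelled.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an element count: Min lanes, multiplied by
// vscale at run time when Scalable is true.
struct ElemCount {
  unsigned Min;
  bool Scalable;
};

// Divide the lane count while keeping the scalable flag, which is the
// property the patch title refers to.
constexpr ElemCount divide(ElemCount EC, unsigned Ratio) {
  return {EC.Min / Ratio, EC.Scalable};
}

// Simplified model of narrowing extract_subvector(bitcast Src to Dest)
// with result type NVT into an extract taken directly from Src.
// Returns the element count of the new extract, or nullopt when the
// lane counts do not divide evenly.
std::optional<ElemCount> newExtractCount(ElemCount Src, ElemCount Dest,
                                         ElemCount NVT) {
  if (Src.Min == 0 || Dest.Min % Src.Min != 0)
    return std::nullopt;
  unsigned DestSrcRatio = Dest.Min / Src.Min;
  if (NVT.Min % DestSrcRatio != 0)
    return std::nullopt;
  return divide(NVT, DestSrcRatio);
}
```

Under these assumptions, extracting v16i8 from a bitcast of nxv2i64 to nxv16i8 gives a ratio of 8 and a fixed count of 2 lanes, which matches the v2i64 extract in the quoted DAG output. An extract whose lane count is not a multiple of the ratio (for example 4 lanes with a ratio of 8) is rejected by the modulo check. The sketch only models the division and does not show that every fixed-from-scalable combination is handled correctly.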

https://reviews.llvm.org/D82910