[PATCH] D81139: [ARM] MVE VCVT lowering for f32->f16 truncs

Thu Jun 4 16:36:55 PDT 2020

dmgreen added a comment.

Hmm. shuffle_trunc1 comes from something like this:

      t17: v4f32 = vector_shuffle<0,4,1,5> t3, t6
      t18: v4f32 = vector_shuffle<2,6,3,7> t3, t6
    t19: v8f32 = concat_vectors t17, t18
  t12: v8f16 = fp_round t19, TargetConstant:i32<0>

This then gets split into two halves because the v8f32 is too wide to be legal, and each half looks like this in WidenVecRes_Convert:

    t17: v4f32 = vector_shuffle<0,4,1,5> t3, t6
  t20: v4f16 = fp_round t17, TargetConstant:i32<0>

These need to be concatenated back together. The two halves of the BUILD_VECTOR are combined, and that is what we end up lowering.
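As a rough illustration (a small Python model, not anything taken from the patch), the shuffle_trunc1 DAG can be simulated lane by lane. The helper names vector_shuffle and fp_round_f16 and the input values are made up for the example. It only shows that the concat of the two shuffles is a plain interleave of t3 and t6, and that rounding the two halves separately gives the same lanes as rounding the whole v8f32.

```python
import struct

def vector_shuffle(mask, a, b):
    """Model of vector_shuffle: pick lanes from the concatenation of a and b."""
    both = a + b
    return [both[i] for i in mask]

def fp_round_f16(v):
    """Model of fp_round to f16: round-trip each lane through IEEE half precision."""
    return [struct.unpack("<e", struct.pack("<e", x))[0] for x in v]

# Hypothetical inputs; all values are exactly representable in f16.
t3 = [0.5, 1.5, 2.5, 3.5]
t6 = [10.0, 11.0, 12.0, 13.0]

t17 = vector_shuffle([0, 4, 1, 5], t3, t6)
t18 = vector_shuffle([2, 6, 3, 7], t3, t6)
t19 = t17 + t18                                   # concat_vectors

whole = fp_round_f16(t19)                         # fp_round on the v8f32
halves = fp_round_f16(t17) + fp_round_f16(t18)    # after splitting in two

# The same lanes written as an explicit interleave of the two sources.
interleaved = [x for pair in zip(t3, t6) for x in pair]
```

Here whole, halves and the rounded interleaved list all hold the same lanes, which is the property the lowering relies on.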

shuffle_trunc3 is that but twice as wide, starting with:

      t32: v8f32 = vector_shuffle<0,8,1,9,2,10,3,11> t13, t14
      t33: v8f32 = vector_shuffle<4,12,5,13,6,14,7,15> t13, t14
    t34: v16f32 = concat_vectors t32, t33
  t20: v16f16 = fp_round t34, TargetConstant:i32<0>

We end up with 4 BuildVectors that are combined back together into 2.

shuffle_trunc5 is this before we start legalizing the types:

      t8: v4f16 = fp_round t3, TargetConstant:i32<0>
    t11: v8f16 = concat_vectors t8, undef:v4f16
      t9: v4f16 = fp_round t6, TargetConstant:i32<0>
    t12: v8f16 = concat_vectors t9, undef:v4f16
  t13: v8f16 = vector_shuffle<0,8,1,9,2,10,3,11> t11, t12

So it's hard to see the shuffle from the fp_round. Again though, it creates BuildVectors, the BuildVectors simplify, and we lower from the BuildVectors. Perhaps that's a slightly stranger case because of the v4f16 vectors, but unfortunately they are likely to come up from the vectorizer at the moment.
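To make the shuffle_trunc5 shape a bit more concrete, here is the same kind of Python model (again only a sketch with made-up helper names and inputs, not code from the patch). The rounding happens first, each v4f16 is padded with undef lanes, and the shuffle then interleaves the two padded vectors. The undef lanes are modelled as None so it is easy to check that the mask never selects them.

```python
import struct

UNDEF = None  # stand-in for an undef lane

def vector_shuffle(mask, a, b):
    """Model of vector_shuffle: pick lanes from the concatenation of a and b."""
    both = a + b
    return [both[i] for i in mask]

def fp_round_f16(v):
    """Model of fp_round to f16: round-trip each lane through IEEE half precision."""
    return [struct.unpack("<e", struct.pack("<e", x))[0] for x in v]

# Hypothetical inputs; all values are exactly representable in f16.
t3 = [1.0, 2.0, 3.0, 4.0]
t6 = [0.25, 0.5, 0.75, 1.25]

t8 = fp_round_f16(t3)
t11 = t8 + [UNDEF] * 4                # concat_vectors t8, undef:v4f16
t9 = fp_round_f16(t6)
t12 = t9 + [UNDEF] * 4                # concat_vectors t9, undef:v4f16
t13 = vector_shuffle([0, 8, 1, 9, 2, 10, 3, 11], t11, t12)

# The result written as an interleave of the two rounded inputs.
expected = [x for pair in zip(t8, t9) for x in pair]
```

In this model t13 equals expected and contains no undef lanes, so the end result is the same interleaved truncate as in shuffle_trunc1 even though the fp_round sits on the other side of the shuffle.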

shuffle_trunc7 is the same thing but double the width:

        t13: v8f32 = concat_vectors t3, t6
      t16: v8f16 = fp_round t13, TargetConstant:i32<0>
    t19: v16f16 = concat_vectors t16, undef:v8f16
        t14: v8f32 = concat_vectors t9, t12
      t17: v8f16 = fp_round t14, TargetConstant:i32<0>
    t20: v16f16 = concat_vectors t17, undef:v8f16
  t21: v16f16 = vector_shuffle<0,16,1,17,2,18,3,19,4,20,5,21,6,22,7,23> t19, t20

I guess I'm still having trouble seeing what we would reliably latch onto here.

I do have some old code that used a DAG combine on an fptrunc(shufflevector), but it didn't handle all of these cases, and doing this from a BuildVector seemed much simpler. It is the way that we lower all shuffles in the ARM backend (like vext and vmovn), after all. The only difference here is that we have an fptrunc in the mix too.
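For a sense of why matching from the BuildVector is simple, here is a hypothetical Python sketch of the kind of check involved. It is not the patch's code: match_interleaved_trunc and the (source, lane) representation are invented for the example. Each BuildVector element is assumed to be an fp_round of an extract from some f32 source vector. The check asks whether the even result lanes come from one source and the odd lanes from another, in order, which is the shape a pair of bottom/top conversions could produce (assuming, as I understand the MVE VCVTB/VCVTT f16 forms, that they write the even and odd f16 lanes respectively).

```python
def match_interleaved_trunc(elts):
    """Return (bottom_src, top_src) if elts is an interleaved truncate, else None.

    elts is a list of (source_name, lane) pairs, one per BuildVector operand,
    each standing for fp_round(extract_element(source, lane)).
    """
    if not elts or len(elts) % 2 != 0:
        return None
    bottom_src = elts[0][0]   # source feeding the even result lanes
    top_src = elts[1][0]      # source feeding the odd result lanes
    for i, (src, lane) in enumerate(elts):
        expected_src = bottom_src if i % 2 == 0 else top_src
        # Result lane i must read lane i // 2 of the expected source.
        if src != expected_src or lane != i // 2:
            return None
    return bottom_src, top_src

# The v8f16 BuildVector that shuffle_trunc1 would simplify to in this model.
trunc1 = [("t3", 0), ("t6", 0), ("t3", 1), ("t6", 1),
          ("t3", 2), ("t6", 2), ("t3", 3), ("t6", 3)]
matched = match_interleaved_trunc(trunc1)

# A BuildVector that reads its lanes out of order does not match.
not_matched = match_interleaved_trunc([("t3", 1), ("t6", 0), ("t3", 0), ("t6", 1)])
```

Because every one of the DAGs above ends up as a BuildVector of this form after simplification, a single check like this covers them, whereas a combine keyed on fptrunc(shufflevector) has to recognise each shape separately.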

https://reviews.llvm.org/D81139