[llvm] [LLVM][CodeGen][AArch64] Don't scalarise v8{f16,bf16} vsetcc operations. (PR #135398)

Tue Apr 15 05:10:38 PDT 2025

================
@@ -4236,9 +4236,11 @@ InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
 
   if (isa<FixedVectorType>(ValTy) && ISD == ISD::SETCC) {
     auto LT = getTypeLegalizationCost(ValTy);
-    // Cost v4f16 FCmp without FP16 support via converting to v4f32 and back.
+    // Cost v#f16 FCmp without FP16 support via converting to v#f32 and back.
     if (LT.second == MVT::v4f16 && !ST->hasFullFP16())
       return LT.first * 4; // fcvtl + fcvtl + fcmp + xtn
+    if (LT.second == MVT::v8f16 && !ST->hasFullFP16())
+      return LT.first * 8; // 2*(fcvtl + fcvtl2 + fcmp) + uzp1 + xtn
----------------
paulwalker-arm wrote:

Yes, the change to the lowering code triggered a change to the cost model (causing it to start returning 1) so I made this change to counteract it.

It does not look like I can precommit the cost model change, because when `getCmpSelCost` is used with the promoted type it assumes no scalarisation and returns a much lower cost than reality, which this PR fixes. That said, presumably the existing `MVT::v4f16` case is equally bad? I could fix that to use `getCastCost * 2 + getCmpSelCost`, and then this PR would just remove the type restriction?

A side note on using `getCmpSelCost` is that it does not account for the truncation to the expected boolean vector type, so it would return a lower cost (8 -> 6). We currently ignore this truncation for float/double (and half when fullfp16 is available) based vector setcc operations, so it seems fair for the promoted f16 path to do likewise? I assume the expectation is that in real code the truncation will never actually happen, and that is why it is ignored.
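To make the 8 -> 6 arithmetic concrete, here is a minimal sketch of the instruction counts. It assumes exactly the sequences named in the diff comments (`fcvtl`/`fcvtl2`, `fcmp`, `uzp1`, `xtn`) and handles only the v4f16 and v8f16 cases. The helper names (`promotedParts`, `castCost`, `cmpCost`, `truncCost`, `composedCost`, `fullCost`) are hypothetical, for illustration only, and are not LLVM APIs.

```cpp
// Illustrative sketch only: hypothetical helpers, not part of LLVM.
// Valid for NumElts == 4 (v4f16) and NumElts == 8 (v8f16) only.

// Number of v4f32 registers needed once the f16 lanes are promoted to f32.
constexpr unsigned promotedParts(unsigned NumElts) { return NumElts / 4; }

// Converting one operand: one fcvtl/fcvtl2 per promoted part.
constexpr unsigned castCost(unsigned NumElts) { return promotedParts(NumElts); }

// Comparing the promoted operands: one fcmp per promoted part.
constexpr unsigned cmpCost(unsigned NumElts) { return promotedParts(NumElts); }

// Narrowing the result to the boolean vector type:
// xtn for v4f16, uzp1 + xtn for v8f16.
constexpr unsigned truncCost(unsigned NumElts) { return NumElts == 8 ? 2 : 1; }

// Cost in the shape "getCastCost * 2 + getCmpSelCost": both operands are
// converted, then compared; the final truncation is not counted.
constexpr unsigned composedCost(unsigned NumElts) {
  return 2 * castCost(NumElts) + cmpCost(NumElts);
}

// Cost including the truncation, matching the constants in the diff.
constexpr unsigned fullCost(unsigned NumElts) {
  return composedCost(NumElts) + truncCost(NumElts);
}
```

Under these assumptions, `fullCost` reproduces the hard-coded 4 and 8 per legalised part, while `composedCost` gives 3 and 6, the difference being the narrowing instructions that the float/double paths already ignore.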

https://github.com/llvm/llvm-project/pull/135398