[llvm] [LLVM][CodeGen][AArch64] Don't scalarise v8{f16,bf16} vsetcc operations. (PR #135398)

Tue Apr 15 02:29:50 PDT 2025

================
@@ -4236,9 +4236,11 @@ InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
 
   if (isa<FixedVectorType>(ValTy) && ISD == ISD::SETCC) {
     auto LT = getTypeLegalizationCost(ValTy);
-    // Cost v4f16 FCmp without FP16 support via converting to v4f32 and back.
+    // Cost v#f16 FCmp without FP16 support via converting to v#f32 and back.
     if (LT.second == MVT::v4f16 && !ST->hasFullFP16())
       return LT.first * 4; // fcvtl + fcvtl + fcmp + xtn
+    if (LT.second == MVT::v8f16 && !ST->hasFullFP16())
+      return LT.first * 8; // 2*(fcvtl + fcvtl2 + fcmp) + uzp1 + xtn
----------------
davemgreen wrote:

This would probably be best expressed as separate calls, getCastInstrCost * 2 + getCmpSelInstrCost + getCastInstrCost to the extended type, unless there is a need to override it because it is a special case. That can be slower because of the extra calls, but it composes better and is easier to extend as the cost model changes.
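
To illustrate the difference, here is a minimal standalone sketch of one reading of that suggestion. It does not use the real TargetTransformInfo interfaces: `ToyCosts`, `hardCodedCost` and `composedCost` are hypothetical names, and the unit costs are assumptions chosen only so that the totals match the 4 and 8 in the hunk above.

```cpp
#include <cassert>

// Hypothetical stand-ins for cost-model queries. These are NOT the LLVM
// TargetTransformInfo API; the fields are illustrative unit costs only.
struct ToyCosts {
  int ExtendCost;  // extending one f16 operand to the wider f32 type
  int CompareCost; // the compare performed at the extended type
  int NarrowCost;  // narrowing the compare result back down
};

// Hard-coded style, as in the hunk: one opaque number per legalized part.
int hardCodedCost(int NumLegalParts, int PerPartCost) {
  assert(NumLegalParts > 0 && "expected at least one legalized part");
  return NumLegalParts * PerPartCost;
}

// Composed style: two operand extends + one compare + one narrowing step.
// If any component cost changes, the total follows without editing this sum.
int composedCost(int NumLegalParts, const ToyCosts &C) {
  assert(NumLegalParts > 0 && "expected at least one legalized part");
  return NumLegalParts * (2 * C.ExtendCost + C.CompareCost + C.NarrowCost);
}
```

With assumed unit costs of {1, 1, 1} the composed form gives 4, and with {2, 2, 2} it gives 8, so both agree with the hard-coded values while keeping each component visible.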

Either way, it is likely better to split this out into a separate commit. The two can go in at roughly the same time, but it is better to keep patches small, and the codegen changes can be added before the cost model is updated. Or was this updated because the generic cost model was making changes itself?

https://github.com/llvm/llvm-project/pull/135398