[llvm] [AArch64] Give a higher cost for more expensive SVE FCMP instructions (PR #153816)

Fri Aug 15 07:55:10 PDT 2025

llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

<details>
<summary>Changes</summary>

This tries to add a higher cost for SVE FCM** comparison instructions that often have a lower throughput than the Neon equivalents that can be exectured on more vector pipelines.

This patch takes the slightly unorthodox approach of using the information in the scheduling model to compare the throughput of a FCMEQ_PPzZZ_S (SVE) and a FCMEQv4f32 (Neon). This isn't how things will (probably) want to look in the long run, where all the information comes more directly from the scheduling model, but that still needs to be proven out. The downsides of this approach of using the scheduling model info is if the core does not have a scheduling model, but wants a different cost then an alternative approach will be needed (but then maybe that is a good reason to create a new scheduling model).

The alternative would either be to make a subtarget feature for the affected cores (as in https://github.com/llvm/llvm-project/commit/17bf27d57468c8a03477dabb99c3ecafe16d8727) or just always enable it, although that might need to change in the future.

---

Patch is 365.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153816.diff


16 Files Affected:

- (modified) llvm/lib/Target/AArch64/AArch64SchedA510.td (+1-1) 
- (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+34) 
- (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+5) 
- (modified) llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll (+5-5) 
- (modified) llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll (+197-197) 
- (modified) llvm/test/CodeGen/AArch64/sve-bf16-converts.ll (+2-2) 
- (modified) llvm/test/CodeGen/AArch64/sve-fptosi-sat.ll (+50-50) 
- (modified) llvm/test/CodeGen/AArch64/sve-fptoui-sat.ll (+33-33) 
- (modified) llvm/test/CodeGen/AArch64/sve-llrint.ll (+1030-1031) 
- (modified) llvm/test/CodeGen/AArch64/sve-lrint.ll (+1030-1031) 
- (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-compares.ll (+21-21) 
- (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ptest.ll (+74-74) 
- (modified) llvm/test/CodeGen/AArch64/sve2-bf16-converts.ll (+12-12) 
- (modified) llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll (+9-16) 
- (modified) llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll (+7-7) 
- (modified) llvm/test/tools/llvm-mca/AArch64/Cortex/A510-sve-instructions.s (+103-103) 


``````````diff

diff --git a/llvm/lib/Target/AArch64/AArch64SchedA510.td b/llvm/lib/Target/AArch64/AArch64SchedA510.td
index b93d67f3091e7..356e3fa39c53f 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedA510.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedA510.td
@@ -1016,7 +1016,7 @@ def : InstRW<[CortexA510MCWrite<16, 13, CortexA510UnitVALU>], (instrs FADDA_VPZ_
 def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVALU>], (instrs FADDA_VPZ_D)>;
 
 // Floating point compare
-def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
+def : InstRW<[CortexA510MCWrite<4, 2, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
                                             "^FCM(EQ|GE|GT|NE)_PPzZ[0Z]_[HSD]",
                                             "^FCM(LE|LT)_PPzZ0_[HSD]",
                                             "^FCMUO_PPzZZ_[HSD]")>;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 3042251cf754d..3ec45178af68e 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4365,6 +4365,34 @@ AArch64TTIImpl::getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,
   return 1;
 }
 
+/// Check whether Opcode1 has less throughput according to the scheduling
+/// model than Opcode2.
+bool AArch64TTIImpl::hasLessThroughputFromSchedulingModel(
+    unsigned Opcode1, unsigned Opcode2) const {
+  const MCSchedModel &Sched = ST->getSchedModel();
+  const TargetInstrInfo *TII = ST->getInstrInfo();
+  if (!Sched.hasInstrSchedModel())
+    return false;
+
+  auto ResolveVariant = [&](unsigned Opcode) {
+    unsigned SCIdx = TII->get(Opcode).getSchedClass();
+    while (SCIdx && Sched.getSchedClassDesc(SCIdx)->isVariant())
+      SCIdx = ST->resolveVariantSchedClass(SCIdx, nullptr, TII,
+                                           Sched.getProcessorID());
+    assert(SCIdx);
+    const MCSchedClassDesc &SCDesc = Sched.SchedClassTable[SCIdx];
+    const MCWriteProcResEntry *B = ST->getWriteProcResBegin(&SCDesc);
+    const MCWriteProcResEntry *E = ST->getWriteProcResEnd(&SCDesc);
+    unsigned Min = Sched.IssueWidth;
+    for (const MCWriteProcResEntry *I = B; I != E; I++)
+      Min = std::min(Min, Sched.getProcResource(I->ProcResourceIdx)->NumUnits /
+                              (I->ReleaseAtCycle - I->AcquireAtCycle));
+    return Min;
+  };
+
+  return ResolveVariant(Opcode1) < ResolveVariant(Opcode2);
+}
+
 InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
     unsigned Opcode, Type *ValTy, Type *CondTy, CmpInst::Predicate VecPred,
     TTI::TargetCostKind CostKind, TTI::OperandValueInfo Op1Info,
@@ -4462,6 +4490,12 @@ InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
              (VecPred == FCmpInst::FCMP_ONE || VecPred == FCmpInst::FCMP_UEQ))
       Factor = 3; // fcmxx+fcmyy+or
 
+    if (isa<ScalableVectorType>(ValTy) &&
+        CostKind == TTI::TCK_RecipThroughput &&
+        hasLessThroughputFromSchedulingModel(AArch64::FCMEQ_PPzZZ_S,
+                                             AArch64::FCMEQv4f32))
+      Factor *= 2;
+
     return Factor * (CostKind == TTI::TCK_Latency ? 2 : LT.first);
   }
 
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 9c96fdd427814..fe36e43c616c1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -174,6 +174,11 @@ class AArch64TTIImpl final : public BasicTTIImplBase<AArch64TTIImpl> {
 
   bool prefersVectorizedAddressing() const override;
 
+  /// Check whether Opcode1 has less throughput according to the scheduling
+  /// model than Opcode2.
+  bool hasLessThroughputFromSchedulingModel(unsigned Opcode1,
+                                            unsigned Opcode2) const;
+
   InstructionCost
   getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
                         unsigned AddressSpace,
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
index 0789707a53b90..5029ef974681d 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
@@ -58,10 +58,10 @@ define <vscale x 32 x i1> @cmp_nxv32i1() {
 ; Check fcmp for legal FP vectors
 define void @cmp_legal_fp() #0 {
 ; CHECK-LABEL: 'cmp_legal_fp'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %1 = fcmp oge <vscale x 2 x double> undef, undef
@@ -74,7 +74,7 @@ define void @cmp_legal_fp() #0 {
 ; Check fcmp for an illegal FP vector
 define <vscale x 16 x i1> @cmp_nxv16f16() {
 ; CHECK-LABEL: 'cmp_nxv16f16'
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret <vscale x 16 x i1> %res
 ;
   %res = fcmp oge <vscale x 16 x half> undef, undef
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
index a14176cd6d532..c48ca20949c21 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
@@ -3,15 +3,15 @@
 
 define void @fcmp_oeq(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oeq'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
@@ -28,9 +28,9 @@ define void @fcmp_oeq(i32 %arg) {
 
 define void @fcmp_oeq_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oeq_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
@@ -41,15 +41,15 @@ define void @fcmp_oeq_bfloat(i32 %arg) {
 
 define void @fcmp_ogt(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_ogt'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
@@ -66,9 +66,9 @@ define void @fcmp_ogt(i32 %arg) {
 
 define void @fcmp_ogt_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_ogt_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
@@ -79,15 +79,15 @@ define void @fcmp_ogt_bfloat(i32 %arg) {
 
 define void @fcmp_oge(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oge'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
@@ -104,9 +104,9 @@ define void @fcmp_oge(i32 %arg) {
 
 define void @fcmp_oge_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oge_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
@@ -117,15 +117,15 @@ define void @fcmp_oge_bfloat(i32 %arg) {
 
 define void @fcmp_olt(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_olt'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
@@ -142,9 +142,9 @@ define void @fcmp_olt(i32 %arg) {
 
 define void @fcmp_olt_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_olt_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp olt <vscale x 2 x bfloat> undef, unde...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/153816