<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/99760>99760</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [regression][AArch64] cannot build sparta (with -flto) for A64FX after PR #93300
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          pawosm-arm
      </td>
    </tr>
</table>

<pre>
    As in the title, after PR #93300 (commit 43100766f287185642a3ccbf1a629915f85575e2) I cannot build sparta (https://github.com/sparta/sparta.git) with `-flto` for A64FX:

```
mpicxx -O3 -fno-math-errno -mcpu=a64fx -ffp-contract=fast -flto -fdelayed-template-parsing -Wno-error=missing-template-arg-list-after-template-kw \
  adapt_grid.o balance_grid.o collide.o collide_vss.o collide_vss_kokkos.o comm.o comm_kokkos.o compute.o compute_boundary.o compute_boundary_kokkos.o \
  compute_count.o compute_count_kokkos.o compute_distsurf_grid.o compute_distsurf_grid_kokkos.o compute_dt_grid.o compute_dt_grid_kokkos.o compute_eflux_grid.o compute_eflux_grid_kokkos.o compute_grid.o compute_grid_kokkos.o compute_isurf_grid.o compute_ke_particle.o compute_ke_particle_kokkos.o compute_lambda_grid.o \
  compute_lambda_grid_kokkos.o compute_pflux_grid.o compute_pflux_grid_kokkos.o compute_property_grid.o compute_property_grid_kokkos.o compute_property_surf.o compute_react_boundary.o compute_react_isurf_grid.o compute_react_surf.o compute_reduce.o compute_sonine_grid.o compute_sonine_grid_kokkos.o compute_surf.o \
  compute_surf_kokkos.o compute_temp.o compute_temp_kokkos.o compute_thermal_grid.o compute_thermal_grid_kokkos.o compute_tvib_grid.o compute_tvib_grid_kokkos.o create_box.o create_grid.o create_isurf.o create_particles.o create_particles_kokkos.o custom.o cut2d.o cut3d.o domain.o domain_kokkos.o dump.o \
  dump_grid.o dump_image.o dump_movie.o dump_particle.o dump_surf.o error.o finish.o fix.o fix_ablate.o fix_adapt.o fix_adapt_kokkos.o fix_ambipolar.o fix_ambipolar_kokkos.o fix_ave_grid.o fix_ave_grid_kokkos.o fix_ave_histo.o fix_ave_histo_kokkos.o fix_ave_histo_weight.o fix_ave_histo_weight_kokkos.o fix_ave_surf.o \
  fix_ave_time.o fix_balance.o fix_balance_kokkos.o fix_dt_reset.o fix_dt_reset_kokkos.o fix_emit.o fix_emit_face.o fix_emit_face_file.o fix_emit_face_kokkos.o fix_emit_surf.o fix_field_grid.o fix_field_particle.o fix_grid_check.o fix_grid_check_kokkos.o fix_move_surf.o fix_move_surf_kokkos.o fix_print.o fix_surf_temp.o \
  fix_surf_temp_kokkos.o fix_temp_global_rescale.o fix_temp_rescale.o fix_temp_rescale_kokkos.o fix_vibmode.o fix_vibmode_kokkos.o geometry.o grid.o grid_adapt.o grid_collate.o grid_comm.o grid_custom.o grid_custom_kokkos.o grid_id.o grid_id_kokkos.o grid_kokkos.o grid_surf.o hashlittle.o image.o input.o irregular.o \
  irregular_kokkos.o kokkos.o kokkos_scan.o library.o main.o marching_cubes.o marching_squares.o math_extra.o memory.o mixture.o modify.o modify_kokkos.o move_surf.o output.o particle.o particle_custom.o particle_custom_kokkos.o particle_kokkos.o rand_pool_wrap.o random_knuth.o random_mars.o rcb.o react.o react_bird.o \
  react_bird_kokkos.o react_qk.o react_tce.o react_tce_kokkos.o react_tce_qk.o read_grid.o read_isurf.o read_particles.o read_restart.o read_surf.o read_surf_kokkos.o region.o region_block.o region_cylinder.o region_intersect.o region_plane.o region_sphere.o region_union.o remove_surf.o run.o scale_particles.o \
  sparta.o stats.o surf.o surf_collate.o surf_collide.o surf_collide_adiabatic.o surf_collide_cll.o surf_collide_diffuse.o surf_collide_diffuse_kokkos.o surf_collide_impulsive.o surf_collide_piston.o surf_collide_piston_kokkos.o surf_collide_specular.o surf_collide_specular_kokkos.o surf_collide_td.o surf_collide_transparent.o \
  surf_collide_transparent_kokkos.o surf_collide_vanish.o surf_collide_vanish_kokkos.o surf_comm.o surf_custom.o surf_custom_kokkos.o surf_kokkos.o surf_react.o surf_react_adsorb.o surf_react_global.o surf_react_global_kokkos.o surf_react_prob.o surf_react_prob_kokkos.o timer.o universe.o update.o update_kokkos.o \
  variable.o write_grid.o write_isurf.o write_restart.o write_surf.o -lkokkos -ldl -mtune=a64fx -mcpu=a64fx -fopenmp=libomp -L../Obj_astra -o ../spa_astra
LLVM ERROR: Don't know how to widen the operands for INSERT_SUBVECTOR
clang++: error: unable to execute command: Aborted (core dumped)
clang++: error: linker command failed due to signal (use -v to see invocation)
make[1]: *** [Makefile:79: ../spa_astra] Error 1
```

Commit 43100766f287185642a3ccbf1a629915f85575e2 does not revert cleanly on today's top of `main`, but the problem goes away with the following reversion attempt (after conflict resolution):

```
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4644,6 +4644,28 @@ bool LoopVectorizationPlanner::isMoreProfitable(

   unsigned MaxTripCount = PSE.getSE()->getSmallConstantMaxTripCount(OrigLoop);

+  if (!A.Width.isScalable() && !B.Width.isScalable() && MaxTripCount) {
+    // If the trip count is a known (possibly small) constant, the trip count
+    // will be rounded up to an integer number of iterations under
+    // FoldTailByMasking. The total cost in that case will be
+    // VecCost*ceil(TripCount/VF). When not folding the tail, the total
+    // cost will be VecCost*floor(TC/VF) + ScalarCost*(TC%VF). There will be
+    // some extra overheads, but for the purpose of comparing the costs of
+    // different VFs we can use this to compare the total loop-body cost
+    // expected after vectorization.
+    auto GetCostForTC = [MaxTripCount, this](unsigned VF,
+                                             InstructionCost VectorCost,
+                                             InstructionCost ScalarCost) {
+      return CM.foldTailByMasking() ? VectorCost * divideCeil(MaxTripCount, VF)
+                                    : VectorCost * (MaxTripCount / VF) +
+                                          ScalarCost * (MaxTripCount % VF);
+    };
+    auto RTCostA = GetCostForTC(A.Width.getFixedValue(), CostA, A.ScalarCost);
+    auto RTCostB = GetCostForTC(B.Width.getFixedValue(), CostB, B.ScalarCost);
+
+    return RTCostA < RTCostB;
+  }
+
   // Improve estimate for the vector width if it is scalable.
   unsigned EstimatedWidthA = A.Width.getKnownMinValue();
   unsigned EstimatedWidthB = B.Width.getKnownMinValue();
@@ -4657,39 +4679,14 @@ bool LoopVectorizationPlanner::isMoreProfitable(
   // Assume vscale may be larger than 1 (or the value being tuned for),
   // so that scalable vectorization is slightly favorable over fixed-width
   // vectorization.
-  bool PreferScalable = !TTI.preferFixedOverScalableIfEqualCost() &&
-                        A.Width.isScalable() && !B.Width.isScalable();
-
-  auto CmpFn = [PreferScalable](const InstructionCost &LHS,
-                                const InstructionCost &RHS) {
-    return PreferScalable ? LHS <= RHS : LHS < RHS;
-  };
+  if (!TTI.preferFixedOverScalableIfEqualCost() &&
+      A.Width.isScalable() && !B.Width.isScalable())
+    return (CostA * B.Width.getFixedValue()) <= (CostB * EstimatedWidthA);

   // To avoid the need for FP division:
-  //      (CostA / EstimatedWidthA) < (CostB / EstimatedWidthB)
-  // <=>  (CostA * EstimatedWidthB) < (CostB * EstimatedWidthA)
-  if (!MaxTripCount)
-    return CmpFn(CostA * EstimatedWidthB, CostB * EstimatedWidthA);
-
-  auto GetCostForTC = [MaxTripCount, this](unsigned VF,
-                                           InstructionCost VectorCost,
-                                           InstructionCost ScalarCost) {
-    // If the trip count is a known (possibly small) constant, the trip count
-    // will be rounded up to an integer number of iterations under
-    // FoldTailByMasking. The total cost in that case will be
-    // VecCost*ceil(TripCount/VF). When not folding the tail, the total
-    // cost will be VecCost*floor(TC/VF) + ScalarCost*(TC%VF). There will be
-    // some extra overheads, but for the purpose of comparing the costs of
-    // different VFs we can use this to compare the total loop-body cost
-    // expected after vectorization.
-    if (CM.foldTailByMasking())
-      return VectorCost * divideCeil(MaxTripCount, VF);
-    return VectorCost * (MaxTripCount / VF) + ScalarCost * (MaxTripCount % VF);
-  };
-
-  auto RTCostA = GetCostForTC(EstimatedWidthA, CostA, A.ScalarCost);
-  auto RTCostB = GetCostForTC(EstimatedWidthB, CostB, B.ScalarCost);
-  return CmpFn(RTCostA, RTCostB);
+  //      (CostA / A.Width) < (CostB / B.Width)
+  // <=>  (CostA * B.Width) < (CostB * A.Width)
+  return (CostA * EstimatedWidthB) < (CostB * EstimatedWidthA);
 }

 static void emitInvalidCostRemarks(SmallVector<InstructionVFPair> InvalidCosts,
```
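
To make the behavioural difference between the two sides of this diff easier to see, here is a simplified C++ model of both versions of `isMoreProfitable`. It is only a sketch under stated assumptions, not LLVM code: `std::uint64_t` stands in for `InstructionCost`, and `VFCandidate`, `costForTripCount`, `estimatedWidth`, `isMoreProfitableOld` and `isMoreProfitableNew` are hypothetical names. The tuning vscale is passed as a plain number (1 if unknown) instead of coming from `getVScaleForTuning()`, and `PreferFixedOnTie` stands in for `TTI.preferFixedOverScalableIfEqualCost()`. The modelled difference is that, after PR #93300, the trip-count-based cost is also applied to scalable VFs, using the vscale-scaled estimated width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of LoopVectorizationPlanner::isMoreProfitable.
// std::uint64_t stands in for InstructionCost; none of these names are LLVM's.
struct VFCandidate {
  unsigned MinWidth;        // known-minimum lane count (getKnownMinValue())
  bool Scalable;            // true for a <vscale x MinWidth> VF
  std::uint64_t VectorCost; // cost of one vector-loop iteration
  std::uint64_t ScalarCost; // cost of one scalar (remainder) iteration
};

// Total loop-body cost for a known trip count TC at width VF. With tail
// folding, TC is rounded up to whole vector iterations; otherwise the
// TC % VF leftover iterations run in the scalar epilogue.
std::uint64_t costForTripCount(std::uint64_t TC, unsigned VF,
                               std::uint64_t VectorCost,
                               std::uint64_t ScalarCost, bool FoldTail) {
  assert(VF > 0 && "VF must be non-zero");
  if (FoldTail)
    return VectorCost * ((TC + VF - 1) / VF); // VectorCost * ceil(TC / VF)
  return VectorCost * (TC / VF) + ScalarCost * (TC % VF);
}

// Estimated lane count: scalable widths are multiplied by the tuning vscale
// (pass 1 when no tuning value is known).
unsigned estimatedWidth(const VFCandidate &C, unsigned VScaleForTuning) {
  return C.Scalable ? C.MinWidth * VScaleForTuning : C.MinWidth;
}

// Shape of the restored ("+") code: the trip-count model only applies when
// both candidates are fixed-width.
bool isMoreProfitableOld(const VFCandidate &A, const VFCandidate &B,
                         std::uint64_t MaxTC, unsigned VScaleForTuning,
                         bool FoldTail, bool PreferFixedOnTie) {
  if (!A.Scalable && !B.Scalable && MaxTC)
    return costForTripCount(MaxTC, A.MinWidth, A.VectorCost, A.ScalarCost,
                            FoldTail) <
           costForTripCount(MaxTC, B.MinWidth, B.VectorCost, B.ScalarCost,
                            FoldTail);
  std::uint64_t WA = estimatedWidth(A, VScaleForTuning);
  std::uint64_t WB = estimatedWidth(B, VScaleForTuning);
  if (!PreferFixedOnTie && A.Scalable && !B.Scalable)
    return A.VectorCost * WB <= B.VectorCost * WA; // ties favour scalable
  return A.VectorCost * WB < B.VectorCost * WA;
}

// Shape of the post-#93300 ("-") code: the trip-count model also applies to
// scalable VFs via the estimated width, and ties favour the scalable VF.
bool isMoreProfitableNew(const VFCandidate &A, const VFCandidate &B,
                         std::uint64_t MaxTC, unsigned VScaleForTuning,
                         bool FoldTail, bool PreferFixedOnTie) {
  unsigned WA = estimatedWidth(A, VScaleForTuning);
  unsigned WB = estimatedWidth(B, VScaleForTuning);
  bool PreferScalable = !PreferFixedOnTie && A.Scalable && !B.Scalable;
  auto Cmp = [PreferScalable](std::uint64_t L, std::uint64_t R) {
    return PreferScalable ? L <= R : L < R;
  };
  if (!MaxTC)
    return Cmp(A.VectorCost * WB, B.VectorCost * WA);
  return Cmp(costForTripCount(MaxTC, WA, A.VectorCost, A.ScalarCost, FoldTail),
             costForTripCount(MaxTC, WB, B.VectorCost, B.ScalarCost, FoldTail));
}
```

For example, with a known max trip count of 3, no tail folding and a tuning vscale of 4 (512-bit SVE, as on A64FX), a `vscale x 4` candidate with vector cost 10 wins against a fixed VF of 2 with vector cost 4 under the old logic (20 <= 64), but loses under the new logic (12 vs. 8): its estimated 16 lanes exceed the trip count, so all 3 iterations are costed as scalar. This only illustrates how the chosen VF can change; the crash itself is raised later, during SelectionDAG type legalization (widening an `INSERT_SUBVECTOR` operand), so the model says nothing about its root cause.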

</pre>