[llvm] [X86] X86FixupVectorConstantsPass - use scheduler model to avoid regressions (PR #140028)

Wed May 28 01:13:07 PDT 2025

================
@@ -355,6 +356,36 @@ bool X86FixupVectorConstantsPass::processInstruction(MachineFunction &MF,
     std::function<Constant *(const Constant *, unsigned, unsigned, unsigned)>
         RebuildConstant;
   };
+
+  auto NewOpcPreferable = [&](const FixupEntry &Fixup,
+                              unsigned RegBitWidth) -> bool {
+    if (SM->hasInstrSchedModel()) {
+      unsigned NewOpc = Fixup.Op;
+      auto *OldDesc = SM->getSchedClassDesc(TII->get(Opc).getSchedClass());
+      auto *NewDesc = SM->getSchedClassDesc(TII->get(NewOpc).getSchedClass());
+      unsigned BitsSaved = RegBitWidth - (Fixup.NumCstElts * Fixup.MemBitWidth);
+
+      // Compare tput/lat - avoid any regressions, but allow extra cycle of
+      // latency in exchange for each 128-bit (or less) constant pool reduction
+      // (this is a very simple cost:benefit estimate - there will probably be
+      // better ways to calculate this).
+      double OldTput = MCSchedModel::getReciprocalThroughput(*ST, *OldDesc);
+      double NewTput = MCSchedModel::getReciprocalThroughput(*ST, *NewDesc);
+      if (OldTput != NewTput)
+        return NewTput < OldTput;
+
+      int LatTol = (BitsSaved + 127) / 128;
+      int OldLat = MCSchedModel::computeInstrLatency(*ST, *OldDesc);
+      int NewLat = MCSchedModel::computeInstrLatency(*ST, *NewDesc);
+      if (OldLat != NewLat)
+        return NewLat < (OldLat + LatTol);
----------------
RKSimon wrote:

Yes, it is a very basic one, and I'm open to suggestions on how to improve it. As the comment says, throughput must never regress (e.g. no extra Port5 usage), but we permit an extra cycle of latency per 128 bits saved in constant pool size, which is likely to translate into less stress on the LS unit (fewer cachelines, double-pumped fetches, etc.).
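
For anyone who wants to experiment with the heuristic outside the pass, here is a minimal standalone sketch of the comparison in the quoted hunk. It uses no LLVM APIs: `OpcCost`, `latencyTolerance` and `newOpcPreferable` are hypothetical names for illustration, and the throughput/latency values are passed in directly rather than read from the scheduler model. The final `return true` is an assumption, since the quoted hunk ends before that point.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-opcode values that the pass obtains from
// the scheduler model (reciprocal throughput and latency).
struct OpcCost {
  double RThroughput; // reciprocal throughput, lower is better
  int Latency;        // latency in cycles
};

// Rounds the saved bits up to whole 128-bit units, mirroring LatTol in the
// quoted diff.
inline int latencyTolerance(unsigned BitsSaved) {
  return static_cast<int>((BitsSaved + 127) / 128);
}

// Returns true if the new (smaller-constant) opcode should be used.
inline bool newOpcPreferable(const OpcCost &Old, const OpcCost &New,
                             unsigned RegBitWidth, unsigned NumCstElts,
                             unsigned MemBitWidth) {
  assert(NumCstElts * MemBitWidth <= RegBitWidth &&
         "the narrowed constant must not be wider than the register");
  unsigned BitsSaved = RegBitWidth - NumCstElts * MemBitWidth;

  // Any throughput difference decides the result on its own.
  if (Old.RThroughput != New.RThroughput)
    return New.RThroughput < Old.RThroughput;

  // Throughput is equal: compare latency against the tolerance, using the
  // same strict comparison as the quoted diff.
  if (Old.Latency != New.Latency)
    return New.Latency < Old.Latency + latencyTolerance(BitsSaved);

  // Assumption (not in the quoted hunk): with identical costs, prefer the
  // new opcode for the smaller constant pool entry.
  return true;
}
```

For example, narrowing a 512-bit load to a single 128-bit element gives `BitsSaved = 384` and a tolerance of 3, so with equal throughput an old latency of 7 accepts a new latency of 9 but rejects 10. Because the comparison is strict, a tolerance of 1 (1 to 128 bits saved) only accepts a new latency that is lower than the old one.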

https://github.com/llvm/llvm-project/pull/140028