[llvm] [AArch64][Machine-Combiner] Split loads into lanes of neon vectors into multiple vectors when possible (PR #142941)

Mon Jun 16 01:21:38 PDT 2025

================
@@ -7317,6 +7319,57 @@ static bool getMiscPatterns(MachineInstr &Root,
   return false;
 }
 
+/// Search for patterns where we use LD1i32 instructions to load into
+/// 4 separate lanes of a 128 bit Neon register. We can increase ILP
+/// by loading into 2 Neon registers instead.
+static bool getLoadPatterns(MachineInstr &Root,
+                            SmallVectorImpl<unsigned> &Patterns) {
+  const MachineRegisterInfo &MRI = Root.getMF()->getRegInfo();
+  const TargetRegisterInfo *TRI =
+      Root.getMF()->getSubtarget().getRegisterInfo();
+  // Enable this only on Darwin targets, where it should be profitable. Other
+  // targets can remove this check if it is profitable there as well.
+  if (!Root.getMF()->getTarget().getTargetTriple().isOSDarwin())
----------------
davemgreen wrote:

CPU tuning subtarget features are the preferred way to tune for different systems. MachineCombiner can use scheduling depths to calculate when it should be profitable, so you might be able to enable it more generally. It feels like it might be OK as a DAG combine to be honest, although it would add more instructions IIUC.
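
To make the scheduling-depth point concrete, here is a toy sketch of the comparison, not the MachineCombiner API. It assumes each load-into-lane depends on the previous one through the destination register, and that the split form runs two independent chains joined by a single zip. The `ToyLatencies` struct, the function names, and any latency values passed in are hypothetical; in the real pass the numbers would come from the subtarget's scheduling model.

```cpp
#include <cassert>

// Hypothetical per-instruction latencies in cycles (illustration only).
struct ToyLatencies {
  unsigned LaneLoad; // one load-into-lane, which also reads the destination vector
  unsigned Zip;      // one zip that merges two half-filled vectors
};

// Critical-path depth when every lane load waits on the previous one.
unsigned serialDepth(const ToyLatencies &L, unsigned NumLanes) {
  return NumLanes * L.LaneLoad;
}

// Critical-path depth after splitting into two independent chains
// that are joined by one zip at the end.
unsigned splitDepth(const ToyLatencies &L, unsigned NumLanes) {
  assert(NumLanes % 2 == 0 && "toy model splits the lanes evenly");
  return (NumLanes / 2) * L.LaneLoad + L.Zip;
}

// The split only pays off if it shortens the critical path; it always
// costs one extra instruction (the zip).
bool splitShortensCriticalPath(const ToyLatencies &L, unsigned NumLanes) {
  return splitDepth(L, NumLanes) < serialDepth(L, NumLanes);
}
```

Under this model the split wins for four lanes whenever the zip latency is less than twice the lane-load latency, which is the kind of decision a depth-based check could make per subtarget instead of keying on the OS.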

https://github.com/llvm/llvm-project/pull/142941