[llvm] [ARM]Adjust cost of muls in SMLAL patterns (PR #122713)

Mon Mar 17 07:12:18 PDT 2025

================
@@ -1458,16 +1458,62 @@ InstructionCost ARMTTIImpl::getArithmeticInstrCost(
   if (LooksLikeAFreeShift())
     return 0;
 
+  // When targets have both DSP and MVE we find that the
+  // the compiler will attempt to vectorize as well as using
+  // scalar SMLAL operations. This is in cases where we have
+  // the pattern ext(mul(ext(i16), ext(i16))) we find
+  // that generated codegen performs better when only using SMLAL scalar
+  // ops instead of trying to mix vector ops with SMLAL ops. We therefore
+  // check if a mul instruction is used in a SMLAL pattern.
+  auto MulInSMLALPattern = [&](const Instruction *I, unsigned Opcode,
+                               Type *Ty) -> bool {
+    if (!ST->hasDSP() || !ST->hasMVEIntegerOps())
+      return false;
+    if (!I)
+      return false;
+
+    if (Opcode != Instruction::Mul)
+      return false;
+
+    if (Ty->isVectorTy())
+      return false;
+
+    auto IsSExtInst = [](const Value *V) -> bool {
+      return (dyn_cast<SExtInst>(V)) ? true : false;
----------------
nasherm wrote:

So, adding ZExt does avoid mixing vector and scalar instructions. However, the generated code doesn't take advantage of UMULL instructions and just sticks with simple scalar ops, which was the behavior before the changes to the SLPVectorizer were made.

I do think that once this is merged there are further improvements to be made to SMLAL codegen, which I intend to investigate, i.e. further folding using DSP instructions. I could add UMULL codegen to that investigation as well.

Regardless, adding ZExt support simply restores the previous behavior, so I don't believe it is a concern.
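
For reference, here is a minimal source-level sketch of the two shapes being discussed: the signed `ext(mul(ext(i16), ext(i16)))` accumulation and its zero-extended counterpart. The function names `dotS16` and `dotU16` are hypothetical and are not taken from the patch or its tests. Whether a given build actually selects SMLAL or UMULL-family instructions for these loops depends on the target features and cost model, so treat this as an illustration of the pattern only.

```cpp
#include <cstddef>
#include <cstdint>

// Signed shape: each i16 operand is sign-extended to i32, multiplied,
// and the i32 product is sign-extended to i64 before accumulation.
// (Hypothetical helper name, for illustration only.)
std::int64_t dotS16(const std::int16_t *A, const std::int16_t *B,
                    std::size_t N) {
  std::int64_t Acc = 0;
  for (std::size_t I = 0; I < N; ++I) {
    std::int32_t Prod =
        static_cast<std::int32_t>(A[I]) * static_cast<std::int32_t>(B[I]);
    Acc += static_cast<std::int64_t>(Prod);
  }
  return Acc;
}

// Unsigned shape: the same structure with zero extension. The operands
// are widened to uint32_t explicitly so the multiply cannot overflow a
// signed int after integer promotion.
// (Hypothetical helper name, for illustration only.)
std::uint64_t dotU16(const std::uint16_t *A, const std::uint16_t *B,
                     std::size_t N) {
  std::uint64_t Acc = 0;
  for (std::size_t I = 0; I < N; ++I) {
    std::uint32_t Prod =
        static_cast<std::uint32_t>(A[I]) * static_cast<std::uint32_t>(B[I]);
    Acc += static_cast<std::uint64_t>(Prod);
  }
  return Acc;
}
```

Comparing the output for these two functions on a target with both DSP and MVE would be one way to check that the ZExt case falls back to plain scalar ops as described above.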

https://github.com/llvm/llvm-project/pull/122713