[llvm] [SLP] Initial vectorization of non-power-of-2 ops. (PR #77790)

Tue Feb 20 08:42:07 PST 2024

================
@@ -13644,10 +13692,22 @@ bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
                           << "MinVF (" << MinVF << ")\n");
       }
 
-      // FIXME: Is division-by-2 the correct step? Should we assert that the
-      // register size is a power-of-2?
-      unsigned StartIdx = 0;
+      SmallVector<unsigned> CandidateVFs;
+      if (VectorizeNonPowerOf2) {
+        // First try vectorizing with a non-power-of-2 VF. At the moment, only
+        // consider cases where VF + 1 is a power-of-2, i.e. almost all vector
+        // lanes are used.
+        unsigned CandVF = Operands.size();
+        if (isPowerOf2_32(CandVF + 1) && CandVF <= MaxVF)
+          CandidateVFs.push_back(CandVF);
+      }
       for (unsigned Size = MaxVF; Size >= MinVF; Size /= 2) {
+        // FIXME: Is division-by-2 the correct step? Should we assert that the
+        // register size is a power-of-2?
+        CandidateVFs.push_back(Size);
+      }
+      unsigned StartIdx = 0;
+      for (unsigned Size : CandidateVFs) {
----------------
alexey-bataev wrote:

Better to implement the main part of this as a separate NFC

https://github.com/llvm/llvm-project/pull/77790