[llvm] [LV]: Teach LV to recursively (de)interleave. (PR #89018)

Paul Walker via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 17 08:22:15 PDT 2024


================
@@ -2126,10 +2127,50 @@ static Value *interleaveVectors(IRBuilderBase &Builder, ArrayRef<Value *> Vals,
   // Scalable vectors cannot use arbitrary shufflevectors (only splats), so
   // must use intrinsics to interleave.
   if (VecTy->isScalableTy()) {
-    VectorType *WideVecTy = VectorType::getDoubleElementsVectorType(VecTy);
-    return Builder.CreateIntrinsic(WideVecTy, Intrinsic::vector_interleave2,
-                                   Vals,
-                                   /*FMFSource=*/nullptr, Name);
+    if (Vals.size() == 2) {
+      VectorType *WideVecTy = VectorType::getDoubleElementsVectorType(VecTy);
+      return Builder.CreateIntrinsic(WideVecTy, Intrinsic::vector_interleave2,
+                                     Vals,
+                                     /*FMFSource=*/nullptr, Name);
+    }
+    unsigned InterleaveFactor = Vals.size();
+    SmallVector<Value *> InterleavingValues(Vals);
+    // The total number of nodes in a balanced binary tree is calculated as 2n -
+    // 1, where `n` is the number of leaf nodes (`InterleaveFactor`). In this
+    // context, we exclude the root node because it will serve as the final
+    // interleaved value. Thus, the number of nodes to be processed/interleaved
+    // is: (2n - 1) - 1 = 2n - 2.
+
+    unsigned NumInterleavingValues = 2 * InterleaveFactor - 2;
+    for (unsigned I = 1; I < NumInterleavingValues; I += 2) {
+      // values that haven't been processed yet:
+      unsigned Remaining = InterleavingValues.size() - I + 1;
+      if (Remaining > 2 && isPowerOf2_32(Remaining)) {
+
+        // The remaining values form a new level in the interleaving tree.
+        // Arrange these values in the correct interleaving order for this
+        // level. The interleaving order places alternating elements from the
----------------
paulwalker-arm wrote:

Whilst I agree the solution requires nested loops, I would have expected the `vector_interleave2` calls to be created inside the innermost loop. By not doing this, I think you've forced yourself into overcomplicating the induction variable calculations. I might be oversimplifying, but I think something like the following should work?

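```
if (VecTy->isScalableTy()) {
  unsigned Factor = Vals.size();
  assert(isPowerOf2_32(Factor) && "Unsupported interleave factor");
  VectorType *WideVecTy = VectorType::getDoubleElementsVectorType(VecTy);
  if (Factor == 2)
    return Builder.CreateIntrinsic(WideVecTy, Intrinsic::vector_interleave2,
                                   Vals, /*FMFSource=*/nullptr, Name);

  SmallVector<Value *> R(Vals);
  // Each pass pairs R[I] with R[I + Midpoint], halving the number of live
  // values and doubling their width, until the result is left in R[0].
  for (unsigned Midpoint = Factor / 2; Midpoint > 0; Midpoint /= 2) {
    for (unsigned I = 0; I < Midpoint; ++I)
      R[I] = Builder.CreateIntrinsic(WideVecTy, Intrinsic::vector_interleave2,
                                     {R[I], R[I + Midpoint]},
                                     /*FMFSource=*/nullptr, Name);
    WideVecTy = VectorType::getDoubleElementsVectorType(WideVecTy);
  }
  return R[0];
}
```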

If this turns out to work, you might not even need the `Factor == 2` bailout: I only asked for it originally because the general case looked complex enough to warrant a faster return for the most common case.
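
To sanity-check the ordering, here is a minimal standalone sketch that models the suggested loop on plain `std::vector<int>` values rather than LLVM IR. It assumes the outer loop is meant to halve `Midpoint` and the inner loop to pair `R[I]` with `R[I + Midpoint]`. The names `interleave2`, `interleaveRecursively` and `interleaveReference` are hypothetical helpers for illustration only, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Stand-in for llvm.vector.interleave2 on fixed data: a0, b0, a1, b1, ...
Vec interleave2(const Vec &A, const Vec &B) {
  assert(A.size() == B.size() && "operands must have the same length");
  Vec Out;
  Out.reserve(A.size() * 2);
  for (std::size_t I = 0; I < A.size(); ++I) {
    Out.push_back(A[I]);
    Out.push_back(B[I]);
  }
  return Out;
}

// Hypothetical model of the suggested loop: each pass pairs R[I] with
// R[I + Midpoint], so the number of live values halves until R[0] remains.
Vec interleaveRecursively(std::vector<Vec> R) {
  const std::size_t Factor = R.size();
  assert(Factor >= 2 && (Factor & (Factor - 1)) == 0 &&
         "factor must be a power of two");
  for (std::size_t Midpoint = Factor / 2; Midpoint > 0; Midpoint /= 2)
    for (std::size_t I = 0; I < Midpoint; ++I)
      R[I] = interleave2(R[I], R[I + Midpoint]);
  return R[0];
}

// Reference definition: element J of input K lands at index J * Factor + K.
Vec interleaveReference(const std::vector<Vec> &Vals) {
  const std::size_t Factor = Vals.size();
  const std::size_t Len = Vals[0].size();
  Vec Out(Factor * Len);
  for (std::size_t K = 0; K < Factor; ++K)
    for (std::size_t J = 0; J < Len; ++J)
      Out[J * Factor + K] = Vals[K][J];
  return Out;
}
```

In this model a factor of 2 goes through the same loop (a single pass with `Midpoint == 1`), which is consistent with the idea that the bailout may be unnecessary. The model only checks element ordering; it says nothing about the vector types or the code quality of the generated IR.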

https://github.com/llvm/llvm-project/pull/89018
