[llvm] [LV]: Teach LV to recursively (de)interleave. (PR #89018)
Paul Walker via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 17 08:22:15 PDT 2024
================
@@ -2126,10 +2127,50 @@ static Value *interleaveVectors(IRBuilderBase &Builder, ArrayRef<Value *> Vals,
// Scalable vectors cannot use arbitrary shufflevectors (only splats), so
// must use intrinsics to interleave.
if (VecTy->isScalableTy()) {
- VectorType *WideVecTy = VectorType::getDoubleElementsVectorType(VecTy);
- return Builder.CreateIntrinsic(WideVecTy, Intrinsic::vector_interleave2,
- Vals,
- /*FMFSource=*/nullptr, Name);
+ if (Vals.size() == 2) {
+ VectorType *WideVecTy = VectorType::getDoubleElementsVectorType(VecTy);
+ return Builder.CreateIntrinsic(WideVecTy, Intrinsic::vector_interleave2,
+ Vals,
+ /*FMFSource=*/nullptr, Name);
+ }
+ unsigned InterleaveFactor = Vals.size();
+ SmallVector<Value *> InterleavingValues(Vals);
+ // The total number of nodes in a balanced binary tree is calculated as 2n -
+ // 1, where `n` is the number of leaf nodes (`InterleaveFactor`). In this
+ // context, we exclude the root node because it will serve as the final
+ // interleaved value. Thus, the number of nodes to be processed/interleaved
+ // is: (2n - 1) - 1 = 2n - 2.
+
+ unsigned NumInterleavingValues = 2 * InterleaveFactor - 2;
+ for (unsigned I = 1; I < NumInterleavingValues; I += 2) {
+ // values that haven't been processed yet:
+ unsigned Remaining = InterleavingValues.size() - I + 1;
+ if (Remaining > 2 && isPowerOf2_32(Remaining)) {
+
+ // The remaining values form a new level in the interleaving tree.
+ // Arrange these values in the correct interleaving order for this
+ // level. The interleaving order places alternating elements from the
----------------
paulwalker-arm wrote:
Whilst I agree the solution requires nested loops, I would have expected the creation of the `vector_interleave2` calls to be inside the deepest loop. By not doing this, I think you've forced yourself into overcomplicating the induction variable calculations. I might be oversimplifying it, but I think something like the following should work?
```
if (VecTy->isScalableTy()) {
  assert(Factor_is_a_power_of_2)
  if (Factor == 2)
    return builder.create_vector_interleave2(Vals, Name)
  vector<Value *> R(Vals);
  // Each pass pairs R[i] with R[i + Midpoint], halving the live values.
  for (unsigned Midpoint = Factor / 2; Midpoint > 0; Midpoint /= 2)
    for (unsigned i = 0; i < Midpoint; ++i)
      R[i] = builder.create_vector_interleave2(R[i], R[i + Midpoint], Name)
  return R[0]
}
```
If this turns out to work, then I suppose you might not even need the Factor=2 bailout, given that my original request for it was only because the general case looked complex enough to warrant a faster return for the most common case.
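To check the ordering the pairwise scheme produces, here is a minimal standalone sketch that models it outside of LLVM. It assumes the loop is read as halving `Midpoint` on each pass, which is an interpretation of the pseudocode rather than code from the PR. `interleave2` and `interleaveAll` are hypothetical helpers, and `std::vector<int>` merely stands in for a vector value; none of this is LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a single vector.interleave2 call: {a0, b0, a1, b1, ...}.
// Hypothetical model only; a std::vector<int> plays the role of a vector value.
std::vector<int> interleave2(const std::vector<int> &A,
                             const std::vector<int> &B) {
  assert(A.size() == B.size() && "operands must have equal length");
  std::vector<int> Out;
  Out.reserve(A.size() * 2);
  for (std::size_t I = 0; I < A.size(); ++I) {
    Out.push_back(A[I]);
    Out.push_back(B[I]);
  }
  return Out;
}

// Hypothetical helper: interleave a power-of-2 number of equal-length
// vectors using only pairwise interleaves.
std::vector<int> interleaveAll(std::vector<std::vector<int>> R) {
  const std::size_t Factor = R.size();
  assert(Factor >= 2 && (Factor & (Factor - 1)) == 0 &&
         "Factor must be a power of 2");
  // Each pass pairs R[I] with R[I + Midpoint], so the number of live
  // values halves until only R[0] remains.
  for (std::size_t Midpoint = Factor / 2; Midpoint > 0; Midpoint /= 2)
    for (std::size_t I = 0; I < Midpoint; ++I)
      R[I] = interleave2(R[I], R[I + Midpoint]);
  return R[0];
}
```

With four inputs A, B, C and D, the first pass forms interleave2(A, C) and interleave2(B, D), and the second pass interleaves those two results, giving A0 B0 C0 D0 A1 B1 C1 D1. With only two inputs, the outer loop runs once with `Midpoint == 1` and emits a single pairwise interleave, which is consistent with the remark that a separate Factor=2 path may be unnecessary.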
https://github.com/llvm/llvm-project/pull/89018