[llvm] [VPlan] Unroll VPReplicateRecipe by VF. (PR #142433)

Thu Jun 19 03:35:37 PDT 2025

================
@@ -2626,43 +2664,29 @@ static void scalarizeInstruction(const Instruction *Instr,
 
 void VPReplicateRecipe::execute(VPTransformState &State) {
   Instruction *UI = getUnderlyingInstr();
-  if (State.Lane) { // Generate a single instance.
-    assert((State.VF.isScalar() || !isSingleScalar()) &&
-           "uniform recipe shouldn't be predicated");
-    assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");
-    scalarizeInstruction(UI, this, *State.Lane, State);
-    // Insert scalar instance packing it into a vector.
-    if (State.VF.isVector() && shouldPack()) {
-      // If we're constructing lane 0, initialize to start from poison.
-      if (State.Lane->isFirstLane()) {
-        assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
-        Value *Poison =
-            PoisonValue::get(VectorType::get(UI->getType(), State.VF));
-        State.set(this, Poison);
-      }
-      State.packScalarIntoVectorizedValue(this, *State.Lane);
-    }
-    return;
-  }
 
-  if (IsSingleScalar) {
-    // Uniform within VL means we need to generate lane 0.
+  if (!State.Lane) {
+    assert(IsSingleScalar &&
+           "VPReplicateRecipes outside replicate regions must be unrolled");
     scalarizeInstruction(UI, this, VPLane(0), State);
     return;
   }
 
-  // A store of a loop varying value to a uniform address only needs the last
-  // copy of the store.
-  if (isa<StoreInst>(UI) && vputils::isSingleScalar(getOperand(1))) {
-    auto Lane = VPLane::getLastLaneForVF(State.VF);
-    scalarizeInstruction(UI, this, VPLane(Lane), State);
-    return;
+  assert((State.VF.isScalar() || !isSingleScalar()) &&
+         "uniform recipe shouldn't be predicated");
+  assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");
+  scalarizeInstruction(UI, this, *State.Lane, State);
+  // Insert scalar instance packing it into a vector.
+  if (State.VF.isVector() && shouldPack()) {
+    // If we're constructing lane 0, initialize to start from poison.
+    if (State.Lane->isFirstLane()) {
+      assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
+      Value *Poison =
+          PoisonValue::get(VectorType::get(UI->getType(), State.VF));
+      State.set(this, Poison);
+    }
+    State.packScalarIntoVectorizedValue(this, *State.Lane);
   }
----------------
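For readers following the diff, the reshaped `execute` has two paths: without a lane, only single-scalar recipes are allowed and lane 0 is generated; with a lane (inside a replicate region), one scalar instance is generated and, if packing is needed, inserted into a vector that is reset to poison when lane 0 is visited. The following is a minimal toy model of that control flow, assuming plain `int` values and modeling poison as `std::nullopt`. `ToyVector`, `ScalarGen`, `executeWithoutLane` and `executeLane` are hypothetical names for illustration only, not VPlan APIs.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical toy model (NOT the VPlan API). A packed "vector value" is a
// list of lanes, each either a defined int or poison (std::nullopt).
using ToyVector = std::vector<std::optional<int>>;
// Stand-in for generating one scalar instance of the instruction for a lane.
using ScalarGen = int (*)(unsigned Lane);

// Mirrors the !State.Lane path: only single-scalar recipes may get here,
// and they generate lane 0 only.
int executeWithoutLane(bool IsSingleScalar, ScalarGen Gen) {
  assert(IsSingleScalar && "recipes outside replicate regions must be unrolled");
  return Gen(0);
}

// Mirrors the per-lane path inside a replicate region: generate the scalar
// for Lane and, if packing is requested for a vector VF, insert it into the
// packed vector, which is reset to all-poison when lane 0 is visited.
int executeLane(unsigned Lane, unsigned VF, bool ShouldPack, ScalarGen Gen,
                ToyVector &Packed) {
  assert(Lane < VF && "lane out of range");
  int Scalar = Gen(Lane);
  if (VF > 1 && ShouldPack) {
    if (Lane == 0)
      Packed.assign(VF, std::nullopt); // start from poison
    assert(Packed.size() == VF && "lane 0 must be packed first");
    Packed[Lane] = Scalar; // analogous to packScalarIntoVectorizedValue
  }
  return Scalar;
}
```

Calling `executeLane` for lanes 0 through VF-1 in order leaves every lane of the packed vector defined; skipping a lane would leave it as poison, which is the same invariant the real recipe relies on.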
ayalz wrote:

(Joint proposal with @aniragil)
Replicate recipes model several aspects: (a) multiple Instruction replicas will be generated, (b) all operands will be used as scalars, and (c) the "single def" result will be provided as scalars. These affect both cost and code-gen. Taking care of (a) as a VPlan-to-VPlan transform, i.e., replicating by VF prior to code-gen, helps simplify the latter, including VPTransformState. In order to take care of (b) and (c) prior to code-gen, and prior to replicating by VF (i.e., before committing to a single VF), could we introduce a "pack" recipe that converts a single VPValue operand representing multiple scalars into a single-def VPValue holding them in a vector, and a converse "unpack" recipe? replicateByVF would then convert the former into a BuildVector (or BuildStructVector) recipe and the latter into a set of VF extracts. This would help clarify which VPValues represent vectors and which hold their VF elements as scalars, potentially improving cost modeling, simplifying replicateByVF, and possibly simplifying code-gen (as done here). WDYT?
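To make the proposed lowering concrete, here is a small sketch of what replicateByVF might do with "pack" and "unpack" once a single VF is chosen: a pack becomes one BuildVector whose VF operands are the per-lane scalars, and an unpack becomes VF extracts. This assumes a heavily simplified recipe model in which operands are just names and lane L of a value `x` is written `x.L`; `Kind`, `ToyRecipe` and `lowerForVF` are hypothetical and do not correspond to existing VPlan classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy recipe model (NOT the VPlan API): each recipe has a kind,
// a list of operand names, and a lane index used only by Extract.
enum class Kind { Pack, Unpack, BuildVector, Extract };

struct ToyRecipe {
  Kind K;
  std::vector<std::string> Operands;
  unsigned Lane = 0;
};

// Lower one proposed recipe once a single VF has been chosen.
// Pack:   one operand standing for VF scalars -> one BuildVector with VF
//         scalar operands (lane L of "x" is named "x.L" here).
// Unpack: one vector operand -> VF Extract recipes, one per lane.
// Any other recipe is returned unchanged.
std::vector<ToyRecipe> lowerForVF(const ToyRecipe &R, unsigned VF) {
  assert(VF > 0 && "VF must be positive");
  switch (R.K) {
  case Kind::Pack: {
    assert(R.Operands.size() == 1 && "pack takes a single operand");
    ToyRecipe BV{Kind::BuildVector, {}, 0};
    for (unsigned L = 0; L < VF; ++L)
      BV.Operands.push_back(R.Operands[0] + "." + std::to_string(L));
    return {BV};
  }
  case Kind::Unpack: {
    assert(R.Operands.size() == 1 && "unpack takes a single operand");
    std::vector<ToyRecipe> Extracts;
    for (unsigned L = 0; L < VF; ++L)
      Extracts.push_back(ToyRecipe{Kind::Extract, {R.Operands[0]}, L});
    return Extracts;
  }
  default:
    return {R};
  }
}
```

The point of the design, as the comment argues, is that before VF is fixed each pack/unpack is a single recipe whose vector-vs-scalars nature is explicit, so cost modeling can see it; only the VF-specific expansion is deferred to replicateByVF.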

https://github.com/llvm/llvm-project/pull/142433