[PATCH] D110235: [LoopVectorize] Support reductions that store intermediary result

Thu Nov 18 03:06:19 PST 2021

igor.kirillov marked 2 inline comments as done and an inline comment as not done.
igor.kirillov added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:902
+      if (isRecurringInvariantStore(SI, &IsPredicated)) {
+        if (IsPredicated)
+          UnhandledStores.push_back(SI);
----------------
david-arm wrote:
> Hi @igor.kirillov, I think you're missing a test case here. Suppose we have two stores in the loop to the same invariant address. If the first store is predicated, but the second isn't we should still vectorise because the second store wins and the first one can be removed. At least the code here suggests that. I couldn't find any test that exercised this code path.
Added `reduc_store_final_store_predicated` test that address both requests

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:910
+        }
+      } else if (!isUniform(SI->getValueOperand()))
+        UnhandledStores.push_back(SI);
----------------
david-arm wrote:
> I think you might be missing a test case for this. Can you make sure this code path is exercised please by your existing tests please?
It is now covered by new `reduc_store_final_store_predicated` test and an old test `reduc_store_inside_unrolled` also executes this path

================
Comment at: llvm/test/Transforms/LoopVectorize/reduction.ll:517
+  %1 = load float, float* %arrayidx1, align 4
+  %add = fadd float %0, %1
+  store float %add, float* %arrayidx, align 4
----------------
david-arm wrote:
> Hi @igor.kirillov, this doesn't look right. The test is called `@reduc_store_fadd_fast`, but there is not `fast` keyword on the `fadd` instruction. I don't think we should even be vectorising this because it requires ordered reductions, which you haven't enabled for this test. Can you change the name of this to `@reduc_store_fadd_ordered` and investigate why we are vectorising this?
> 
> Can you also add a separate test called `@reduc_store_fadd_fast` that actually has the `fast` keyword too?
I missed the `fast` keyword there. As for why this code gets vectorized - it happens because of `Hints->allowReordering()` returning true in `LoopVectorizationLegality::canVectorizeFPMath`. As you can see the test specifies vector width (-force-vector-width=4) and llvm allows to process ordered instructions in unordered manner (see also `LoopVectorizeHints::allowReordering` function) in that case.

I added a new test with `fast` keyword to `AArch64/strict-fadd.ll` and the loop is not vectorized there.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110235/new/

https://reviews.llvm.org/D110235