[PATCH] D110235: [LoopVectorize] Support reductions that store intermediary result

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 23 02:48:52 PST 2021


fhahn added inline comments.


================
Comment at: llvm/include/llvm/Analysis/IVDescriptors.h:265
+  // Intermediate store of the reduction
+  StoreInst *IntermediateStore = nullptr;
+
----------------
nit: should be a doc-comment?


================
Comment at: llvm/include/llvm/Analysis/LoopAccessAnalysis.h:583
+  /// Return the list of stores to invariant addresses.
+  const SmallVectorImpl<StoreInst *> &getStoresToInvariantAddresses() const {
+    return StoresToInvariantAddresses;
----------------
nit: could return `ArrayRef`, not leaking any details about the underlying container.
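For illustration, a minimal sketch of the suggestion. `ArrayRefLite` is a hand-rolled stand-in for `llvm::ArrayRef` (the real class lives in `llvm/ADT/ArrayRef.h`), and `StoreInstStub`/`LoopAccessInfoSketch` are hypothetical placeholders for the real LLVM classes:

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for llvm::ArrayRef: a non-owning view over contiguous
// storage that hides the concrete container type from callers.
template <typename T> class ArrayRefLite {
  const T *Data;
  std::size_t Length;

public:
  template <typename Container>
  ArrayRefLite(const Container &C) : Data(C.data()), Length(C.size()) {}
  const T *begin() const { return Data; }
  const T *end() const { return Data + Length; }
  std::size_t size() const { return Length; }
};

struct StoreInstStub {}; // stand-in for llvm::StoreInst

class LoopAccessInfoSketch {
  std::vector<StoreInstStub *> StoresToInvariantAddresses;

public:
  void addStore(StoreInstStub *SI) { StoresToInvariantAddresses.push_back(SI); }

  // Returning a view rather than SmallVectorImpl& means the underlying
  // container can later be changed without touching any caller.
  ArrayRefLite<StoreInstStub *> getStoresToInvariantAddresses() const {
    return StoresToInvariantAddresses;
  }
};
```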


================
Comment at: llvm/include/llvm/Analysis/LoopAccessAnalysis.h:642
+  /// List of stores to uniform addresses.
+  SmallVector<StoreInst *, 5> InvariantStores;
+
----------------
igor.kirillov wrote:
> david-arm wrote:
> > fhahn wrote:
> > > Are the stores invariant or to a uniform address? `InvariantStores` implies they are invariant, which may not be the case?
> > I assume here that InvariantStores refers to stores to an invariant (i.e. uniform?) address, rather than storing an invariant value to variant address. If so, perhaps it could be named StoresToInvariantAddress or something like that?
> @fhahn Just to make sure we are on the same page - for me, uniform and invariant are just synonyms, aren't they?
nit: Is there a reason for choosing 5 for the size? if not, nowadays SmallVector can pick a good size automatically.


================
Comment at: llvm/include/llvm/Analysis/ScalarEvolution.h:1120
+  /// Returns true if A and B have same pointer operands or same SCEVs addresses
+  bool storeToSameAddress(StoreInst *A, StoreInst *B);
+
----------------
Do you anticipate this being used outside `Transforms/Vectorize`? If not, I'm not sure it makes sense for this to live here and extend the public API of `ScalarEvolution`. Can it instead be defined somewhere in `Transforms/Vectorize`?
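For illustration, what the suggested move might look like as a file-local helper in the vectorizer rather than a `ScalarEvolution` member. All types below are minimal hypothetical stand-ins for the real LLVM classes, and the SCEV comparison is reduced to a plain equality check:

```cpp
// Stand-ins for the real LLVM classes (illustrative only).
struct SCEVStub {
  int Id; // pretend canonical form of the address expression
};

struct StoreInstStub {
  int *PointerOperand;
  SCEVStub AddressSCEV;
  int *getPointerOperand() const { return PointerOperand; }
};

struct ScalarEvolutionStub {
  // Pretend SCEV lookup; the real ScalarEvolution::getSCEV does far more.
  const SCEVStub &getSCEV(const StoreInstStub &SI) const {
    return SI.AddressSCEV;
  }
};

// The suggestion: a static helper local to the vectorizer instead of a
// new public member on ScalarEvolution.
static bool storeToSameAddress(ScalarEvolutionStub *SE, StoreInstStub *A,
                               StoreInstStub *B) {
  // Trivially the same if the pointer operands are identical.
  if (A->getPointerOperand() == B->getPointerOperand())
    return true;
  // Otherwise compare the SCEV expressions of the two addresses.
  return SE->getSCEV(*A).Id == SE->getSCEV(*B).Id;
}
```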


================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:240
   Instruction *ExitInstruction = nullptr;
+  StoreInst *IntermediateStore = nullptr;
+
----------------
needs documentation?


================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:496
+  // If there is an intermediate store, it must store the last reduction value.
+  if (ExitInstruction != nullptr && IntermediateStore) {
+    if (IntermediateStore->getValueOperand() != ExitInstruction) {
----------------
nit: for consistency with other code avoid using `!= nullptr`. This is more in line with the style used in LLVM in general (and you use `==/!= nullptr` in the adjacent code here).


================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:301
+  //  - By store instructions with a loop invariant address (safe).
+  //      * If there are several stores, all must have the same address.
   //  - By an instruction that is not part of the reduction (not safe).
----------------
igor.kirillov wrote:
> fhahn wrote:
> > What about existing users of `isReductionPHI` which currently may rely on the fact that all instruction in the loop must either be phis or reduction operations? Also, with respect to the store restriction, the important bit is that the final value is also stored, right?
> I checked and only `LoopInterchangeLegality` is using this function and it is not affected by stores. Anyway, I can add a parameter to `RecurrenceDescriptor::isReductionPHI` or a member to `RecurrenceDescriptor` allowing or not to handle stores. What do you think about it?
> 
> The comment is to be updated, yes.
I guess it would be good to have at least a test case for loop-interchange to make sure it can deal with the change properly?



================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:906
+          // Earlier stores to this address are effectively deadcode.
+          llvm::erase_if(UnhandledStores, [SE, SI](StoreInst *I) {
+            return SE->storeToSameAddress(SI, I);
----------------
nit: no need to use `llvm::`
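The pruning this code performs, sketched with `std::` facilities in place of the LLVM ones (inside the `llvm` namespace, `erase_if` needs no `llvm::` qualifier). Addresses are modeled here as plain integers rather than real pointer operands:

```cpp
#include <algorithm>
#include <vector>

// Model a store to an invariant address by the address it writes to.
struct StoreStub {
  int Address;
};

// When a later unconditional store SI to the same invariant address is
// found, earlier stores to that address are effectively dead code and can
// be dropped from the unhandled list (mirroring the llvm::erase_if call).
void pruneDeadStores(std::vector<StoreStub> &UnhandledStores,
                     const StoreStub &SI) {
  UnhandledStores.erase(
      std::remove_if(UnhandledStores.begin(), UnhandledStores.end(),
                     [&SI](const StoreStub &S) {
                       return S.Address == SI.Address;
                     }),
      UnhandledStores.end());
}
```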


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:977
+    StoreInst *DSI = DS.IntermediateStore;
+    if (DSI && (DSI == SI)) {
+      *IsPredicated = blockNeedsPredication(DSI->getParent());
----------------
nit: redundant `()`.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:978
+    if (DSI && (DSI == SI)) {
+      *IsPredicated = blockNeedsPredication(DSI->getParent());
+      return true;
----------------
could we instead just do the check at the call site or is there a benefit of doing it here?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:900
+    // of one of our reductions and is unconditional.
+    for (StoreInst *SI : LAI->getInvariantStores()) {
+      if (isRecurringInvariantStore(SI, &IsPredicated)) {
----------------
igor.kirillov wrote:
> fhahn wrote:
> > igor.kirillov wrote:
> > > fhahn wrote:
> > > > What about loads to the same address in the loop? At the moment, `LAA` cannot analyze dependences with invariant addresses. But if this limitation gets removed the code here may become incorrect, because it relies on this limitation IIUC?
> > > Loads from or stores to the same address in the loop? I'm sorry, could you clarify what the problem is? As it is, I don't understand the message.
> > The case I was thinking about was something like the snippet below, where we have a load of the invariant address in the loop (`%lv = load...` in the example below).
> > 
> > ```
> > define void @reduc_store(i32* %dst, i32* readonly %src, i32* noalias %dst.2) {
> > entry:
> >   %arrayidx = getelementptr inbounds i32, i32* %dst, i64 42
> >   store i32 0, i32* %arrayidx, align 4
> >   br label %for.body
> > for.body:
> >   %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
> >   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
> >   %arrayidx1 = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
> >   %1 = load i32, i32* %arrayidx1, align 4
> >   %add = add nsw i32 %0, %1
> >   %lv = load i32, i32* %arrayidx
> >   store i32 %add, i32* %arrayidx, align 4
> >   
> >   %gep.dst.2  = getelementptr inbounds i32, i32* %dst.2, i64 %indvars.iv
> >   store i32 %lv, i32* %gep.dst.2,
> > 
> >   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
> >   %exitcond = icmp eq i64 %indvars.iv.next, 1000
> >   br i1 %exitcond, label %for.cond.cleanup, label %for.body
> > 
> > for.cond.cleanup:
> >   ret void
> > }
> > ```
> In that case we do not vectorize, the rejection happens in `LoopAccessInfo::analyzeLoop` when loads are processed:
> 
> ```
>   for (LoadInst *LD : Loads) {
>     Value *Ptr = LD->getPointerOperand();
> ...
>     // See if there is an unsafe dependency between a load to a uniform address and
>     // store to the same uniform address.
>     if (UniformStores.count(Ptr)) {
>       LLVM_DEBUG(dbgs() << "LAA: Found an unsafe dependency between a uniform "
>                            "load and uniform store to the same address!\n");
>       HasDependenceInvolvingLoopInvariantAddress = true;
>     }
> ...
> ```
> 
> I added `reduc_store_load` test anyway
Yeah, but this is only due to some limitations that currently exist in LAA, right? I think we should at least make this clear somewhere, e.g. in a comment.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3054
+  // reduction variable, because this will be one to a uniform address.
+  if (StoreInst *SI = dyn_cast<StoreInst>(Instr)) {
+    for (auto &Reduction : Legal->getReductionVars()) {
----------------
We effectively sink the store outside the loop. In that case, I don't think we should create a recipe and also we should not consider its cost.
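What "sinking the store outside the loop" means semantically, sketched in plain C++ (the actual transform is of course done on IR/VPlan, and the sketch assumes the loop body executes at least once): every iteration overwrites the same invariant address, so only the final reduction value is observable after the loop.

```cpp
#include <vector>

// Scalar form: the running reduction value is stored to the same
// (invariant) address on every iteration.
int reduceWithIntermediateStore(const std::vector<int> &Src, int *Dst) {
  int Sum = 0;
  for (int V : Src) {
    Sum += V;
    *Dst = Sum; // store to an invariant address each iteration
  }
  return Sum;
}

// After sinking: a single store of the final reduction value, which is
// what the vectorizer would emit after the loop. No per-iteration store
// remains, hence no recipe (and no cost) for it in the loop body.
int reduceWithSunkStore(const std::vector<int> &Src, int *Dst) {
  int Sum = 0;
  for (int V : Src)
    Sum += V;
  *Dst = Sum;
  return Sum;
}
```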


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110235/new/

https://reviews.llvm.org/D110235


