[PATCH] D110235: [LoopVectorize] Support reductions that store intermediary result
Igor Kirillov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 18 07:22:33 PST 2021
igor.kirillov marked 8 inline comments as done.
igor.kirillov added inline comments.
================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:457
+ else {
+ if (auto *SI = dyn_cast<StoreInst>(UI)) {
+ // Reduction variable chain can only be stored somewhere but it
----------------
david-arm wrote:
> nit: Could you remove an extra level of indentation here before merging the patch?
>
> i.e. this can be written like this:
>
> ```
> if (isa<PHINode>(UI))
>   PHIs.push_back(UI);
> else if (auto *SI = dyn_cast<StoreInst>(UI)) {
>   ...
> } else
>   NonPHIs.push_back(UI);
> ```
I went a step further and simplified it even more.
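For reference, the simplified shape is roughly the following (sketch only, factored into a hypothetical standalone helper for illustration; `classifyReductionUser` is not a real function, the actual code stays inlined in `AddReductionVar` and carries a few more checks):
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch only: classify one user of the current reduction value. A store of
// the running reduction value is remembered separately as the candidate
// intermediate store instead of being pushed onto the use chain; all other
// users keep the existing PHI/non-PHI split.
static void classifyReductionUser(Instruction *UI,
                                  StoreInst *&IntermediateStore,
                                  SmallVectorImpl<Instruction *> &PHIs,
                                  SmallVectorImpl<Instruction *> &NonPHIs) {
  if (auto *SI = dyn_cast<StoreInst>(UI)) {
    IntermediateStore = SI;
    return;
  }
  if (isa<PHINode>(UI))
    PHIs.push_back(UI);
  else
    NonPHIs.push_back(UI);
}
```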
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:900
+ // of one of our reductions and is unconditional.
+ for (StoreInst *SI : LAI->getInvariantStores()) {
+ if (isRecurringInvariantStore(SI, &IsPredicated)) {
----------------
fhahn wrote:
> igor.kirillov wrote:
> > fhahn wrote:
> > > What about loads to the same address in the loop? At the moment, `LAA` cannot analyze dependences with invariant addresses. But if this limitation gets removed the code here may become incorrect, because it relies on this limitation IIUC?
> > Loads from or stores to the same address in the loop? I'm sorry, could you clarify what the problem is? As it is, I don't understand the message.
> The case I was thinking about was something like the snippet below, where we have a load of the invariant address in the loop (`%lv = load...` in the example below).
>
> ```
> define void @reduc_store(i32* %dst, i32* readonly %src, i32* noalias %dst.2) {
> entry:
>   %arrayidx = getelementptr inbounds i32, i32* %dst, i64 42
>   store i32 0, i32* %arrayidx, align 4
>   br label %for.body
>
> for.body:
>   %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
>   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
>   %arrayidx1 = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
>   %1 = load i32, i32* %arrayidx1, align 4
>   %add = add nsw i32 %0, %1
>   %lv = load i32, i32* %arrayidx
>   store i32 %add, i32* %arrayidx, align 4
>
>   %gep.dst.2 = getelementptr inbounds i32, i32* %dst.2, i64 %indvars.iv
>   store i32 %lv, i32* %gep.dst.2
>
>   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>   %exitcond = icmp eq i64 %indvars.iv.next, 1000
>   br i1 %exitcond, label %for.cond.cleanup, label %for.body
>
> for.cond.cleanup:
>   ret void
> }
> ```
In that case we do not vectorize; the rejection happens in `LoopAccessInfo::analyzeLoop` when loads are processed:
```
for (LoadInst *LD : Loads) {
  Value *Ptr = LD->getPointerOperand();
  ...
  // See if there is an unsafe dependency between a load to a uniform address and
  // store to the same uniform address.
  if (UniformStores.count(Ptr)) {
    LLVM_DEBUG(dbgs() << "LAA: Found an unsafe dependency between a uniform "
                         "load and uniform store to the same address!\n");
    HasDependenceInvolvingLoopInvariantAddress = true;
  }
  ...
```
I added a `reduc_store_load` test anyway.
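For reference, the source-level pattern that test is meant to cover looks roughly like the C++ below (sketch only; the test itself is written directly in IR and is essentially the snippet above, where `dst2` is `noalias`):
```
// Sketch: a reduction whose intermediate result is stored to an invariant
// address (dst[42]), combined with a load of that same address inside the
// loop. LAA flags the uniform load/uniform store pair as an unsafe
// dependence, so the loop is not vectorized.
void reduc_store_load(int *dst, const int *src, int *dst2) {
  int sum = 0;
  dst[42] = 0;
  for (int i = 0; i < 1000; ++i) {
    sum += src[i];
    int prev = dst[42]; // load of the invariant address
    dst[42] = sum;      // intermediate reduction result stored back
    dst2[i] = prev;
  }
}
```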
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D110235/new/
https://reviews.llvm.org/D110235