[PATCH] D110235: [LoopVectorize] Support reductions that store intermediary result
Igor Kirillov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 18 07:22:33 PST 2021
igor.kirillov marked 8 inline comments as done.
igor.kirillov added inline comments.
================
Comment at: llvm/lib/Analysis/IVDescriptors.cpp:457
+ else {
+ if (auto *SI = dyn_cast<StoreInst>(UI)) {
+ // Reduction variable chain can only be stored somewhere but it
----------------
david-arm wrote:
> nit: Could you remove an extra level of indentation here before merging the patch?
>
> i.e. this can be written like this:
>
> ```
> if (isa<PHINode>(UI))
>   PHIs.push_back(UI);
> else if (auto *SI = dyn_cast<StoreInst>(UI)) {
>   ...
> } else
>   NonPHIs.push_back(UI);
> ```
I went a step further and simplified it even more.
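For reference, the simplified shape is roughly the following (sketch only, factored into a hypothetical standalone helper for illustration; `classifyReductionUser` is not a real function, the actual code stays inlined in `AddReductionVar` and carries a few more checks):
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch only: classify one user of the current reduction value. A store of
// the running reduction value is remembered separately as the candidate
// intermediate store instead of being pushed onto the use chain; all other
// users keep the existing PHI/non-PHI split.
static void classifyReductionUser(Instruction *UI,
                                  StoreInst *&IntermediateStore,
                                  SmallVectorImpl<Instruction *> &PHIs,
                                  SmallVectorImpl<Instruction *> &NonPHIs) {
  if (auto *SI = dyn_cast<StoreInst>(UI)) {
    IntermediateStore = SI;
    return;
  }
  if (isa<PHINode>(UI))
    PHIs.push_back(UI);
  else
    NonPHIs.push_back(UI);
}
```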
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:900
+ // of one of our reductions and is unconditional.
+ for (StoreInst *SI : LAI->getInvariantStores()) {
+ if (isRecurringInvariantStore(SI, &IsPredicated)) {
----------------
fhahn wrote:
> igor.kirillov wrote:
> > fhahn wrote:
> > > What about loads to the same address in the loop? At the moment, `LAA` cannot analyze dependences with invariant addresses. But if this limitation gets removed the code here may become incorrect, because it relies on this limitation IIUC?
> > Loads from or stores to the same address in the loop? I'm sorry, could you clarify what the problem is? As it is, I don't understand the message.
> The case I was thinking about was something like the snippet below, where we have a load of the invariant address in the loop (`%lv = load...` in the example below).
>
> ```
> define void @reduc_store(i32* %dst, i32* readonly %src, i32* noalias %dst.2) {
> entry:
>   %arrayidx = getelementptr inbounds i32, i32* %dst, i64 42
>   store i32 0, i32* %arrayidx, align 4
>   br label %for.body
>
> for.body:
>   %0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
>   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
>   %arrayidx1 = getelementptr inbounds i32, i32* %src, i64 %indvars.iv
>   %1 = load i32, i32* %arrayidx1, align 4
>   %add = add nsw i32 %0, %1
>   %lv = load i32, i32* %arrayidx
>   store i32 %add, i32* %arrayidx, align 4
>
>   %gep.dst.2 = getelementptr inbounds i32, i32* %dst.2, i64 %indvars.iv
>   store i32 %lv, i32* %gep.dst.2
>
>   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>   %exitcond = icmp eq i64 %indvars.iv.next, 1000
>   br i1 %exitcond, label %for.cond.cleanup, label %for.body
>
> for.cond.cleanup:
>   ret void
> }
> ```
In that case we do not vectorize; the rejection happens in `LoopAccessInfo::analyzeLoop` when loads are processed:
```
for (LoadInst *LD : Loads) {
  Value *Ptr = LD->getPointerOperand();
  ...
  // See if there is an unsafe dependency between a load to a uniform address and
  // store to the same uniform address.
  if (UniformStores.count(Ptr)) {
    LLVM_DEBUG(dbgs() << "LAA: Found an unsafe dependency between a uniform "
                         "load and uniform store to the same address!\n");
    HasDependenceInvolvingLoopInvariantAddress = true;
  }
  ...
```
I added a `reduc_store_load` test anyway.
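For reference, the source-level pattern that test is meant to cover looks roughly like the C++ below (sketch only; the test itself is written directly in IR and is essentially the snippet above, where `dst2` is `noalias`):
```
// Sketch: a reduction whose intermediate result is stored to an invariant
// address (dst[42]), combined with a load of that same address inside the
// loop. LAA flags the uniform load/uniform store pair as an unsafe
// dependence, so the loop is not vectorized.
void reduc_store_load(int *dst, const int *src, int *dst2) {
  int sum = 0;
  dst[42] = 0;
  for (int i = 0; i < 1000; ++i) {
    sum += src[i];
    int prev = dst[42]; // load of the invariant address
    dst[42] = sum;      // intermediate reduction result stored back
    dst2[i] = prev;
  }
}
```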
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D110235/new/
https://reviews.llvm.org/D110235