[PATCH] D155520: [LV] Complete load groups and release store groups in presence of dependency

Fri Jul 21 10:09:28 PDT 2023

anna marked an inline comment as done.
anna added inline comments.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:1167
         // Skip B if no new instructions can be added to its load group.
         continue;
       }
----------------
Ayal wrote:
> Ahh, when B belongs to a completed load group, no A can be added to it, but **early-continuing here** also prevents releasing the store group of any conflicting A. As outlined above: "Even if we don't create a group for B, we continue with the bottom-up algorithm to ensure we don't break any of B's dependences" - continuing the bottom-up algorithm even if the group of B is complete would fix avoid_sinking_store_across_load: considering %l2 as B would release the store group. But other cases may evade this "avoid early-continue here" fix - when the conflict to release a store group is encountered before the store group is formed; here's an example, worth adding to the documentation:
> 
> ```
>       // For example, assume we have the sequence of accesses shown below in a
>       // stride-3 loop:
>       //
>       // (1, 3) is a group | A[i] = a;   // (1)
>       //                     b = A[i+1]; // (2) |
>       //                   | A[i+2] = c; // (3)
>       //                     d = A[i];   // (4) | (2, 4) is a group
>       // but cannot have both (1,3) and (2,4) groups!
>       // Because the former sinks (1) to (3), the latter hoists (4) to (2),
>       // and there's a dependence between (1) and (4).
>       // Whenever B is a member of a load group, consider it along with
>       // all its members, because they will all be hoisted to B, or earlier.
> ```
> 
> Note: exact full overlap of load-after-store and store-after-store dependencies, as shown in these examples, are best optimized by eliminating the redundant load or first store, respectively. Partial overlap dependencies may be more involved to resolve, by converting to fixed-order-recurrences and/or hoisting a load above a conditional store.
> 
> Note: this algorithm of building interleave groups is worth revisiting, possibly by actually moving group members using VPlan, to both simplify and potentially optimize its decisions, e.g.: should a load group be built or a store group, if either is possible but not both? Keeping a conflicting member out of a group (thereby allowing several others to join) may be better than Completing it.
> But other cases may evade this "avoid early-continue here" fix - when the conflict to release a store group is encountered before the store group is formed;
Yes, this is the exact case I faced for the last miscompile (which I haven't fixed yet). The store group wasn't formed yet and your added example for the doc is exactly what happens. 
Note that when we switch around (3) and (4) in the example, we will not have a miscompile since we formed an interleave store group first (1,3). And then we come to load with B  as `d = A[i]` and identify that the store group (1,3) must be released (since we will traverse backwards and reach A as ` A[i] = a` which is a dependency with B and is part of a store group).
 My idea to fix that was to record such stores and state they are the "last insertion point" (i.e. other stores can be sunk to the store, but we cannot sink store further down).

Also, thanks for catching  the "early continue" here. 

>  here's an example, worth adding to the documentation:
I will add this documentation when I precommit the miscompile test, if that's okay with you (since it would show the IR example as well)? We will need a separate fix for it as well.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:1194
       // the boundaries of the (2, 4) group.
-      if (!canReorderMemAccessesForInterleavedGroups(&*AI, &*BI)) {
+      //
+      //
----------------
Ayal wrote:
> anna wrote:
> > Ayal wrote:
> > > nit: one empty line suffices?
> > > 
> > > Would it be clearer to do something like:
> > > 
> > > ```
> > >       auto GroupA = getInterleaveGroup(A);
> > >       if (GroupA && StoreGroups.contains(GroupA) &&
> > >           !canReorderMemAccessesForInterleavedGroups(&*AI, &*BI)) {
> > >           LLVM_DEBUG(dbgs() << "LV: Invalidated store group due to "
> > >                                "dependence between " << *A << " and "<< *B << '\n');
> > >         StoreGroups.remove(GroupA);
> > >         releaseGroup(GroupA);
> > >       }
> > >       if (A->mayWriteToMemory() && GroupB && LoadGroups.contains(GroupB)) {
> > >         bool CompleteGroupB = false;
> > >         for (uint32_t Index = 0; Index < GroupB->getFactor(); ++Index) {
> > >           Instruction *MemberOfGroupB = GroupB->getMember(Index);
> > >           if (MemberOfGroupB &&
> > >               !canReorderMemAccessesForInterleavedGroups(
> > >                   &*AI, &*AccessStrideInfo.find(MemberOfGroupB))) {
> > >             CompleteGroupB = true;
> > >             break;
> > >           }
> > >         }
> > >         if (CompleteGroupB) {
> > >           LLVM_DEBUG(dbgs() << "LV: Marking interleave group for " << *B
> > >                             << " as complete.\n");
> > >             CompletedLoadGroups.insert(GroupB);
> > >           break;
> > >         }
> > >       }
> > > 
> > > ```
> > > Would be nicer with an `if (std::any_of(...)) { ...; break; }`, with support from InterleaveGroup to Iterate over its members.
> > > 
> > > A test that requires both releasing GroupA and completing GroupB would help emphasize that the former must precede the latter, due to its `break`.
> > Actually, this wouldn't be enough AFAICT (it fixes the first test case added as a follow-on to the original review: pr63602_2). The test I added in `interleaved-accesses-sink-store-across-load.ll` won't be fixed with this. The reason is we need to "release store group" if there is a dependency between AI and *any load in the (load) GroupB*. 
> > So, this patch does two things:
> > 1. retains the order that store group is released first and LoadGroup is marked completed second (we just needed to do both, the order doesn't matter since I left the break where it is: line 1248).
> > 2. Make sure that we check all loads in GroupB for a dependency against `AI`, for both `GroupA` release and `GroupB` completion.
> > 
> > I've added more details in that test case comment below to show why #2 is needed. 
> > 
> > > Would be nicer with an if (std::any_of(...)) { ...; break; }, with support from InterleaveGroup to Iterate over its members.
> > I had that locally, but when recording a `dependentInst` didn't find a good need for it. 
> > 
> OK, very well. Perhaps the following would be clearer, provided it is correct:
> 
> 
> ```
>   auto DependentMember = [&](InterleaveGroup<Instruction> *Group,
>                              StrideEntry *A) -> Instruction* {
>     for (uint32_t Index = 0; Index < Group->getFactor(); ++Index) {
>       Instruction *MemberOfGroupB = Group->getMember(Index);
>       if (MemberOfGroupB &&
>           !canReorderMemAccessesForInterleavedGroups(
>               A, &*AccessStrideInfo.find(MemberOfGroupB)))
>         return MemberOfGroupB;
>     }
>     return nullptr;
>   };
> 
> if (A->mayWriteToMemory()) { // Otherwise dependencies are tolerable.
>   Instruction *DependentInst = nullptr;
>   if (GroupB && LoadGroups.contains(GroupB)) // Check all GroupB members.
>     DependentInst = DependentMember(GroupB, &*AI);
>   else if (!canReorderMemAccessesForInterleavedGroups(&*AI, &*BI))
>     DependentInst = B;
>   if (DependentInst) {
>     auto GroupA = getInterleaveGroup(A);
>     if (GroupA && StoreGroups.contains(GroupA)) {
>       LLVM_DEBUG(dbgs() << "LV: Invalidated store group due to "
>                            "dependence between " << *A << " and "<< *DependentInst << '\n');
>       StoreGroups.remove(GroupA);
>       releaseGroup(GroupA);
>     }
>     if (GroupB && LoadGroups.contains(GroupB)) {
>       LLVM_DEBUG(dbgs() << "LV: Marking interleave group for " << *B
>                         << " as complete.\n");
>       CompletedLoadGroups.insert(GroupB);
>     }
>   }
> }
> if (CompletedLoadGroups.contains(GroupB)) {
>   // Skip trying to add A to B, continue to look for other conflicting A's in groups to be released.
>   continue;
> }
> ```
yes, this works and keeps the test cases results the same as we have currently.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155520/new/

https://reviews.llvm.org/D155520