[PATCH] D155520: [LV] Complete load groups and release store groups in presence of dependency

Thu Jul 20 12:36:41 PDT 2023

Ayal added inline comments.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:1165-1167
       } else if (CompletedLoadGroups.contains(GroupB)) {
         // Skip B if no new instructions can be added to its load group.
         continue;
----------------
Avoid too early continue here, scan A's in search of potential GroupA's to release.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:1167
         // Skip B if no new instructions can be added to its load group.
         continue;
       }
----------------
Ahh, when B belongs to a completed load group, no A can be added to it, but **early-continuing here** also prevents releasing the store group of any conflicting A. As outlined above: "Even if we don't create a group for B, we continue with the bottom-up algorithm to ensure we don't break any of B's dependences" - continuing the bottom-up algorithm even if the group of B is complete would fix avoid_sinking_store_across_load: considering %l2 as B would release the store group. But other cases may evade this "avoid early-continue here" fix - when the conflict to release a store group is encountered before the store group is formed; here's an example, worth adding to the documentation:

```
      // For example, assume we have the sequence of accesses shown below in a
      // stride-3 loop:
      //
      // (1, 3) is a group | A[i] = a;   // (1)
      //                     b = A[i+1]; // (2) |
      //                   | A[i+2] = c; // (3)
      //                     d = A[i];   // (4) | (2, 4) is a group
      // but cannot have both (1,3) and (2,4) groups!
      // Because the former sinks (1) to (3), the latter hoists (4) to (2),
      // and there's a dependence between (1) and (4).
      // Whenever B is a member of a load group, consider it along with
      // all its members, because they will all be hoisted to B, or earlier.
```

Note: exact full overlap of load-after-store and store-after-store dependencies, as shown in these examples, are best optimized by eliminating the redundant load or first store, respectively. Partial overlap dependencies may be more involved to resolve, by converting to fixed-order-recurrences and/or hoisting a load above a conditional store.

Note: this algorithm of building interleave groups is worth revisiting, possibly by actually moving group members using VPlan, to both simplify and potentially optimize its decisions, e.g.: should a load group be built or a store group, if either is possible but not both? Keeping a conflicting member out of a group (thereby allowing several others to join) may be better than Completing it.

================
Comment at: llvm/lib/Analysis/VectorUtils.cpp:1194
       // the boundaries of the (2, 4) group.
-      if (!canReorderMemAccessesForInterleavedGroups(&*AI, &*BI)) {
+      //
+      //
----------------
anna wrote:
> Ayal wrote:
> > nit: one empty line suffices?
> > 
> > Would it be clearer to do something like:
> > 
> > ```
> >       auto GroupA = getInterleaveGroup(A);
> >       if (GroupA && StoreGroups.contains(GroupA) &&
> >           !canReorderMemAccessesForInterleavedGroups(&*AI, &*BI)) {
> >           LLVM_DEBUG(dbgs() << "LV: Invalidated store group due to "
> >                                "dependence between " << *A << " and "<< *B << '\n');
> >         StoreGroups.remove(GroupA);
> >         releaseGroup(GroupA);
> >       }
> >       if (A->mayWriteToMemory() && GroupB && LoadGroups.contains(GroupB)) {
> >         bool CompleteGroupB = false;
> >         for (uint32_t Index = 0; Index < GroupB->getFactor(); ++Index) {
> >           Instruction *MemberOfGroupB = GroupB->getMember(Index);
> >           if (MemberOfGroupB &&
> >               !canReorderMemAccessesForInterleavedGroups(
> >                   &*AI, &*AccessStrideInfo.find(MemberOfGroupB))) {
> >             CompleteGroupB = true;
> >             break;
> >           }
> >         }
> >         if (CompleteGroupB) {
> >           LLVM_DEBUG(dbgs() << "LV: Marking interleave group for " << *B
> >                             << " as complete.\n");
> >             CompletedLoadGroups.insert(GroupB);
> >           break;
> >         }
> >       }
> > 
> > ```
> > Would be nicer with an `if (std::any_of(...)) { ...; break; }`, with support from InterleaveGroup to Iterate over its members.
> > 
> > A test that requires both releasing GroupA and completing GroupB would help emphasize that the former must precede the latter, due to its `break`.
> Actually, this wouldn't be enough AFAICT (it fixes the first test case added as a follow-on to the original review: pr63602_2). The test I added in `interleaved-accesses-sink-store-across-load.ll` won't be fixed with this. The reason is we need to "release store group" if there is a dependency between AI and *any load in the (load) GroupB*. 
> So, this patch does two things:
> 1. retains the order that store group is released first and LoadGroup is marked completed second (we just needed to do both, the order doesn't matter since I left the break where it is: line 1248).
> 2. Make sure that we check all loads in GroupB for a dependency against `AI`, for both `GroupA` release and `GroupB` completion.
> 
> I've added more details in that test case comment below to show why #2 is needed. 
> 
> > Would be nicer with an if (std::any_of(...)) { ...; break; }, with support from InterleaveGroup to Iterate over its members.
> I had that locally, but when recording a `dependentInst` didn't find a good need for it. 
> 
OK, very well. Perhaps the following would be clearer, provided it is correct:

```
  auto DependentMember = [&](InterleaveGroup<Instruction> *Group,
                             StrideEntry *A) -> Instruction* {
    for (uint32_t Index = 0; Index < Group->getFactor(); ++Index) {
      Instruction *MemberOfGroupB = Group->getMember(Index);
      if (MemberOfGroupB &&
          !canReorderMemAccessesForInterleavedGroups(
              A, &*AccessStrideInfo.find(MemberOfGroupB)))
        return MemberOfGroupB;
    }
    return nullptr;
  };

if (A->mayWriteToMemory()) { // Otherwise dependencies are tolerable.
  Instruction *DependentInst = nullptr;
  if (GroupB && LoadGroups.contains(GroupB)) // Check all GroupB members.
    DependentInst = DependentMember(GroupB, &*AI);
  else if (!canReorderMemAccessesForInterleavedGroups(&*AI, &*BI))
    DependentInst = B;
  if (DependentInst) {
    auto GroupA = getInterleaveGroup(A);
    if (GroupA && StoreGroups.contains(GroupA)) {
      LLVM_DEBUG(dbgs() << "LV: Invalidated store group due to "
                           "dependence between " << *A << " and "<< *DependentInst << '\n');
      StoreGroups.remove(GroupA);
      releaseGroup(GroupA);
    }
    if (GroupB && LoadGroups.contains(GroupB)) {
      LLVM_DEBUG(dbgs() << "LV: Marking interleave group for " << *B
                        << " as complete.\n");
      CompletedLoadGroups.insert(GroupB);
    }
  }
}
if (CompletedLoadGroups.contains(GroupB)) {
  // Skip trying to add A to B, continue to look for other conflicting A's in groups to be released.
  continue;
}
```

================
Comment at: llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-sink-store-across-load.ll:8
 ; that store into the last store (by creating an interleaved store group). This
-; means the loaded %l2 has incorrect value.
-; We do not release this store group correctly because the next interleave group
-; chosen compares only the memory access of last load in program (%l3) against the dependent store location
-; (%gep.iv.1.plus.2) and they are different, thereby incorrectly assuming no
-; dependency. We need to compare against all loads in that interleaved group
-; (%l2 is part of it).
+; means the loaded %l2 will have incorrect value.
 define void @avoid_sinking_store_across_load(ptr %arr) {
----------------
anna wrote:
> anna wrote:
> > Ayal wrote:
> > > This additional/distinct testcase of preventing the sinking of a store is fixed by the same patch that compares all member of a load-groupB with a storeA, right?
> > Unfortunately, no. That only marks `GroupB` for completion, thereby preventing hoisting of a load to an earlier load group. 
> > 
> > To show what happens in this test case: we reverse traverse the memory access. So, the first interleave group created is:
> > ```
> > store i32 %add, ptr %gep.iv.2 <-- first access
> > store i32 %mul, ptr %gep.iv.1.plus.2 <-- added into the (store) groupB. 
> > ```
> > Let's call this StoreGroup1.
> > 
> > We next come to B: `%l3 = load i32, ptr %gep.iv.2`
> > Interleave group created with that inst.
> > We create next interleave group:
> > ```
> > %l3 = load i32, ptr %gep.iv.2
> > %l2 = load i32, ptr %gep.iv.1.plus.2
> > ```
> > Call this `LoadGroup1`.
> > 
> > Wit that same `B`, we continue `A` accesses with A as `store i32 %mul, ptr %gep.iv.1.plus.2`.
> > 
> > With the patch I have we will do two things:
> > 1. release `StoreGroup1` because we see the store has a dependency with `%l2`
> > 2. Mark `LoadGroup1` as complete (and break from this inner loop traversing `AI`).
> > 
> > If we were to *only* compare `BI` when deciding for store release:
> > ```
> > if (GroupA && StoreGroups.contains(GroupA) &&
> >     !canReorderMemAccessesForInterleavedGroups(&*AI, &*BI))
> > ```
> > then, we will not release StoreGroup1 since BI (%l3) has no dependency. 
> > 
> > Which means even though the load group is correctly completed (and we will not hoist %l2 to location of %l1), we will sink the store (store i32 %mul, ptr %gep.iv.1.plus.2) below the dependent load (%l2). 
> > 
> @Ayal For completeness, here's the diff showing why the fix suggested in the comments won't be enough: https://reviews.llvm.org/differential/diff/542078/. The test in `interleaved-accesses-sink-store-across-load.ll` shows that we still sunk the store incorrectly (see the masked.store at the end). 
Ah, that's what I was curious about! Thanks for the detailed and enlightening description! Led to the revised proposal above.

nit: suggest to rename `%iv.2` as `%iv.1.plus.3`

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155520/new/

https://reviews.llvm.org/D155520