[PATCH] D76132: [LoopUnrollAndJam] Changed safety checks to consider more than 2-levels loop nest.

Wed Mar 25 21:39:25 PDT 2020

Meinersbur added a comment.

Here is my suggested dependency checker:

  // Check whether it is semantically safe Src and Dst considering any potential
  // dependency between them.
  //
  // @param UnrollLevel The level of the loop being unrolled
  // @param JamLevel    The level of the loop being jammed; if Src and Dst are on
  // different levels, the outermost common loop counts as jammed level
  //
  // @return true if is safe and false if there is a dependency violation.
  static bool checkDependency(Instruction *Src, Instruction *Dst,
                              unsigned UnrollLevel, unsigned JamLevel,
                              bool Sequentialized, DependenceInfo &DI) {
    assert(UnrollLevel <= JamLevel);

    if (Src == Dst)
      return true;
    if (isa<LoadInst>(Src) && isa<LoadInst>(Dst))
      return true;

    // Check whether unroll-and-jam may violate a dependency.
    // By construction, every dependency will be lexicographically non-negative
    // (if it was, it would violate the current execution order), such as
    //   (0,0,>,*,*)
    // Unroll-and-jam changes the GT execution of two executions to the same
    // iteration of the chosen unroll level. That is, a GT dependence becomes a GE
    // dependence (or EQ, if we fully unrolled the loop) at the loop's position:
    //   (0,0,>=,*,*)
    // Now, the dependency is not necessarily non-negative anymore, i.e.
    // unroll-and-jam may violate correctness.
    std::unique_ptr<Dependence> D = DI.depends(Src, Dst, true);
    if (!D)
      return true;
    assert(D->isOrdered() && "Expected an output, flow or anti dep.");

    // Quick bail-out.
    if (D->isConfused())
      return false;

    for (unsigned d = 1; d < UnrollLevel; ++d) {
      // Check if dependence is carried by an outer loop.
      // That is, changing
      //   (0,>,>,*,*)
      // to
      //   (0,>,>=,*,*)
      // will still not violate the dependency.
      if (D->getDirection(d) == Dependence::DVEntry::GT)
        return true;
    }

    if (!(D->getDirection(UnrollLevel) & Dependence::DVEntry::GT)) {
      // If the unrolled loop did not carry the dependency in the first place
      // (i.e. being <, <= or =), it will also not need after unroll-and-jam.
      // Note: The standard case is the dependence vector
      //   (0,0,=,>,*)
      // where the unrolled level is already EQ. Changing LT and LE should also
      // not affect the semantics, since these didn't help to fulfill the
      // dependence in the first place.
      return true;
    }

    // Check if unrolled level becomes an EQ dependence at the unroll level,
    // whether one of the inner loops would carry the dependence.
    for (unsigned d = UnrollLevel + 1; d <= JamLevel; ++d) {
      unsigned Dir = D->getDirection(d);

      // Check whether the jammed level will carry the dependency.
      if (Dir == Dependence::DVEntry::GT)
        return true;

      // A possible backwards direction will violate the dependency
      if (Dir & Dependence::DVEntry::LT)
        return false;
    }

    // We previously already already checked whether the instructions are in the
    // same region. Reaching this means that they are not, hence we have a
    // dependency violation.
    return Sequentialized;
  }

  static bool
  checkDependencies(Loop &Root, const BasicBlockSet &SubLoopBlocks,
                    const DenseMap<Loop *, BasicBlockSet> &ForeBlocksMap,
                    const DenseMap<Loop *, BasicBlockSet> &AftBlocksMap,
                    DependenceInfo &DI, LoopInfo &LI) {
    SmallVector<BasicBlockSet, 8> AllBlocks;
    for (Loop *L : Root.getLoopsInPreorder())
      if (ForeBlocksMap.find(L) != ForeBlocksMap.end())
        AllBlocks.push_back(ForeBlocksMap.lookup(L));
    AllBlocks.push_back(SubLoopBlocks);
    for (Loop *L : Root.getLoopsInPreorder())
      if (AftBlocksMap.find(L) != AftBlocksMap.end())
        AllBlocks.push_back(AftBlocksMap.lookup(L));

    unsigned LoopDepth = Root.getLoopDepth();
    SmallVector<Instruction *, 4> EarlierLoadsAndStores;
    SmallVector<Instruction *, 4> CurrentLoadsAndStores;
    for (BasicBlockSet &Blocks : AllBlocks) {
      CurrentLoadsAndStores.clear();
      if (!getLoadsAndStores(Blocks, CurrentLoadsAndStores))
        return false;

      Loop *CurLoop = LI.getLoopFor((*Blocks.begin())->front().getParent());
      unsigned CurLoopDepth = CurLoop->getLoopDepth();

      for (auto Earlier : EarlierLoadsAndStores) {
        Loop *EarlierLoop = LI.getLoopFor(Earlier->getParent());
        unsigned EarlierDepth = EarlierLoop->getLoopDepth();
        unsigned CommonLoopDepth = std::min(EarlierDepth, CurLoopDepth);
        for (auto Later : CurrentLoadsAndStores) {
          if (!checkDependency(Earlier, Later, LoopDepth, CommonLoopDepth, false,
                               DI))
            return false;
        }
      }

      size_t NumInsts = CurrentLoadsAndStores.size();
      for (size_t i = 0; i < NumInsts; ++i) {
        for (size_t j = i + 1; j < NumInsts; ++j) {
          if (!checkDependency(CurrentLoadsAndStores[i], CurrentLoadsAndStores[j],
                               LoopDepth, CurLoopDepth, true, DI))
            return false;
        }
      }

      EarlierLoadsAndStores.append(CurrentLoadsAndStores.begin(),
                                   CurrentLoadsAndStores.end());
    }
    return true;
  }

I think I discovered a bug in the current dependency checker. LoopUnrollAndJam transforms `sub_sub_more` in the check `dependencies.ll` which I dint think it is allowed to do.

================
Comment at: llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll:439
-; CHECK-NOT: %j.1 = phi
-define void @sub_sub_more(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
 entry:
----------------
Whitney wrote:
> This test was orignally testing
> ```
> for i
>   for j
>     A[i]
>     A[i+1]
> ```
> which should be **safe** to unroll and jam. 
> 
> I think it actually want to test the code for sub with sub with the dependence distance of the inner loop is more.
> ```
> for i
>   for j
>     A[i][j]
>     A[i+1][j+1]
> ```
I think this is NOT safe to unroll-and jam. The unrolled equivalent is:
```
for i += 2
  for j
    S1:  A[i]
    S2:  A[i+1]
    S1': A[i+1]
    S2': A[i+2]
```

At S1', the lasst access of A[i+1] is S2 from the same itertation. instead if S1 from the previous j-iteration is in the original loop.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76132/new/

https://reviews.llvm.org/D76132