[PATCH] D73129: [LoopUnrollAndJam] Correctly update LoopInfo when unroll and jam more than 2-levels loop nests.

Tue Jan 28 04:59:43 PST 2020

dmgreen added a comment.

Put more generally, I was expecting this:

  for i
    A(i)
    for j
      B(i, j)
      for k
        C(i, j, k)
      D(i, j)
    E(i)

To be unrolled in i:

  for i +=2
    A(i)
    for j
      B(i, j)
      for k
        C(i, j, k)
      D(i, j)
    E(i)
    A(i+1)
    for j
      B(i+1, j)
      for k
        C(i+1, j, k)
      D(i+1, j)
    E(i+1)
  for i remainder
    A(i)
    for j
      B(i, j)
      for k
        C(i, j, k)
      D(i, j)
    E(i)

And then the j loops to be jammed:

  for i +=2
    A(i)
    A(i+1)
    for j
      B(i, j)
      for k
        C(i, j, k)
      D(i, j)
      B(i+1, j)
      for k
        C(i+1, j, k)
      D(i+1, j)
    E(i)
    E(i+1)
  for i remainder
    A(i)
    for j
      B(i, j)
      for k
        C(i, j, k)
      D(i, j)
    E(i)

You are saying that we should also fuse the inner loops? That sounds sensible if we can do it and prove legality (and if we can somehow prove that UnJing i is better than UnJing j, which I can see could come up in places and might out-weight the extra code bloat).

When Unroll And Jam was written we did not have general loop fusion. We now do. Can we make use of it here to fuse any sub-loops together? I believe that is how gcc writes their algorithm, but last I looked they only supported perfectly nested loops which would be a big regression over what is here now. We might just be able to attempt sub-loop fusing, using the loop fusion infrastructure we have?

The alternative like you said would be trying to prove it is valid beforehand, which would mean checking that more blocks inside subloops can be moved past each other and all the extra memory dependencies are safe.

================
Comment at: llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp:186
+  // Clone Blocks.
+  for (BasicBlock *BB : Blocks) {
+    if (BB == FirstBlock)
----------------
Whitney wrote:
> dmgreen wrote:
> > I think that it will matter that this isn't a deterministic order
> Doesn't matter functionally, but it may affect the order in LIT tests. That's why I used CHECK-DAG in unroll-and-jam.ll. I agree that could be problematic, thinking to change Blocks from a set to a vector, so the caller of the function decide on a order. 
Non-determinism is usually considered a problem, even if both outputs are equally valid (as they would be in this case). Especially in a functional safety context where you need to be testing what you are running.

Maybe use a SetVector? A vector may be fine too, if we don't need constant time lookup/removal.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73129/new/

https://reviews.llvm.org/D73129