[PATCH] D144226: [Loop-Interchange] Allow inner-loop only reductions

Mon May 1 16:40:59 PDT 2023

quic_aankit added a comment.

In D144226#4301073 <https://reviews.llvm.org/D144226#4301073>, @congzhe wrote:

> In addition to Michael's comment, I'd like to suggest several comments:
>
> 1. This patch "undo" LICM within loop interchange in order to form a tightly nest loopnest. This sounds like a phase ordering problem, i.e., you could just place loop interchange before LICM and then you'll be able to interchange the loopnest. If it is a phase ordering problem then I'm not sure if it makes perfect sense to undo pass A within pass B, because things could quickly get messy if we choose to develop in this way.
>
> 2. Also, have you considered the "loopnest" version of loop passes? For example if you use the loopnest invariant code motion (LNICM) pass instead of LICM pass in the pipeline, you'll not hoist `z[j]` into `int tmp = z[j]` (in the example from your summary), because LNICM will only do hoisting if a variable is invariant across the entire loopnest.
>
> 3. In D53027 <https://reviews.llvm.org/D53027>, support for inner-only reductions is removed because of miscompilation of certain cases, as well as profitability concerns since interchange would move loads and stores from the outer loop into the inner loop. Have you thought about addressing these problems?
>
> One possible miscompilation, as mentioned in https://reviews.llvm.org/D53027#1272786, is that it would miscompile if we interchange the following code, given that y is a global variable.
>
>   for (unsigned i = 0; i < N; i++) {
>     y = 0;
>     for (unsigned j = 0; j < N; j++) {
>       A[i][j] += 1;
>       y += A[i][j];
>     }
>   }

@congzhe Thank you for the review.

1. If we just force a run of LoopInterchangePass before LICM then I do see the interchanged loop, but then if the programmer tries to write the program in a different manner, it might not work. For eg:

  for (j = 0; j < n; j++) {
      for (k = 0; k < l; k++) {
          z[j] += x[k]*y[n*k+j];
      }
  }

would work, but this would not:

  for (j = 0; j < n; j++) {
      int tmp = z[j];
      for (k = 0; k < l; k++) {
          tmp += x[k]*y[n*k+j];
      }
      z[j] = tmp;
  }

2. Arguement 1 holds here too, if the programmer writes the program a little differently then the interchange won't happen.

3. I don't think there should be miscompiles as we are just undoing LICM only in very specific cases. With regards to profitability, we still go through the same profitability analysis The mentioned code does not miscompile. Here are the debug logs:

  	Processing LoopList of size = 2
  	Found 2 Loads and Stores to analyze
  	Found anti dependency between Src and Dst
  	Src:  %1 = load i32, ptr %arrayidx5, align 4, !tbaa !5
  	Dst:  store i32 %add, ptr %arrayidx5, align 4, !tbaa !5
  	Processing InnerLoopId = 1 and OuterLoopId = 0
  	Inner loop PHI is not part of reductions across the outer loop.
  	Only inner loops with induction or reduction PHI nodes are supported currently.
  	Not legal because of current transform limitation
  	Not interchanging loops. Cannot prove legality.

I haven't done an extensive analysis of the performance characteristics of this
change. Can you please suggest some way on how I can check if this patch causes
any degradations? The pass is OFF by default unless one enables it explicitly.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144226/new/

https://reviews.llvm.org/D144226