[PATCH] D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate

Tue Apr 12 21:47:45 PDT 2022

Carrot added a comment.

I now understand how this patch causes the missed vectorization in our code. My previous comment showed that a different GVN result led to different SLPVectorization behavior. This comment focuses on how the patch produces that different GVN result.

The following is simplified IR before the first LICM run. The original code is much more complex and involves multiple optimizations, so only the relevant instructions and control flow are shown.

  LoopHeader1:
    br %cond1, label %PreHeader2, label %LoopExit1

  PreHeader2:
    br label %LoopHeader2

  LoopHeader2:
    br %cond2, label %LoopBody2, label %LoopExit2

  LoopBody2:
    %100 = or i64 %24, 1
    ... // uses of %100
    br label %LoopHeader2

  LoopExit2:
    %200 = or i64 %24, 1
    ... // uses of %200
    br label %LoopHeader1

Without this patch:

- First LICM of loop2: the definition of %100 is hoisted to PreHeader2, so it now dominates %200.
- LoopRotate of loop1: LoopHeader1 is duplicated into its predecessors and deleted, and PreHeader2 becomes the new header of loop1. It is now even more obvious that %100 dominates %200.
- GVN: because the definition of %100 dominates %200, %200 is deleted and all of its uses are replaced by %100.
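
The dominance claim in the steps above can be checked with a small sketch. It models the CFG after the first LICM (with %100 already in PreHeader2) and computes dominator sets by fixed-point iteration. The block names come from the IR above; the "Entry" node, the dictionary encoding, and the helper name are assumptions made only for this illustration, and this is not how LLVM's DominatorTree is implemented.

```python
# Toy CFG for the "without this patch" case, after the first LICM has hoisted
# %100 into PreHeader2. "Entry" is a hypothetical block added for the sketch.
CFG = {
    "Entry": ["LoopHeader1"],
    "LoopHeader1": ["PreHeader2", "LoopExit1"],
    "PreHeader2": ["LoopHeader2"],
    "LoopHeader2": ["LoopBody2", "LoopExit2"],
    "LoopBody2": ["LoopHeader2"],
    "LoopExit2": ["LoopHeader1"],
    "LoopExit1": [],
}


def dominators(cfg, entry):
    """Return {block: set of blocks dominating it} via fixed-point iteration.

    Assumes every block in cfg is reachable from entry.
    """
    preds = {b: [] for b in cfg}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].append(src)

    # Start from "everything dominates everything" and shrink the sets.
    dom = {b: set(cfg) for b in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[b])) | {b}
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


DOM = dominators(CFG, "Entry")
# %100 lives in PreHeader2 and %200 lives in LoopExit2.
HOISTED_DOMINATES_EXIT = "PreHeader2" in DOM["LoopExit2"]
print(HOISTED_DOMINATES_EXIT)
```

In this model every path to LoopExit2 goes through PreHeader2, which is why GVN can simply replace %200 with %100.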

With this patch, the behavior is different:

- First LICM of loop2: speculation is disabled, so the definition of %100 is not hoisted and the code is unchanged.
- LoopRotate of loop2: LoopHeader2 is duplicated into its predecessors and deleted. Note that a new preheader is created for loop2. We now have the following code:

  LoopHeader1:
    br %cond1, label %PreHeader2, label %LoopExit1

  PreHeader2:
    br %cond2, label %NewPreHeader2, label %LoopExit2

  NewPreHeader2:
    br label %LoopBody2

  LoopBody2:
    %100 = or i64 %24, 1
    ... // uses of %100
    br %cond2, label %LoopBody2, label %_crit_edge

  _crit_edge:
    br label %LoopExit2

  LoopExit2:
    %200 = or i64 %24, 1
    ... // uses of %200
    br label %LoopHeader1

- Second LICM of loop2: this time speculation is allowed, so the definition of %100 is hoisted to NewPreHeader2. There it no longer dominates %200, because the path PreHeader2 -> LoopExit2 bypasses NewPreHeader2.

  LoopHeader1:
    br %cond1, label %PreHeader2, label %LoopExit1

  PreHeader2:
    br %cond2, label %NewPreHeader2, label %LoopExit2

  NewPreHeader2:
    %100 = or i64 %24, 1
    br label %LoopBody2

  LoopBody2:
    ... // uses of %100
    br %cond2, label %LoopBody2, label %_crit_edge

  _crit_edge:
    br label %LoopExit2

  LoopExit2:
    %200 = or i64 %24, 1
    ... // uses of %200
    br label %LoopHeader1

- LoopRotate of loop1: LoopHeader1 is duplicated into its predecessors and deleted, and PreHeader2 becomes the new header of loop1. %100 still does not dominate %200.

  NewPreHeader1:
    br label %PreHeader2

  PreHeader2:                        // It's actually loop header of loop1
    br %cond2, label %NewPreHeader2, label %LoopExit2

  NewPreHeader2:
    %100 = or i64 %24, 1
    br label %LoopBody2

  LoopBody2:
    ... // uses of %100
    br %cond2, label %LoopBody2, label %_crit_edge

  _crit_edge:
    br label %LoopExit2

  LoopExit2:
    %200 = or i64 %24, 1
    ... // uses of %200
    br %cond1, label %PreHeader2, label %LoopExit1

- GVN: the definition of %100 can reach %200 but does not dominate it, so GVN adds a new definition on the other path, PreHeader2 -> LoopExit2. That edge is a critical edge, so it is split, and the new "or" instruction is inserted in the new BB. A PHI instruction is inserted in LoopExit2.

  NewPreHeader1:
    br label %PreHeader2

  PreHeader2:                        // It's actually loop header of loop1
    br %cond2, label %NewPreHeader2, label %LoopExit2.crit_edge

  NewPreHeader2:
    %100 = or i64 %24, 1
    br label %LoopBody2

  LoopBody2:
    ... // uses of %100
    br %cond2, label %LoopBody2, label %_crit_edge

  _crit_edge:
    br label %LoopExit2

  LoopExit2.crit_edge:
    %150 = or i64 %24, 1
    br label %LoopExit2

  LoopExit2:
    %200 = phi i64 [ %150, %LoopExit2.crit_edge ], [ %100, %_crit_edge ]
    ... // uses of %200
    br %cond1, label %PreHeader2, label %LoopExit1
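
The dominance and critical-edge claims in the walkthrough above can be checked mechanically on the CFG as it stands right before GVN (after loop1 has been rotated). The sketch below is a toy model, not LLVM code: the dictionary encoding and all helper names are assumptions made for illustration, and "available in a predecessor" is simplified to "the defining block dominates that predecessor".

```python
# Toy CFG for the "with this patch" case, right before GVN.
# Block names follow the IR above; NewPreHeader1 is treated as the entry.
CFG = {
    "NewPreHeader1": ["PreHeader2"],
    "PreHeader2": ["NewPreHeader2", "LoopExit2"],
    "NewPreHeader2": ["LoopBody2"],
    "LoopBody2": ["LoopBody2", "_crit_edge"],
    "_crit_edge": ["LoopExit2"],
    "LoopExit2": ["PreHeader2", "LoopExit1"],
    "LoopExit1": [],
}
ENTRY = "NewPreHeader1"


def reachable(cfg, start, removed=None):
    """Blocks reachable from start when the block `removed` is deleted."""
    seen, work = set(), [start]
    while work:
        b = work.pop()
        if b == removed or b in seen:
            continue
        seen.add(b)
        work.extend(cfg[b])
    return seen


def dominates(cfg, entry, a, b):
    """a dominates b iff b is reachable and deleting a cuts every path to b."""
    if b not in reachable(cfg, entry):
        return False
    return a == b or b not in reachable(cfg, entry, removed=a)


def critical_edges(cfg):
    """Edges whose source has several successors and whose target has several predecessors."""
    pred_count = {b: 0 for b in cfg}
    for succs in cfg.values():
        for dst in succs:
            pred_count[dst] += 1
    return [(src, dst) for src, succs in cfg.items() for dst in succs
            if len(succs) > 1 and pred_count[dst] > 1]


# %100 is defined in NewPreHeader2, %200 in LoopExit2.
DEF_DOMINATES_USE = dominates(CFG, ENTRY, "NewPreHeader2", "LoopExit2")

# Predecessors of LoopExit2 in which %100 is NOT available (simplified model).
MISSING_IN = [p for p, succs in CFG.items() if "LoopExit2" in succs
              and not dominates(CFG, ENTRY, "NewPreHeader2", p)]

EDGE_IS_CRITICAL = ("PreHeader2", "LoopExit2") in critical_edges(CFG)
print(DEF_DOMINATES_USE, MISSING_IN, EDGE_IS_CRITICAL)
```

In this model the value is missing only on the edge from PreHeader2, and that edge is critical, which matches the split block LoopExit2.crit_edge and the PHI in the IR above.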

This is how we end up with different IR at the end of GVN. SLPVectorization later makes a different decision on this IR.

Any suggestions on how to fix it?

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D119965