[PATCH] D119965: [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate

Guozhi Wei via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 12 21:47:45 PDT 2022


Carrot added a comment.

Now I understand how this patch caused the missed vectorization in our code. In my previous comment I showed that different GVN results led to different SLPVectorizer behavior. This time let's focus on how this patch produces different GVN results.

The following is the simplified IR before the first LICM run. The original code is much more complex and multiple optimizations are involved, so only the relevant instructions and control flow are listed.

  LoopHeader1:
    br %cond1, label %PreHeader2, label %LoopExit1
  
  PreHeader2:
    br label %LoopHeader2
  
  LoopHeader2:
    br %cond2, label %LoopBody2, label %LoopExit2
  
  LoopBody2:
    %100 = or i64 %24, 1
    ... // uses of %100
    br label %LoopHeader2
  
  LoopExit2:
    %200 = or i64 %24, 1
    ... // uses of %200
    br label %LoopHeader1

Without this patch:

- First LICM of loop2: the definition of %100 is hoisted to PreHeader2, so it now dominates %200.
- LoopRotate of loop1: LoopHeader1 is duplicated into its predecessors and deleted, and PreHeader2 becomes the new loop header of loop1, so it is now even more obvious that %100 dominates %200.
- GVN: because the definition of %100 dominates %200, %200 is deleted and all of its uses are replaced by %100.

With this patch, the behavior is different:

- First LICM of loop2: speculation is disabled, so the definition of %100 is not hoisted and the code is unchanged.
- LoopRotate of loop2: LoopHeader2 is duplicated into its predecessors and deleted. Note that a new preheader is created for loop2. Now we have the following code:

  LoopHeader1:
    br %cond1, label %PreHeader2, label %LoopExit1
  
  PreHeader2:
    br %cond2, label %NewPreHeader2, label %LoopExit2
  
  NewPreHeader2:
    br label %LoopBody2
  
  LoopBody2:
    %100 = or i64 %24, 1
    ... // uses of %100
    br %cond2, label %LoopBody2, label %_crit_edge
  
  _crit_edge:
    br label %LoopExit2
  
  LoopExit2:
    %200 = or i64 %24, 1
    ... // uses of %200
    br label %LoopHeader1

- Second LICM of loop2: this time speculation is allowed, so the definition of %100 is hoisted to NewPreHeader2, but from there it does not dominate %200.

  LoopHeader1:
    br %cond1, label %PreHeader2, label %LoopExit1
  
  PreHeader2:
    br %cond2, label %NewPreHeader2, label %LoopExit2
  
  NewPreHeader2:
    %100 = or i64 %24, 1
    br label %LoopBody2
  
  LoopBody2:
    ... // uses of %100
    br %cond2, label %LoopBody2, label %_crit_edge
  
  _crit_edge:
    br label %LoopExit2
  
  LoopExit2:
    %200 = or i64 %24, 1
    ... // uses of %200
    br label %LoopHeader1

- LoopRotate of loop1: LoopHeader1 is duplicated into its predecessors and deleted, and PreHeader2 becomes the new loop header of loop1. %100 still doesn't dominate %200.

  NewPreHeader1:
    br label %PreHeader2
  
  PreHeader2:                        // It's actually loop header of loop1
    br %cond2, label %NewPreHeader2, label %LoopExit2
  
  NewPreHeader2:
    %100 = or i64 %24, 1
    br label %LoopBody2
  
  LoopBody2:
    ... // uses of %100
    br %cond2, label %LoopBody2, label %_crit_edge
  
  _crit_edge:
    br label %LoopExit2
  
  LoopExit2:
    %200 = or i64 %24, 1
    ... // uses of %200
    br %cond1, label %PreHeader2, label %LoopExit1

- GVN: the definition of %100 can reach %200 but doesn't dominate it, so GVN adds a new definition on the other path, PreHeader2 -> LoopExit2. That edge is a critical edge, so it is split, and the new "or" instruction is inserted in the new BB. A PHI instruction is inserted in LoopExit2.

  NewPreHeader1:
    br label %PreHeader2
  
  PreHeader2:                        // It's actually loop header of loop1
    br %cond2, label %NewPreHeader2, label %LoopExit2.crit_edge
  
  NewPreHeader2:
    %100 = or i64 %24, 1
    br label %LoopBody2
  
  LoopBody2:
    ... // uses of %100
    br %cond2, label %LoopBody2, label %_crit_edge
  
  _crit_edge:
    br label %LoopExit2
  
  LoopExit2.crit_edge:
    %150 = or i64 %24, 1
    br label %LoopExit2
    
  LoopExit2:
    %200 = phi i64 [ %150, %LoopExit2.crit_edge ], [ %100, %_crit_edge ]
    ... // uses of %200
    br %cond1, label %PreHeader2, label %LoopExit1

This is how we end up with different IR at the end of GVN, and later the SLPVectorizer makes a different decision on this IR.
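
The dominance argument above can be checked mechanically. The following is only a rough Python sketch, not how LICM or GVN are implemented: it models the two simplified CFGs from this comment as successor maps (taken before LoopRotate of loop1), treats LoopHeader1 as the entry of the fragment (an assumption, since the real function has more blocks), and uses "the defining block dominates the predecessor" as a stand-in for "the value is available on that edge". The helper names `reachable`, `dominates` and `preds_missing_value` are made up for illustration.

```python
from collections import deque


def reachable(cfg, entry, removed=None):
    """Blocks reachable from `entry`, pretending block `removed` was deleted."""
    if entry == removed:
        return set()
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for succ in cfg.get(block, ()):
            if succ != removed and succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


def dominates(cfg, entry, a, b):
    """`a` dominates `b` iff every path from `entry` to `b` goes through `a`."""
    if a == b:
        return True
    return b in reachable(cfg, entry) and b not in reachable(cfg, entry, removed=a)


def preds_missing_value(cfg, entry, def_block, use_block):
    """Predecessors of `use_block` that are not dominated by `def_block`."""
    preds = [b for b, succs in cfg.items() if use_block in succs]
    return [p for p in preds if not dominates(cfg, entry, def_block, p)]


# Without the patch: %100 was hoisted to PreHeader2 by the first LICM.
without_patch = {
    "LoopHeader1": ["PreHeader2", "LoopExit1"],
    "PreHeader2": ["LoopHeader2"],
    "LoopHeader2": ["LoopBody2", "LoopExit2"],
    "LoopBody2": ["LoopHeader2"],
    "LoopExit2": ["LoopHeader1"],
    "LoopExit1": [],
}

# With the patch: loop2 was rotated first, then %100 was hoisted to NewPreHeader2.
with_patch = {
    "LoopHeader1": ["PreHeader2", "LoopExit1"],
    "PreHeader2": ["NewPreHeader2", "LoopExit2"],
    "NewPreHeader2": ["LoopBody2"],
    "LoopBody2": ["LoopBody2", "_crit_edge"],
    "_crit_edge": ["LoopExit2"],
    "LoopExit2": ["LoopHeader1"],
    "LoopExit1": [],
}

ENTRY = "LoopHeader1"  # assumption: entry of the listed fragment

print(dominates(without_patch, ENTRY, "PreHeader2", "LoopExit2"))              # True
print(dominates(with_patch, ENTRY, "NewPreHeader2", "LoopExit2"))              # False
print(preds_missing_value(with_patch, ENTRY, "NewPreHeader2", "LoopExit2"))    # ['PreHeader2']
```

In this model the only predecessor of LoopExit2 that lacks the value is PreHeader2, which matches the edge PreHeader2 -> LoopExit2 where GVN inserted %150.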

Any suggestions on how to fix it?


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D119965


