[llvm] 2d1c6e0 - [LAA] Remove block order sensitivity in LAA algorithm. PR56672

Wed Jul 27 23:37:29 PDT 2022

Author: Max Kazantsev
Date: 2022-07-28T13:36:56+07:00
New Revision: 2d1c6e0b4418e92df286cba428a3e4cf56d7aa2f

URL: https://github.com/llvm/llvm-project/commit/2d1c6e0b4418e92df286cba428a3e4cf56d7aa2f
DIFF: https://github.com/llvm/llvm-project/commit/2d1c6e0b4418e92df286cba428a3e4cf56d7aa2f.diff

LOG: [LAA] Remove block order sensitivity in LAA algorithm. PR56672

As the test in PR56672 shows, LAA produces different results, leading to either
positive or negative vectorization decisions, depending on the order of blocks
in the loop. The exact reason for this is not clear to me; however, it makes
investigation of related bugs extremely complex.

The current order of blocks in the loop is arbitrary. It may change, for example,
if the loop info analysis is dropped and recomputed. It seems that this interferes
with LAA's logic. This patch chooses a fixed traversal order of blocks in loops:
reverse post-order (RPOT).
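The idea behind the fix can be illustrated outside LLVM. The sketch below is a hypothetical, self-contained C++ toy: the `CFG` type, `postOrder` and `loopRPO` helpers are made up for illustration and are not LLVM APIs. It computes a reverse post-order of a loop's blocks by a DFS from the header, following successors in terminator order and skipping the back edge and exit edges. Because the traversal is driven by the CFG edges rather than by how the blocks happen to be stored, two different storage orders of the same loop yield the same sequence. It mirrors the idea of `LoopBlocksRPO`, not its actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy CFG: each block name maps to its successors, listed in
// the order they appear in the block's terminator.
using CFG = std::map<std::string, std::vector<std::string>>;

// Post-order DFS restricted to the blocks that belong to the loop.
static void postOrder(const CFG &G, const std::set<std::string> &InLoop,
                      const std::string &BB, std::set<std::string> &Visited,
                      std::vector<std::string> &Out) {
  // Skip exit blocks and already-visited blocks (this also cuts the back
  // edge to the header).
  if (!InLoop.count(BB) || !Visited.insert(BB).second)
    return;
  for (const std::string &Succ : G.at(BB))
    postOrder(G, InLoop, Succ, Visited, Out);
  Out.push_back(BB); // emitted only after all successors are finished
}

// Reverse post-order of the loop, starting at its header. LoopBlocks is the
// loop's block list in whatever order it happens to be stored; only its
// contents matter, never its order.
std::vector<std::string> loopRPO(const CFG &G,
                                 const std::vector<std::string> &LoopBlocks,
                                 const std::string &Header) {
  std::set<std::string> InLoop(LoopBlocks.begin(), LoopBlocks.end());
  std::set<std::string> Visited;
  std::vector<std::string> Order;
  postOrder(G, InLoop, Header, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

For a diamond-shaped loop `header -> {then, else} -> latch -> header` (with `latch` also branching to an outside `exit`), calling `loopRPO` with the storage order `header, then, else, latch` or with `latch, else, header, then` returns the same sequence: `header, else, then, latch`.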

Note: this is *not* a fix for the bug with the incorrect analysis result. It just
makes the answer more robust, which makes the investigation easier.

Differential Revision: https://reviews.llvm.org/D130482
Reviewed By: aeubanks, fhahn

Added: 
    llvm/test/Analysis/LoopAccessAnalysis/pr56672.ll

Modified: 
    llvm/lib/Analysis/LoopAccessAnalysis.cpp

Removed: 
    


################################################################################
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index aa35f253bc5f0..547e9a8123054 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -27,6 +27,7 @@
 #include "llvm/Analysis/AliasSetTracker.h"
 #include "llvm/Analysis/LoopAnalysisManager.h"
 #include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/LoopIterator.h"
 #include "llvm/Analysis/MemoryLocation.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/Analysis/ScalarEvolution.h"
@@ -2127,8 +2128,11 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
       EnableMemAccessVersioning &&
       !TheLoop->getHeader()->getParent()->hasOptSize();
 
-  // For each block.
-  for (BasicBlock *BB : TheLoop->blocks()) {
+  // Traverse blocks in fixed RPOT order, regardless of their storage in the
+  // loop info, as it may be arbitrary.
+  LoopBlocksRPO RPOT(TheLoop);
+  RPOT.perform(LI);
+  for (BasicBlock *BB : RPOT) {
     // Scan the BB and collect legal loads and stores. Also detect any
     // convergent instructions.
     for (Instruction &I : *BB) {

diff --git a/llvm/test/Analysis/LoopAccessAnalysis/pr56672.ll b/llvm/test/Analysis/LoopAccessAnalysis/pr56672.ll
new file mode 100644
index 0000000000000..585b270687eef
--- /dev/null
+++ b/llvm/test/Analysis/LoopAccessAnalysis/pr56672.ll
@@ -0,0 +1,39 @@
+; RUN: opt -passes='loop(loop-rotate),print-access-info' -S %s 2>&1 | FileCheck %s
+; RUN: opt -passes='loop(loop-rotate),invalidate<loops>,print-access-info' -S %s 2>&1 | FileCheck %s
+
+; Make sure that the result of the analysis is consistent regardless of the
+; order in which blocks are stored in the loop. This test demonstrates the
+; situation when recomputation of LI produces a loop with a different block
+; order, and LAA gives a different result for it. The reason for this bug
+; hasn't been found yet, but the algorithm is somehow dependent on block order.
+define void @test_01(i32* %p) {
+; CHECK-LABEL: test_01
+; CHECK:       Report: unsafe dependent memory operations in loop.
+; CHECK-NOT:   Memory dependences are safe
+entry:
+  br label %loop
+
+loop.progress:                                              ; preds = %loop
+  br label %loop.backedge
+
+loop.backedge:                                              ; preds = %loop.progress
+  store i32 1, i32* %tmp7, align 4
+  %tmp = add nuw i64 %tmp5, 1
+  %tmp3 = icmp ult i64 %tmp, 1000
+  br i1 %tmp3, label %loop, label %loop.progress1
+
+loop:                                              ; preds = %loop.backedge, %entry
+  %tmp5 = phi i64 [ %tmp, %loop.backedge ], [ 16, %entry ]
+  %tmp6 = phi i64 [ %tmp5, %loop.backedge ], [ 15, %entry ]
+  %tmp7 = getelementptr inbounds i32, i32* %p, i64 %tmp5
+  %tmp8 = load i32, i32* %tmp7, align 4
+  %tmp9 = add i32 %tmp8, -5
+  store i32 %tmp9, i32* %tmp7, align 4
+  br i1 false, label %never, label %loop.progress
+
+never:                                             ; preds = %loop
+  unreachable
+
+loop.progress1:                                             ; preds = %loop.backedge
+  ret void
+}