[PATCH] D50985: [SCEV] LoopsUsed memoization

Roman Tereshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 20 11:41:22 PDT 2018


rtereshin created this revision.
rtereshin added reviewers: sanjoy, mkazantsev, efriedma.
Herald added a subscriber: javed.absar.

Currently ScalarEvolution::getUsedLoops traverses the whole SCEV expression on every call
without caching the results. Since `addToLoopUseLists` calls it for every newly created SCEV node,
this is time-consuming. This patch adds a memoization map to speed up these calls and tries
to do so in the least invasive manner possible.
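
To illustrate the idea outside of LLVM, here is a minimal standalone sketch of the memoization scheme. It is not the patch itself: `Expr`, `LoopCache`, `collect`, `registerNode` and `Visits` are hypothetical names, loops are modelled as plain integers, and standard containers stand in for DenseMap and SmallPtrSet.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a SCEV node: an optional loop id (for AddRec-like
// nodes) plus operands. None of these names come from LLVM.
struct Expr {
  int Loop = -1;                  // >= 0 marks an AddRec-like node
  std::vector<const Expr *> Ops;  // operands, created before this node
};

struct LoopCache {
  // Maps each registered node to the set of loops it (transitively) uses.
  std::unordered_map<const Expr *, std::set<int>> LoopsRefd;
  unsigned Visits = 0;            // number of nodes actually walked

  // Collects the loops used by E into Out, reusing cached results.
  void collect(const Expr *E, std::set<int> &Out) {
    auto It = LoopsRefd.find(E);
    // A cached entry is reused unless it is the very set being filled in.
    if (It != LoopsRefd.end() && &It->second != &Out) {
      Out.insert(It->second.begin(), It->second.end());
      return;                     // cached: do not descend into operands
    }
    ++Visits;
    if (E->Loop >= 0)
      Out.insert(E->Loop);
    for (const Expr *Op : E->Ops)
      collect(Op, Out);
  }

  // Called exactly once per new node, after its operands were registered.
  const std::set<int> &registerNode(const Expr *E) {
    assert(LoopsRefd.find(E) == LoopsRefd.end() && "node registered twice");
    std::set<int> &Out = LoopsRefd[E];
    collect(E, Out);
    return Out;
  }
};
```

In this sketch, registering nodes in creation order walks each node once, because every operand lookup hits the cache instead of descending further.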

This partially addresses https://bugs.llvm.org/show_bug.cgi?id=32731

On the `large-SCEVs-shallow-getSCEV-stack.ll` test case attached to the Bugzilla bug, I see
a ~70% reduction in overall (wall) time for the run

  time ./bin/opt -slsr large-SCEVs-shallow-getSCEV-stack.ll -o /dev/null

Most of the remaining time (about 98%, give or take) is still spent in `checkValidity`.

CTMark shows either a small improvement or a change within noise; it's hard to tell even
with 100 runs (x86):

| Name                                          | Prev    | Current | %      | Δ       | MAD    |  | Prev    | Current | %      | Δ       | MAD    |
| --------------------------------------------- | ------- | ------- | ------ | ------- | ------ |  | ------- | ------- | ------ | ------- | ------ |
| CTMark/ClamAV/clamscan                        | 9.8202  | 9.8004  | -0.20% | -0.0198 | 0.0754 |  | 9.9054  | 9.8899  | -0.16% | -0.0155 | 0.0754 |
| CTMark/kimwitu++/kc                           | 11.0440 | 11.0182 | -0.23% | -0.0258 | 0.0268 |  | 11.0805 | 11.0802 | 0.00%  | -0.0003 | 0.0268 |
| CTMark/tramp3d-v4/tramp3d-v4                  | 11.6199 | 11.5813 | -0.33% | -0.0386 | 0.0388 |  | 11.6906 | 11.6903 | 0.00%  | -0.0003 | 0.0388 |
| CTMark/7zip/7zip-benchmark                    | 27.3075 | 27.2473 | -0.22% | -0.0602 | 0.1511 |  | 27.8942 | 27.8941 | 0.00%  | -0.0001 | 0.1511 |
| CTMark/sqlite3/sqlite3                        | 4.7515  | 4.7446  | -0.15% | -0.0069 | 0.0278 |  | 4.7894  | 4.7893  | 0.00%  | -0.0001 | 0.0278 |
| CTMark/7zip/7zip-benchmark-link               | 0.0443  | 0.0438  | -1.13% | -0.0005 | 0.0010 |  | 0.0457  | 0.0457  | 0.00%  | 0.0000  | 0.0010 |
| CTMark/Bullet/bullet-link                     | 0.0320  | 0.0319  | -0.31% | -0.0001 | 0.0006 |  | 0.0329  | 0.0329  | 0.00%  | 0.0000  | 0.0006 |
| CTMark/ClamAV/clamscan-link                   | 0.0234  | 0.0234  | 0.00%  | 0.0000  | 0.0003 |  | 0.0238  | 0.0238  | 0.00%  | 0.0000  | 0.0003 |
| CTMark/SPASS/SPASS-link                       | 0.0227  | 0.0227  | 0.00%  | 0.0000  | 0.0002 |  | 0.0230  | 0.0230  | 0.00%  | 0.0000  | 0.0002 |
| CTMark/consumer-typeset/consumer-typeset-link | 0.0239  | 0.0239  | 0.00%  | 0.0000  | 0.0003 |  | 0.0246  | 0.0246  | 0.20%  | 0.0000  | 0.0003 |
| CTMark/kimwitu++/kc-link                      | 0.0528  | 0.0528  | 0.00%  | 0.0000  | 0.0003 |  | 0.0534  | 0.0534  | 0.09%  | 0.0000  | 0.0003 |
| CTMark/lencod/lencod-link                     | 0.0238  | 0.0238  | 0.00%  | 0.0000  | 0.0003 |  | 0.0246  | 0.0246  | 0.00%  | 0.0000  | 0.0003 |
| CTMark/mafft/pairlocalalign-link              | 0.0174  | 0.0174  | 0.00%  | 0.0000  | 0.0001 |  | 0.0176  | 0.0176  | 0.28%  | 0.0000  | 0.0001 |
| CTMark/sqlite3/sqlite3-link                   | 0.0141  | 0.0141  | 0.00%  | 0.0000  | 0.0001 |  | 0.0143  | 0.0143  | 0.00%  | 0.0000  | 0.0001 |
| CTMark/tramp3d-v4/tramp3d-v4-link             | 0.0238  | 0.0238  | 0.00%  | 0.0000  | 0.0001 |  | 0.0240  | 0.0240  | 0.00%  | 0.0000  | 0.0001 |
| CTMark/consumer-typeset/consumer-typeset      | 7.6772  | 7.6499  | -0.36% | -0.0273 | 0.0343 |  | 7.7051  | 7.7058  | 0.01%  | 0.0006  | 0.0343 |
| CTMark/mafft/pairlocalalign                   | 4.9020  | 4.8927  | -0.19% | -0.0093 | 0.0199 |  | 4.9349  | 4.9355  | 0.01%  | 0.0006  | 0.0199 |
| CTMark/SPASS/SPASS                            | 9.0386  | 9.0233  | -0.17% | -0.0153 | 0.0716 |  | 9.1167  | 9.1181  | 0.02%  | 0.0014  | 0.0716 |
| CTMark/lencod/lencod                          | 8.5570  | 8.5427  | -0.17% | -0.0143 | 0.0397 |  | 8.5944  | 8.5960  | 0.02%  | 0.0016  | 0.0397 |
| CTMark/Bullet/bullet                          | 20.3327 | 20.3012 | -0.15% | -0.0315 | 0.1420 |  | 20.7366 | 20.7445 | 0.04%  | 0.0079  | 0.1420 |

(The left half of the table uses the minimum as the aggregate function and the right half uses the median; the underlying data are the same 100 samples.
Rows are sorted by the absolute (not relative) delta of the medians, i.e. the right-hand Δ column.)
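
For reference, the aggregates in the table can be reproduced along the following lines. This is only a sketch: the helper names are hypothetical, MAD is assumed here to mean the median absolute deviation, and the % column is assumed to be the delta relative to the Prev value (which agrees with the first row of the table).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimum of a non-empty set of samples (left half of the table).
double minOf(const std::vector<double> &V) {
  assert(!V.empty());
  return *std::min_element(V.begin(), V.end());
}

// Median of a non-empty set of samples (right half of the table).
double medianOf(std::vector<double> V) {
  assert(!V.empty());
  std::sort(V.begin(), V.end());
  std::size_t N = V.size();
  return N % 2 ? V[N / 2] : (V[N / 2 - 1] + V[N / 2]) / 2.0;
}

// Median absolute deviation: an assumed reading of the MAD column.
double madOf(const std::vector<double> &V) {
  double M = medianOf(V);
  std::vector<double> Dev;
  for (double X : V)
    Dev.push_back(std::fabs(X - M));
  return medianOf(Dev);
}

// Relative change in percent, as assumed for the % column.
double relDeltaPercent(double Prev, double Current) {
  return (Current - Prev) / Prev * 100.0;
}
```

The minimum is less sensitive to one-sided noise such as background load, while the median is less sensitive to a single lucky run, which is presumably why both are shown.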

If memory consumption becomes a concern, we can switch from attaching the full set of referenced loops to every SCEV node
to a mirrored tree (really a DAG) structured like a skip list, in which every node holds a set of unique references to its closest AddRec
nodes. That way getUsedLoops would traverse, for any SCEV expression, a compressed tree (DAG) containing only AddRecs.
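
A rough standalone sketch of that alternative follows. Nothing here is implemented by this patch: `Node`, `linkNode` and `usedLoops` are hypothetical names, loops are modelled as integers, and "closest" is taken to mean AddRec nodes reachable through operands without passing through another AddRec.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical expression node with skip links to its closest AddRecs.
struct Node {
  int Loop = -1;                             // >= 0 marks an AddRec-like node
  std::vector<const Node *> Ops;             // ordinary operands
  std::vector<const Node *> ClosestAddRecs;  // unique skip links
};

bool isAddRec(const Node *N) { return N->Loop >= 0; }

// Fills in the skip links of N. Must run once, after N.Ops is set and
// after all operands have been linked themselves.
void linkNode(Node &N) {
  std::set<const Node *> Seen;
  for (const Node *Op : N.Ops) {
    if (isAddRec(Op)) {
      // An AddRec operand is itself the closest AddRec on this path.
      if (Seen.insert(Op).second)
        N.ClosestAddRecs.push_back(Op);
    } else {
      // Otherwise inherit the operand's links, skipping duplicates.
      for (const Node *AR : Op->ClosestAddRecs)
        if (Seen.insert(AR).second)
          N.ClosestAddRecs.push_back(AR);
    }
  }
}

// Walks only the compressed DAG of AddRecs reachable from S.
std::set<int> usedLoops(const Node *S) {
  std::set<int> Loops;
  std::set<const Node *> Visited;
  std::vector<const Node *> Worklist = {S};
  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue;                              // already handled
    if (isAddRec(N))
      Loops.insert(N->Loop);
    for (const Node *AR : N->ClosestAddRecs)
      Worklist.push_back(AR);
  }
  return Loops;
}
```

Compared with a full per-node loop set, each node here stores only its nearest AddRecs, so the memory cost shifts to a slightly longer walk at query time.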


Repository:
  rL LLVM

https://reviews.llvm.org/D50985

Files:
  include/llvm/Analysis/ScalarEvolution.h
  lib/Analysis/ScalarEvolution.cpp


Index: lib/Analysis/ScalarEvolution.cpp
===================================================================
--- lib/Analysis/ScalarEvolution.cpp
+++ lib/Analysis/ScalarEvolution.cpp
@@ -11760,6 +11760,7 @@
   ExprValueMap.erase(S);
   HasRecMap.erase(S);
   MinTrailingZerosCache.erase(S);
+  LoopsRefd.erase(S);
 
   for (auto I = PredicatedSCEVRewrites.begin();
        I != PredicatedSCEVRewrites.end();) {
@@ -11790,24 +11791,33 @@
 ScalarEvolution::getUsedLoops(const SCEV *S,
                               SmallPtrSetImpl<const Loop *> &LoopsUsed) {
   struct FindUsedLoops {
-    FindUsedLoops(SmallPtrSetImpl<const Loop *> &LoopsUsed)
-        : LoopsUsed(LoopsUsed) {}
+    FindUsedLoops(SmallPtrSetImpl<const Loop *> &LoopsUsed, ScalarEvolution &SE)
+        : LoopsUsed(LoopsUsed), SE(SE) {}
     SmallPtrSetImpl<const Loop *> &LoopsUsed;
+    ScalarEvolution &SE;
+
     bool follow(const SCEV *S) {
+      auto It = SE.LoopsRefd.find(S);
+      if (It != SE.LoopsRefd.end() && &It->second != &LoopsUsed) {
+        LoopsUsed.insert(It->second.begin(), It->second.end());
+        return false;
+      }
       if (auto *AR = dyn_cast<SCEVAddRecExpr>(S))
         LoopsUsed.insert(AR->getLoop());
       return true;
     }
 
     bool isDone() const { return false; }
   };
 
-  FindUsedLoops F(LoopsUsed);
+  FindUsedLoops F(LoopsUsed, *this);
   SCEVTraversal<FindUsedLoops>(F).visitAll(S);
 }
 
 void ScalarEvolution::addToLoopUseLists(const SCEV *S) {
-  SmallPtrSet<const Loop *, 8> LoopsUsed;
+  assert(LoopsRefd.find(S) == LoopsRefd.end() &&
+         "addToLoopUseLists should be called exactly once per every new SCEV");
+  SmallPtrSetImpl<const Loop *> &LoopsUsed = LoopsRefd[S];
   getUsedLoops(S, LoopsUsed);
   for (auto *L : LoopsUsed)
     LoopUsers[L].push_back(S);
Index: include/llvm/Analysis/ScalarEvolution.h
===================================================================
--- include/llvm/Analysis/ScalarEvolution.h
+++ include/llvm/Analysis/ScalarEvolution.h
@@ -1856,6 +1856,9 @@
   /// This maps loops to a list of SCEV expressions that (transitively) use said
   /// loop.
   DenseMap<const Loop *, SmallVector<const SCEV *, 4>> LoopUsers;
+  /// The inverse of LoopUsers map; maps a SCEV expression to a set of
+  /// (transitively) referenced Loops.
+  DenseMap<const SCEV *, SmallPtrSet<const Loop *, 4>> LoopsRefd;
 
   /// Cache tentative mappings from UnknownSCEVs in a Loop, to a SCEV expression
   /// they can be rewritten into under certain predicates.



