[PATCH] D28549: Global DCE performance improvement

serge via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 12 03:07:49 PST 2017


serge-sans-paille marked 9 inline comments as done.
serge-sans-paille added a comment.

In https://reviews.llvm.org/D28549#643081, @davide wrote:

> Do you have numbers showing up the improvement?


The complexity of the original algorithm is roughly in O(|live function instruction|), and the complexity of this algorithm is roughly in O(|globals users|).

So if there's a big global function, previous algorithm is more costly (and that's the original motivating example).
If there's a big dead function, this algorithm is more costly.

On average test cases, it does not make much difference, It's only noticeable on very large functions (> 1000000 instructions) which happens to be my use case.

> What's the effect of memoization on memory usage?

memoization of constants was already implemented in the previous algorithm, so not much difference here.
This algorithm contains an additional map that holds the global dependencies, so it should have a bigger memory consumption.



================
Comment at: include/llvm/Transforms/IPO/GlobalDCE.h:36
+  SmallPtrSet<GlobalValue*, 16> LiveGVs_;
+  DenseMap<GlobalValue*, SmallPtrSet<GlobalValue*, 8>> GVDependencies_;
+  std::unordered_map<Constant*, SmallPtrSet<GlobalValue*, 8>> DependenciesCache_;
----------------
mehdi_amini wrote:
> `DenseMap` is intended to contains small value, `SmallPtrSet` seems a bit `fat` here?
> 
> Also, after reading the algorithm, this could be a vector, you should not need dedup by construction.
According to my tests, DenseMap performs better on large test case than std::unorderd_map (by a factor of two)

Correct for Set->Vector


================
Comment at: lib/Transforms/IPO/GlobalDCE.cpp:103
+      Deps.insert(LocalDeps.begin(), LocalDeps.end());
+    }
+  }
----------------
mehdi_amini wrote:
> So the memorization is only used for constants (GV initializers, GEP, what else?), I'd rename it to make it explicit that this cache only applies to constants.
> 
> Also can such constant have multiple uses? I'm failing to remember in which case right now. If they only have a single use we could just traverse and remove the memorization entirely.
See r179458


================
Comment at: lib/Transforms/IPO/GlobalDCE.cpp:113
+  for(auto* GVU : Deps) {
+    GVDependencies_[GVU].insert(&GV);
+  }
----------------
mehdi_amini wrote:
> Why not passing GV to ComputeDependencies instead of Deps and populate directly `GVDependencies_` ?
I think it's clearer like this, as it separates the dependencies build from the GVDependencies update.


================
Comment at: lib/Transforms/IPO/GlobalDCE.cpp:126
+      for (auto &&CM : make_range(ComdatMembers_.equal_range(C)))
+        MarkLive(*CM.second, Updates);
+    }
----------------
mehdi_amini wrote:
> What's the bound on the recursion? How do we guarantee that we won't run out of stack? Worth a comment. 
There's a finite number of globals and they can only be visited once because of the set check.


Repository:
  rL LLVM

https://reviews.llvm.org/D28549





More information about the llvm-commits mailing list