[llvm] [MemDep] Optimize SortNonLocalDepInfoCache sorting strategy for large caches with few unsorted entries (PR #143107)

Thu Jun 26 04:25:15 PDT 2025

================
@@ -983,33 +983,41 @@ MemDepResult MemoryDependenceResults::getNonLocalInfoForBlock(
 static void
 SortNonLocalDepInfoCache(MemoryDependenceResults::NonLocalDepInfo &Cache,
                          unsigned NumSortedEntries) {
-  switch (Cache.size() - NumSortedEntries) {
-  case 0:
-    // done, no new entries.
-    break;
-  case 2: {
-    // Two new entries, insert the last one into place.
-    NonLocalDepEntry Val = Cache.back();
-    Cache.pop_back();
-    MemoryDependenceResults::NonLocalDepInfo::iterator Entry =
-        std::upper_bound(Cache.begin(), Cache.end() - 1, Val);
-    Cache.insert(Entry, Val);
-    [[fallthrough]];
+
+  // Output number of sorted entries and size of cache for each sort.
+  LLVM_DEBUG(dbgs() << "NumSortedEntries: " << NumSortedEntries
+                    << ", Cache.size: " << Cache.size() << "\n");
+
+  // If only one entry, don't sort.
+  if (Cache.size() < 2)
+    return;
+
+  unsigned s = Cache.size() - NumSortedEntries;
+
+  // If the cache is already sorted, don't sort it again.
+  if (s == 0)
+    return;
+
+  // If no entry is sorted, sort the whole cache.
+  if (NumSortedEntries == 0) {
+    llvm::sort(Cache);
+    return;
   }
-  case 1:
-    // One new entry, Just insert the new value at the appropriate position.
-    if (Cache.size() != 1) {
+
+  // If the number of unsorted entires is small and the cache size is big, use
+  // insertion sort is faster. Here use Log2_32 to quickly choose the sort
+  // method.
+  if (s < Log2_32(Cache.size())) {
----------------
DingdWang wrote:

The choice of log2 here is empirical. The main goal is a cheap way to determine whether the number of unsorted entries is significantly smaller than the cache size. To tune this condition, I experimented with the following four options; based on the timing results, plain log2 (option 4) proved to be the fastest. The benchmark results are as follows:
1. s < NumSortedEntries: https://llvm-compile-time-tracker.com/compare.php?from=26f3f24a4f0a67eb23d255aba7a73a12bee1db11&to=e174118c88ee3d9d31fb3ed4e29b9ae2fcac46fa&stat=instructions%3Au
2. s < Log2_32(Cache.size()) * llvm::numbers::ln2 / llvm::numbers::ln10: https://llvm-compile-time-tracker.com/compare.php?from=26f3f24a4f0a67eb23d255aba7a73a12bee1db11&to=9368621b42fa8b68e1e3081110f82ae9a5d57458&stat=instructions%3Au
3. s < Log2_32(Cache.size()) * llvm::numbers::ln2: https://llvm-compile-time-tracker.com/compare.php?from=26f3f24a4f0a67eb23d255aba7a73a12bee1db11&to=19a8584d14dbb95b4a71a92a43da3a2c5d5e550a&stat=instructions%3Au
4. s < Log2_32(Cache.size()): https://llvm-compile-time-tracker.com/compare.php?from=26f3f24a4f0a67eb23d255aba7a73a12bee1db11&to=0fa6bc6bdf1c9c5464e81970e973f2c43edac874&stat=instructions%3Au
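
For readers who want to try the threshold outside of LLVM, here is a minimal standalone sketch of the strategy with option 4. It is not the code in this PR: `floorLog2` is a stand-in for `Log2_32`, `sortTail` and the `std::vector<int>` cache are hypothetical, and the body of the insertion branch is an assumption because the hunk above stops at the condition.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Floor of log2(n) for n > 0. Stand-in for llvm::Log2_32 (assumption:
// only the floor behaviour matters for the threshold).
static unsigned floorLog2(std::size_t n) {
  unsigned r = 0;
  while (n >>= 1)
    ++r;
  return r;
}

// Hypothetical helper: the first numSorted elements of cache are already
// sorted, the rest were appended unsorted. Sorts the whole vector and
// returns true if the insertion path was taken.
static bool sortTail(std::vector<int> &cache, std::size_t numSorted) {
  // Zero or one entry: nothing to sort.
  if (cache.size() < 2)
    return false;

  std::size_t unsorted = cache.size() - numSorted;

  // No new entries: already sorted.
  if (unsorted == 0)
    return false;

  // Nothing is sorted yet: sort everything.
  if (numSorted == 0) {
    std::sort(cache.begin(), cache.end());
    return false;
  }

  // Few new entries relative to the cache size: insert them one by one
  // into the sorted prefix (binary search plus a rotate).
  if (unsorted < floorLog2(cache.size())) {
    for (std::size_t i = numSorted; i < cache.size(); ++i) {
      auto cur = cache.begin() + i;
      auto pos = std::upper_bound(cache.begin(), cur, *cur);
      std::rotate(pos, cur, cur + 1);
    }
    return true;
  }

  // Otherwise fall back to a full sort.
  std::sort(cache.begin(), cache.end());
  return false;
}
```

With 16 sorted entries and 1 new one, `unsorted` is 1 and `floorLog2(17)` is 4, so the insertion path runs; with 2 sorted and 2 new entries, `floorLog2(4)` is 2 and the full sort is used instead.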


https://github.com/llvm/llvm-project/pull/143107