[llvm] [llvm-nm] Improve performance while faking symbols from function starts (PR #162755)

Daniel Rodríguez Troitiño via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 9 17:36:42 PDT 2025


https://github.com/drodriguez updated https://github.com/llvm/llvm-project/pull/162755

From 87e7ec3365d741111b50c1fb63bbf502332dc9b9 Mon Sep 17 00:00:00 2001
From: Daniel Rodríguez <danielrodriguez at meta.com>
Date: Thu, 9 Oct 2025 17:24:56 -0700
Subject: [PATCH] [llvm-nm] Improve performance while faking symbols from
 function starts

By default, `nm` looks into `LC_FUNCTION_STARTS` for binaries that have
the `MH_NLIST_OUTOFSYNC_WITH_DYLDINFO` flag set, unless the
`--no-dyldinfo` flag is passed.

The implementation that looked up those `LC_FUNCTION_STARTS` entries in
the symbol list was a nested loop that rescanned the whole symbol list
for each entry. For binaries with a couple of million function starts
and hundreds of thousands of symbols, the nested loop takes hours even
on powerful machines and does not seem to finish.

Instead of the nested loop, trade memory for time: add the addresses of
all the symbols to a set, which can then be checked very quickly for
each of the `LC_FUNCTION_STARTS` entries. What took hours and did not
seem to finish now takes less than 10 seconds.
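
The difference between the two lookups can be sketched outside of
`llvm-nm` as below. This is only an illustration, not the code in the
patch: `Symbol`, `missingQuadratic` and `missingWithSet` are hypothetical
names, and `std::unordered_set` stands in for the LLVM set type so the
sketch builds with the standard library alone.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for llvm-nm's symbol record; only the address
// matters for this comparison.
struct Symbol {
  uint64_t Address;
};

// Old shape: rescan the symbol list for every function start.
// Cost grows as (number of function starts) * (number of symbols).
std::vector<uint64_t>
missingQuadratic(const std::vector<Symbol> &Symbols,
                 const std::vector<uint64_t> &FunctionStarts, uint64_t Base) {
  std::vector<uint64_t> Missing;
  for (uint64_t Start : FunctionStarts) {
    bool Found = false;
    for (std::size_t J = 0; J < Symbols.size() && !Found; ++J)
      Found = Symbols[J].Address == Start + Base;
    if (!Found)
      Missing.push_back(Start + Base);
  }
  return Missing;
}

// New shape: collect the symbol addresses once, then do one set lookup
// per function start. Uses extra memory for the set (assumption:
// std::unordered_set instead of an LLVM container).
std::vector<uint64_t>
missingWithSet(const std::vector<Symbol> &Symbols,
               const std::vector<uint64_t> &FunctionStarts, uint64_t Base) {
  std::unordered_set<uint64_t> Known;
  Known.reserve(Symbols.size());
  for (const Symbol &S : Symbols)
    Known.insert(S.Address);

  std::vector<uint64_t> Missing;
  for (uint64_t Start : FunctionStarts)
    if (Known.count(Start + Base) == 0)
      Missing.push_back(Start + Base);
  return Missing;
}
```

Both functions return the same addresses in the same order; only the
number of comparisons differs.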

Fixes #93944
---
 llvm/tools/llvm-nm/llvm-nm.cpp | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/llvm/tools/llvm-nm/llvm-nm.cpp b/llvm/tools/llvm-nm/llvm-nm.cpp
index ff07fbbaa5351..2cc8faeae6e48 100644
--- a/llvm/tools/llvm-nm/llvm-nm.cpp
+++ b/llvm/tools/llvm-nm/llvm-nm.cpp
@@ -15,6 +15,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/StringSwitch.h"
 #include "llvm/BinaryFormat/COFF.h"
 #include "llvm/BinaryFormat/MachO.h"
@@ -1615,12 +1616,11 @@ static void dumpSymbolsFromDLInfoMachO(MachOObjectFile &MachO,
     }
     // See if these addresses are already in the symbol table.
     unsigned FunctionStartsAdded = 0;
+    SmallSet<uint64_t, 32> SymbolAddresses;
+    for (unsigned J = 0; J < SymbolList.size(); ++J)
+      SymbolAddresses.insert(SymbolList[J].Address);
     for (uint64_t f = 0; f < FoundFns.size(); f++) {
-      bool found = false;
-      for (unsigned J = 0; J < SymbolList.size() && !found; ++J) {
-        if (SymbolList[J].Address == FoundFns[f] + BaseSegmentAddress)
-          found = true;
-      }
+      bool found = SymbolAddresses.contains(FoundFns[f] + BaseSegmentAddress);
       // See this address is not already in the symbol table fake up an
       // nlist for it.
       if (!found) {


