[llvm] 3f46a5c - [llvm-nm] Improve performance while faking symbols from function starts (#162755)
via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 11 20:45:51 PDT 2025
Author: Daniel Rodríguez Troitiño
Date: 2025-10-11T20:45:47-07:00
New Revision: 3f46a5cf438ce6eaad406b8780289ee9b8a6941a
URL: https://github.com/llvm/llvm-project/commit/3f46a5cf438ce6eaad406b8780289ee9b8a6941a
DIFF: https://github.com/llvm/llvm-project/commit/3f46a5cf438ce6eaad406b8780289ee9b8a6941a.diff
LOG: [llvm-nm] Improve performance while faking symbols from function starts (#162755)
By default, `nm` looks into `LC_FUNCTION_STARTS` for binaries that
have the `MH_NLIST_OUTOFSYNC_WITH_DYLDINFO` flag set, unless the
`--no-dyldinfo` flag is passed.
The code that checked whether each `LC_FUNCTION_STARTS` entry was
already in the symbol list used a nested loop that rescanned the whole
symbol list for every entry. For binaries with a couple of million
function starts and hundreds of thousands of symbols, that loop ran
for hours, even on powerful machines, without appearing to finish.
Instead of the nested loop, trade memory for time: add the addresses of
all the symbols to a set, then check each `LC_FUNCTION_STARTS` entry
against that set. What previously ran for hours without finishing now
takes less than 10 seconds.
Fixes #93944
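As a rough illustration of the change described in this log, the sketch
below contrasts the two lookup strategies in standalone C++. It is not
the llvm-nm code: `Symbol`, `missingStartsNested` and
`missingStartsWithSet` are hypothetical names, and `std::unordered_set`
stands in for the `SmallSet` used in the actual patch.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for llvm-nm's symbol record; only the address matters.
struct Symbol {
  uint64_t Address;
};

// Quadratic version: rescans the whole symbol list for every function start.
std::vector<uint64_t> missingStartsNested(const std::vector<Symbol> &Symbols,
                                          const std::vector<uint64_t> &Starts,
                                          uint64_t Base) {
  std::vector<uint64_t> Missing;
  for (uint64_t Offset : Starts) {
    bool Found = false;
    for (const Symbol &S : Symbols) {
      if (S.Address == Offset + Base) {
        Found = true;
        break;
      }
    }
    if (!Found)
      Missing.push_back(Offset + Base);
  }
  return Missing;
}

// Set-based version: one pass to collect the known addresses, then one
// expected constant-time lookup per function start.
std::vector<uint64_t> missingStartsWithSet(const std::vector<Symbol> &Symbols,
                                           const std::vector<uint64_t> &Starts,
                                           uint64_t Base) {
  std::unordered_set<uint64_t> Known;
  Known.reserve(Symbols.size());
  for (const Symbol &S : Symbols)
    Known.insert(S.Address);

  std::vector<uint64_t> Missing;
  for (uint64_t Offset : Starts) {
    if (Known.find(Offset + Base) == Known.end())
      Missing.push_back(Offset + Base);
  }
  return Missing;
}
```

With N symbols and M function starts, the first function does on the
order of N * M comparisons, while the second does roughly N insertions
plus M lookups at the cost of storing N addresses.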
Added:
Modified:
llvm/tools/llvm-nm/llvm-nm.cpp
Removed:
################################################################################
diff --git a/llvm/tools/llvm-nm/llvm-nm.cpp b/llvm/tools/llvm-nm/llvm-nm.cpp
index ff07fbbaa5351..dcfc0f92590d5 100644
--- a/llvm/tools/llvm-nm/llvm-nm.cpp
+++ b/llvm/tools/llvm-nm/llvm-nm.cpp
@@ -15,6 +15,7 @@
//
//===----------------------------------------------------------------------===//
+#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/StringSwitch.h"
#include "llvm/BinaryFormat/COFF.h"
#include "llvm/BinaryFormat/MachO.h"
@@ -1615,15 +1616,18 @@ static void dumpSymbolsFromDLInfoMachO(MachOObjectFile &MachO,
}
// See if these addresses are already in the symbol table.
unsigned FunctionStartsAdded = 0;
+ // The addresses from FoundFns come from LC_FUNCTION_STARTS. Its contents
+ // are delta encoded addresses from the start of __TEXT, ending when zero
+ // is found. Because of this, the addresses should be unique, and even if
+ // we create fake entries on SymbolList in the second loop, SymbolAddresses
+ // should not need to be updated there.
+ SmallSet<uint64_t, 32> SymbolAddresses;
+ for (const auto &S : SymbolList)
+ SymbolAddresses.insert(S.Address);
for (uint64_t f = 0; f < FoundFns.size(); f++) {
- bool found = false;
- for (unsigned J = 0; J < SymbolList.size() && !found; ++J) {
- if (SymbolList[J].Address == FoundFns[f] + BaseSegmentAddress)
- found = true;
- }
- // See this address is not already in the symbol table fake up an
- // nlist for it.
- if (!found) {
+ // See if this address is already in the symbol table, otherwise fake up
+ // an nlist for it.
+ if (!SymbolAddresses.contains(FoundFns[f] + BaseSegmentAddress)) {
NMSymbol F = {};
F.Name = "<redacted function X>";
F.Address = FoundFns[f] + BaseSegmentAddress;
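The new comment in the hunk above relies on the addresses from
`LC_FUNCTION_STARTS` being unique. A minimal sketch of why, assuming
the payload is a sequence of ULEB128-encoded deltas terminated by a
zero delta: every accepted delta is nonzero, so the running offset
strictly increases. `decodeFunctionStarts` is a hypothetical helper
written for illustration, not the decoder llvm-nm uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes ULEB128 deltas into offsets relative to the start of the segment.
// Stops at the first zero delta or at a truncated final value.
std::vector<uint64_t> decodeFunctionStarts(const std::vector<uint8_t> &Data) {
  std::vector<uint64_t> Offsets;
  uint64_t Offset = 0;
  size_t I = 0;
  while (I < Data.size()) {
    uint64_t Delta = 0;
    unsigned Shift = 0;
    bool Complete = false;
    while (I < Data.size()) {
      uint8_t Byte = Data[I++];
      if (Shift < 64) // ignore bits that would not fit in 64 bits
        Delta |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
      Shift += 7;
      if ((Byte & 0x80) == 0) { // high bit clear marks the last byte
        Complete = true;
        break;
      }
    }
    if (!Complete || Delta == 0)
      break;
    Offset += Delta; // nonzero delta, so offsets strictly increase
    Offsets.push_back(Offset);
  }
  return Offsets;
}
```

For example, the bytes `0x10 0x20 0x80 0x01 0x00` encode the deltas 16,
32 and 128, giving the offsets 16, 48 and 176 before the terminating
zero.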