[llvm] [llvm-nm] Improve performance while faking symbols from function starts (PR #162755)
Daniel Rodríguez Troitiño via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 9 17:33:48 PDT 2025
https://github.com/drodriguez created https://github.com/llvm/llvm-project/pull/162755
By default, `nm` looks into `LC_FUNCTION_STARTS` for binaries that have the `MH_NLIST_OUTOFSYNC_WITH_DYLDINFO` flag set, unless the `--no-dyldinfo` flag is passed.
The implementation that looked up those `LC_FUNCTION_STARTS` entries in the symbol list was a nested loop that rescanned the entire symbol list for each `LC_FUNCTION_STARTS` entry. For binaries with a couple of million function starts and hundreds of thousands of symbols, the nested loop takes hours even on powerful machines and does not appear to finish.
Instead of the nested loop, trade memory for time: add the addresses of all symbols to a set, which can then be checked quickly for each `LC_FUNCTION_STARTS` entry. What previously ran for hours without appearing to finish now takes less than 10 seconds.
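For readers who want to see the idea in isolation, here is a minimal standalone sketch of the set-based lookup. It is not the code from the patch: `Symbol` and `missingFunctionStarts` are hypothetical names made up for illustration, and `std::unordered_set` stands in for the `SmallSet` used in the actual change so that the example builds without LLVM headers.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for llvm-nm's symbol record; only the address
// matters for this lookup.
struct Symbol {
  uint64_t Address;
};

// Returns the function-start addresses (rebased by BaseSegmentAddress)
// that have no entry in SymbolList, i.e. the ones that would need a
// faked-up symbol.
std::vector<uint64_t>
missingFunctionStarts(const std::vector<Symbol> &SymbolList,
                      const std::vector<uint64_t> &FoundFns,
                      uint64_t BaseSegmentAddress) {
  // One pass over the symbols to collect their addresses.
  // Assumption: std::unordered_set replaces the patch's SmallSet.
  std::unordered_set<uint64_t> SymbolAddresses;
  SymbolAddresses.reserve(SymbolList.size());
  for (const Symbol &S : SymbolList)
    SymbolAddresses.insert(S.Address);

  // One pass over the function starts, with a set lookup per entry
  // instead of a rescan of the whole symbol list.
  std::vector<uint64_t> Missing;
  for (uint64_t Fn : FoundFns) {
    uint64_t Addr = Fn + BaseSegmentAddress;
    if (SymbolAddresses.count(Addr) == 0)
      Missing.push_back(Addr);
  }
  return Missing;
}
```

With N symbols and M function starts, the nested loop performs up to N × M comparisons, while this version does N insertions followed by M lookups. As a rough illustration with assumed sizes (2,000,000 function starts and 300,000 symbols), that is on the order of 6 × 10^11 comparisons in the worst case versus about 2.3 million set operations.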
Fixes #93944
>From 5aa6b87f17664bae8761d4d395a20d81be81278c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20Rodr=C3=ADguez?= <danielrodriguez at meta.com>
Date: Thu, 9 Oct 2025 17:24:56 -0700
Subject: [PATCH] [llvm-nm] Improve performance while faking symbols from
function starts
By default, `nm` looks into `LC_FUNCTION_STARTS` for binaries that have
the `MH_NLIST_OUTOFSYNC_WITH_DYLDINFO` flag set, unless the
`--no-dyldinfo` flag is passed.
The implementation that looked up those `LC_FUNCTION_STARTS` entries in
the symbol list was a nested loop that rescanned the entire symbol list
for each `LC_FUNCTION_STARTS` entry. For binaries with a couple of
million function starts and hundreds of thousands of symbols, the nested
loop takes hours even on powerful machines and does not appear to
finish.
Instead of the nested loop, trade memory for time: add the addresses of
all symbols to a set, which can then be checked quickly for each
`LC_FUNCTION_STARTS` entry. What previously ran for hours without
appearing to finish now takes less than 10 seconds.
Fixes #93944
---
llvm/tools/llvm-nm/llvm-nm.cpp | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/llvm/tools/llvm-nm/llvm-nm.cpp b/llvm/tools/llvm-nm/llvm-nm.cpp
index ff07fbbaa5351..1a0d045d8daa3 100644
--- a/llvm/tools/llvm-nm/llvm-nm.cpp
+++ b/llvm/tools/llvm-nm/llvm-nm.cpp
@@ -16,6 +16,7 @@
//===----------------------------------------------------------------------===//
#include "llvm/ADT/StringSwitch.h"
+#include "llvm/ADT/SmallSet.h"
#include "llvm/BinaryFormat/COFF.h"
#include "llvm/BinaryFormat/MachO.h"
#include "llvm/BinaryFormat/XCOFF.h"
@@ -1615,12 +1616,11 @@ static void dumpSymbolsFromDLInfoMachO(MachOObjectFile &MachO,
}
// See if these addresses are already in the symbol table.
unsigned FunctionStartsAdded = 0;
+ SmallSet<uint64_t, 32> SymbolAddresses;
+ for (unsigned J = 0; J < SymbolList.size(); ++J)
+ SymbolAddresses.insert(SymbolList[J].Address);
for (uint64_t f = 0; f < FoundFns.size(); f++) {
- bool found = false;
- for (unsigned J = 0; J < SymbolList.size() && !found; ++J) {
- if (SymbolList[J].Address == FoundFns[f] + BaseSegmentAddress)
- found = true;
- }
+ bool found = SymbolAddresses.contains(FoundFns[f] + BaseSegmentAddress);
// See this address is not already in the symbol table fake up an
// nlist for it.
if (!found) {