[PATCH] D112735: export unique symbol list with llvm-nm new option "--export-symbols"
James Henderson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 3 02:19:49 PST 2022
jhenderson added a comment.
As mentioned a bit inline, I think you should do some of the refactoring in separate patches, to simplify the number of changes being added here.
1. Pull out the `isSymbolDefined` code as you've already done.
2. Split out/update the symbol sorting code as you've done/as I've suggested in the inline comments in this patch.
I think you should also do a refactor whereby SymbolList is no longer a global variable - having it as a global variable is confusing, at best, and at worst risks cases where it is still populated when it shouldn't be.
Finally, I think having some printing happen before the symbols have been gathered may be a bad idea. At the moment, things like the input filename are printed quite early on, which isn't what we want in some cases. There's no reason this couldn't be deferred until the printing stage, so I'd look at a refactor that does this.
At the moment, the regular behaviour is essentially:
for each input file (and archive member):
get all symbols
sort them
print them according to output format, filtering according to options
A possible additional refactor to the existing code would be to move filtering to the "getting" point, so that they are never added to the symbol list in the first place.
For this patch, I think you essentially want to reuse the getting, sorting and filtering stages. The difference is that rather than print then throw away the symbol list, you want to keep it and do additional stuff to it. In essence, the structure for export symbol list is:
for each input file (and archive member):
add the symbols to a single global list
sort the full list
uniquify the full list
filter the full list
print the full list
Does this match your understanding? You'll notice that most of these steps, and the general structure are pretty similar. Consequently, I think we can restructure your new code a bit more logically. At the end of main, instead of where we currently have the llvm::for_each to dump symbol names, and then a separate if, you do something like the following:
if (ExportSymbols)
printExportSymbolList();
else
llvm::for_each(InputFilenames, dumpSymbolNamesFromFile);
`dumpSymbolNamesFromFile` (or more precisely `dumpSymbolNamesFromObject`) will then call a function (created by moving code out of `dumpSymbolNamesFromObject`) which gets the symbols and adds them to the symbol list. Let's call this `getSymbolsFromObject` for our purposes here. Beyond that, the process remains essentially the same. `printExportSymbolList` loops over all inputs and calls `getSymbolsFromObject` on each, storing the resultant symbols in a single combined list. It then sorts and merges the list before printing. The merging and printing is unique to this code path. The gathering, filtering and sorting is shared code with the regular code path.
Does that make sense? This isn't too far from the current implementation, but there are some subtle differences that should stop the need for lots of `if(ExportSymbols)` type checks (the only one necessary should be the one in `main`).
================
Comment at: llvm/docs/CommandGuide/llvm-nm.rst:280-281
+
+ Exclude the resource file symbol (__rsrc) from export symbol list for XCOFF
+ object file.
+
----------------
Another suggested rewording - you're now in "XCOFF-specific", so don't need to mention "for XCOFF...". Also, this option will apply to multiple inputs as much as one single one, so use the plural.
That being said, I don't think this option should be restricted to the export symbol list, and instead should always apply. This will make it similar to options like `--no-weak`, which makes implementation more natural.
================
Comment at: llvm/test/tools/llvm-nm/Inputs/bitcode-sym64.ll:13
+}
+
----------------
Nit: delete extra trailing blank line.
================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export-symbols.test:1
+## Test the option "--export-symbols" of llvm-nm.
+## The option merges all the output of input files, sorts and prints out unique symbols from the input files.
----------------
Sorry, another suggested rewording.
================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export-symbols.test:10
+## Do not export hidden and internal symbols.
+## Remove name prefixes of global symbols beginning with "__tf1" and "__tf9".
+# RUN: llvm-nm --export-symbols %t1.o | FileCheck %s --check-prefixes=COMMON,WEAK --implicit-check-not={{.}}
----------------
Is it supposed to explicitly be "\_\_tf1" and "\_\_tf9" only, or should this include e.g. \_\_tf2, \_\_tf3 etc?
If it should include, I'd reword this comment. If it shouldn't, I'd add an additional symbol called __tf2... to show that the prefix isn't removed in that case.
================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export-symbols.test:16
+
+## Show that only unique symbols (with the different name or visibility) are exported.
+## RUN: llvm-nm --export-symbols %t1.o %t2.o | FileCheck --check-prefixes=COMMON,WEAK,OBJ2 %s --implicit-check-not={{.}}
----------------
================
Comment at: llvm/tools/llvm-nm/Opts.td:55-56
+
+def no_rsrc : FF<"no-rsrc", "Exclude the resource file symbol (__rsrc) symbol from the export symbol list "
+ "for xcoff (requires --export-symbols)">, Group<grp_xcoff_o>;
+
----------------
(and reflow as appropriate)
Explanation is the same as the suggested change in the CommandGuide.
================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:235
-static bool compareSymbolAddress(const NMSymbol &A, const NMSymbol &B) {
- bool ADefined;
- // Symbol flags have been checked in the caller.
- if (A.Sym.getRawDataRefImpl().p) {
- uint32_t AFlags = cantFail(A.Sym.getFlags());
- ADefined = !(AFlags & SymbolRef::SF_Undefined);
- } else {
- ADefined = A.TypeChar != 'U';
- }
- bool BDefined;
- // Symbol flags have been checked in the caller.
- if (B.Sym.getRawDataRefImpl().p) {
- uint32_t BFlags = cantFail(B.Sym.getFlags());
- BDefined = !(BFlags & SymbolRef::SF_Undefined);
- } else {
- BDefined = B.TypeChar != 'U';
+ bool isSymbolDefined() const {
+ if (Sym.getRawDataRefImpl().p) {
----------------
The addition of this new function is a useful improvement, but can be its own prerequisite patch. Please split it out and rebase this patch on top of that one.
================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:243
-static bool compareSymbolSize(const NMSymbol &A, const NMSymbol &B) {
- return std::make_tuple(A.Size, A.Name, A.Address) <
- std::make_tuple(B.Size, B.Name, B.Address);
-}
+ bool operator<(const NMSymbol &B) const {
+ if (NumericSort)
----------------
Sorry, didn't think through my comment from yesterday enough. Make this an out-of-class operator, i.e.
```
bool operator<(const NMSymbol &A, const NMSymbol &B) { ... }
```
Then implement the equality operator simply as follows:
```
bool operator==(const NMSymbol &A, const NMSymbol &B) {
return !(A < B) && !(B < A);
}
```
================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:680
+ llvm::stable_sort(SymbolList, [=](const NMSymbol &A, const NMSymbol &B) {
+ return B < A;
+ });
----------------
Consider implementing `operator>` as follows:
```
bool operator!=(const NMSymbol &A, const NMSymbol &B) {
return !(A == B);
}
bool operator>(const NMSymbol &A, const NMSymbol &B) {
return A != B && !(A < B);
}
```
and then using that directly as the predicate without the need for the lamdba:
```
llvm::stable_sort(SymbolList, operator>);
```
================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:686
+
+static bool setSymFlag(NMSymbol &S, const SymbolicFile &Obj) {
+ Expected<uint32_t> SymFlagsOrErr = S.Sym.getFlags();
----------------
Let's make this a member function of `NMSymbol`.
================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:697
+static bool isSymPrintSkip(const uint32_t SymFlags) {
+ bool Undefined = SymFlags & SymbolRef::SF_Undefined;
----------------
We don't usually bother const-ifying local variables that aren't pointers/references.
It's not clear to me what this function name is trying to communicate. I'm guessing it's "should skip printing this symbol" so I'd rename it `shouldSkipPrintingSymbol`. I'd also consider, if it makes sense, making this a member function of NMSymbol, rather than passing in the flags (this only applies if all references use the NMSymbol flags member).
================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:710
+static void printSymbolList() {
+ for (const auto &Sym : SymbolList) {
+ if (isSymPrintSkip(Sym.SymFlags))
----------------
Spell out the type, rather than using `auto` here.
================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:713
+ continue;
+ outs() << Sym.Name;
+ if (!Sym.Visibility.empty())
----------------
Should this be demangled?
================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1733-1737
+ if (SymName.startswith("__tf1")) {
+ SymName = SymName.substr(6);
+ } else if (SymName.startswith("__tf9")) {
+ SymName = SymName.substr(14);
+ }
----------------
As per MaskRay's comment above.
================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1792
if (!(MachO && DyldInfoOnly)) {
+
+ if (ExportSymbols && Obj.isXCOFF()) {
----------------
Please don't start blocks with a blank line.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D112735/new/
https://reviews.llvm.org/D112735
More information about the llvm-commits
mailing list