[PATCH] D112735: export unique symbol list with llvm-nm new option "--export-symbols"

Digger Lin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 3 10:57:04 PST 2022


DiggerLin added a comment.

In D112735#3292985 <https://reviews.llvm.org/D112735#3292985>, @jhenderson wrote:

> As mentioned a bit inline, I think you should do some of the refactoring in separate patches, to simplify the number of changes being added here.
>
> 1. Pull out the `isSymbolDefined` code as you've already done.
> 2. Split out/update the symbol sorting code as you've done/as I've suggested in the inline comments in this patch.



  I think it is too late  create a refactor patch now ,we already have done several review on it.  the code is already in the patch, I do not think it is a good idea  to spend time to create a separate refactor patch and rebase current patch now. 


> I think you should also do a refactor whereby SymbolList is no longer a global variable - having it as a global variable is confusing, at best, and at worst risks cases where it is still populated when it shouldn't be.

  for the export list, I think we need a global variable , otherwise we need a variable in main() , and then pass into each function.  The llvm-nm is c style code, I do not think we can do all the refactor in this patch.   if you have have good idea, please let me know, I will create  a separate refactor patch after current patch land.

> Finally, I think having some printing happen before the symbols have been gathered may be a bad idea. At the moment, things like the input filename are printed quite early on, which isn't what we want in some cases. There's no reason this couldn't be deferred until the printing stage, so I'd look at a refactor that does this.
>
> At the moment, the regular behavior is essentially:
>
>   for each input file (and archive member):
>       get all symbols
>       sort them
>       print them according to output format, filtering according to options
>
> A possible additional refactor to the existing code would be to move filtering to the "getting" point, so that they are never added to the symbol list in the first place.



  Agree with you, we should do filter when get the all the symbols a separate refactor patch after the patch landed. there are several functions need to change include function dumpSymbolsFromDLInfoMachO()  

> For this patch, I think you essentially want to reuse the getting, sorting and filtering stages. The difference is that rather than print then throw away the symbol list, you want to keep it and do additional stuff to it.  In essence, the structure for export symbol list is:
>
>   for each input file (and archive member):
>     add the symbols to a single global list
>   sort the full list
>   uniquify the full list
>   filter the full list
>   print the full list
>
> Does this match your understanding?

  Yes.

> You'll notice that most of these steps, and the general structure are pretty similar. Consequently, I think we can restructure your new code a bit more logically. At the end of main, instead of where we currently have the llvm::for_each to dump symbol names, and then a separate if, you do something like the following:
>
>   if (ExportSymbols)
>     printExportSymbolList();
>   else
>     llvm::for_each(InputFilenames, dumpSymbolNamesFromFile);
>
> `dumpSymbolNamesFromFile` (or more precisely `dumpSymbolNamesFromObject`) will then call a function (created by moving code out of `dumpSymbolNamesFromObject`) which gets the symbols and adds them to the symbol list. Let's call this `getSymbolsFromObject` for our purposes here. Beyond that, the process remains essentially the same. `printExportSymbolList` loops over all inputs and calls `getSymbolsFromObject` on each, storing the resultant symbols in a single combined list. It then sorts and merges the list before printing. The merging and printing is unique to this code path. The gathering, filtering and sorting is shared code with the regular code path.
>
> Does that make sense? This isn't too far from the current implementation, but there are some subtle differences that should stop the need for lots of `if(ExportSymbols)` type checks (the only one necessary should be the one in `main`).



1. Actually , there is only two  additional  if(ExportSymbols) in the dumpSymbolNamesFromObject() , I do not think it is big problem.
2. printExportSymbolList loops over all inputs and calls getSymbolsFromObject , I think it need to share the code the dumpSymbolNamesFromFile. and we need to added a additional  if(ExportSymbols) in the dumpSymbolNamesFromFile to go to different function .

So I do think there is benefit.



================
Comment at: llvm/docs/CommandGuide/llvm-nm.rst:280-281
+
+  Exclude the resource file symbol (__rsrc) from export symbol list for XCOFF 
+  object file.
+
----------------
jhenderson wrote:
> Another suggested rewording - you're now in "XCOFF-specific", so don't need to mention "for XCOFF...". Also, this option will apply to multiple inputs as much as one single one, so use the plural.
> 
> That being said, I don't think this option should be restricted to the export symbol list, and instead should always apply. This will make it similar to options like `--no-weak`, which makes implementation more natural.
thanks, There maybe have multi "__rsrc" symbols but has one export symbol list.

in the aix nm tools, it do not have the similar option "--no-rsrc" .
only in IBM CreateExportList  has the option. 
[[ https://www.ibm.com/docs/en/xl-c-aix/13.1.0?topic=library-exporting-symbols-createexportlist-utility  | name ]]


================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export-symbols.test:10
+## Do not export hidden and internal symbols.
+## Remove name prefixes of global symbols beginning with "__tf1" and "__tf9".
+# RUN: llvm-nm --export-symbols %t1.o | FileCheck %s --check-prefixes=COMMON,WEAK --implicit-check-not={{.}}
----------------
jhenderson wrote:
> Is it supposed to explicitly be "\_\_tf1" and "\_\_tf9" only, or should this include e.g. \_\_tf2, \_\_tf3 etc?
> 
> If it should include, I'd reword this comment. If it shouldn't, I'd add an additional symbol called __tf2... to show that the prefix isn't removed in that case.
1. it is only "__tf1....." and "__tf9...." 
2. for "__tf1...."  remove the first 6 char prefix
 3. for "__tf9..." remove the first 15 char prefix.

the shell script of IBM original CreateExportList , $NF is symbol name.

....

if (substr ($NF, 1, 7) != "__sinit" &&
                 substr ($NF, 1, 7) != "__sterm" &&
                 match($NF,/^__[0-9]+__/)==0 ) {
                if (substr ($NF, 1, 5) == "__tf1")
                                         print (substr ($NF, 7)) VISIBILITY
                else if (substr ($NF, 1, 5) == "__tf9")
                                         print (substr ($NF, 15)) VISIBILITY
                else if (!('$incl_resource' && $NF == "__rsrc"))
                                         print $NF VISIBILITY
             }
....


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:243
 
-static bool compareSymbolSize(const NMSymbol &A, const NMSymbol &B) {
-  return std::make_tuple(A.Size, A.Name, A.Address) <
-         std::make_tuple(B.Size, B.Name, B.Address);
-}
+  bool operator<(const NMSymbol &B) const {
+    if (NumericSort)
----------------
jhenderson wrote:
> Sorry, didn't think through my comment from yesterday enough. Make this an out-of-class operator, i.e.
> 
> ```
> bool operator<(const NMSymbol &A, const NMSymbol &B) { ... }
> ```
> 
> Then implement the equality operator simply as follows:
> ```
> bool operator==(const NMSymbol &A, const NMSymbol &B) {
>   return !(A < B) && !(B < A);
> }
> ```
1. I just wonder what is benefit to change to out-of-class operator ?  the benefit is we can use llvm::stable_sort(SymbolList, operator>); ?

since I do not think it is effecient to implement 

```
bool operator>(const NMSymbol &A, const NMSymbol &B) {
  return A != B && !(A < B);
} 
```

I explain below. 


2 .
```
bool operator==(const NMSymbol &A, const NMSymbol &B) {
  return !(A < B) && !(B < A);
}
```
the code is not efficient,  for example , when we compare to string equal, 

str1==str2 is more efficient than !(str1 < str2) && !(str2 <str1)

using str1 == str2 only go through each byte of the string once, 
but 
!(str1 < str2) && !(str2 <str1) 
need go through each byte of the string twice.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:680
+    llvm::stable_sort(SymbolList, [=](const NMSymbol &A, const NMSymbol &B) {
+      return B < A;
+    });
----------------
jhenderson wrote:
> Consider implementing `operator>` as follows:
> ```
> bool operator!=(const NMSymbol &A, const NMSymbol &B) {
>   return !(A == B);
> }
> 
> bool operator>(const NMSymbol &A, const NMSymbol &B) {
>   return A != B && !(A < B);
> }
> ```
> and then using that directly as the predicate without the need for the lamdba:
> ```
> llvm::stable_sort(SymbolList, operator>);
> ```
same reason 
bool operator>(const NMSymbol &A, const NMSymbol &B) {
  return A != B && !(A < B);
} 

this need to go through each bytes twice , not efficient.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:686
+
+static bool setSymFlag(NMSymbol &S, const SymbolicFile &Obj) {
+  Expected<uint32_t> SymFlagsOrErr = S.Sym.getFlags();
----------------
jhenderson wrote:
> Let's make this a member function of `NMSymbol`.
thanks


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:697
 
+static bool isSymPrintSkip(const uint32_t SymFlags) {
+  bool Undefined = SymFlags & SymbolRef::SF_Undefined;
----------------
jhenderson wrote:
> We don't usually bother const-ifying local variables that aren't pointers/references.
> 
> It's not clear to me what this function name is trying to communicate. I'm guessing it's "should skip printing this symbol" so I'd rename it `shouldSkipPrintingSymbol`. I'd also consider, if it makes sense, making this a member function of NMSymbol, rather than passing in the flags (this only applies if all references use the NMSymbol flags member).
yes, the purpose of the function is "should skip printing this symbol" , thanks I will change the name , and I will it as member function.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:713
+      continue;
+    outs() << Sym.Name;
+    if (!Sym.Visibility.empty())
----------------
jhenderson wrote:
> Should this be demangled?
As I know , it do not need demangle for create export list now, if it need later, we can create a new patch for it later.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112735/new/

https://reviews.llvm.org/D112735



More information about the llvm-commits mailing list