[PATCH] D112735: export unique symbol list for xcoff with llvm-nm new option "--export-symbols"

Thu Jan 13 02:26:33 PST 2022

jhenderson added a comment.

> The functionality of "export symbol list" was originally planned to be completed by the end of last year. but considering the workload, we postponed it to the end of January this year. I think it still a long distance to have the "big archive write" patch landed(we not ye begin to review the patch ). I am sorry that I have to use canned archive in the patch.

This isn't really how the LLVM community works. Whilst I understand you may have company deadlines to meet, we want to ensure the quality of the code/tests etc that land in LLVM is good enough, without incurring unreasonable technical debt. In this case, this would be unreasonable, in my opinion: you can implement the majority of the export symbol list functionality without needing archive support. Archive support can and should be a separate and later patch. Actually, now that I think about it, you don't really need Big Archive support at all for this patch: use llvm-ar to create a regular Unix-style archive containing XCOFF objects at test time, and just use that to test the functionality provided in this patch. A separate patch, as noted inline, could include the idenetify magic code for Big archives, and this can be unit tested using gtest. This should be sufficient test coverage (combined with testing of big archive reading in the patch that is already under review) to cover this, without needing to ever have a canned big archive for this test. At a later point, if you wish, you could update the regular archive to a Big archive, once llvm-ar can create them.

With regards to the big archive writing patch, I've deliberately avoided reviewing that until the reading patch has landed. Please ping me on the writing patch, once it has, so that I don't forget about it. Also, for your information, I am off for two weeks from Monday. Maybe @MaskRay could continue the review in my absence?

================
Comment at: llvm/docs/CommandGuide/llvm-nm.rst:231-232
+
+ Export sorted and unique symbol list for object files or archives. Sort and
+ unique symbol are based on the symbol name and visibility(if there is).
+
----------------
This fixes various grammar issues, but also I think simplifies what you're saying to the bare minimum. Please rewrap to 80 columns too.

Also, please note that options in this document are ordered alphabetically, although actually this option should be in the XCOFF-specific section.

================
Comment at: llvm/lib/BinaryFormat/Magic.cpp:91-94
+  case '<':
+    if (startswith(Magic, "<bigaf>\n"))
+      return file_magic::archive;
+    break;
----------------
DiggerLin wrote:
> jhenderson wrote:
> > This is unrelated, I think?
> yes, it need ,  in the function 
> 
> ```
> static void dumpSymbolNamesFromFile(std::string &Filename) {
>  .....
>    Expected<std::unique_ptr<Binary>> BinaryOrErr =
>        **createBinary(BufferOrErr.get()->getMemBufferRef(), ContextPtr);**
> ...
>  }
> ```
> createBinary will call identify_magic()
> 
> 
> 
> 
> 
> 
> 
It's still not really related to this patch: this should be part of the Big Archive support patch series. Note that for object file support, you don't need this code.

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_ar.test:1
+## Test exporting symbol list from AIX archive file with "llvn-nm --export-symbols"
+
----------------
I strongly recommend you move archive support into a separate patch.

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test:1
+## Test the option "--export-symbols" of llvm-nm.
+## The option merge all the output of input files,sort and print out unique symbols from xcoff object file.
----------------
Use `-` not `_` in test names. I think you should just drop the "obj" part of the name, and merge all the test cases into one file, as it keeps the functionality tightly focused.

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test:7-8
+## Test the following cases:
+## Do not export global symbols begin with "__sinit" , "__sterm" , "." , "(".
+## Do not export global symbols begin with regular expression "^__[0-9]+__".
+## Do not export hidden symbols and internal symbols.
----------------

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test:9
+## Do not export global symbols begin with regular expression "^__[0-9]+__".
+## Do not export hidden symbols and internal symbols.
+## Export substring of the global symbol name beginning with "__tf1" and "__tf9".
----------------

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test:10
+## Do not export hidden symbols and internal symbols.
+## Export substring of the global symbol name beginning with "__tf1" and "__tf9".
+# RUN: llvm-nm --export-symbols %t.o | FileCheck %s --implicit-check-not={{.}}
----------------

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test:13
+
+## Show that only unique symbols(with the same name and visibility) are exported. 
+# RUN: llvm-nm --export-symbols %t.o %t.o | FileCheck %s --implicit-check-not={{.}}
----------------
I've said this multiple times before: add a space before opening parentheses in English prose (i.e. not code).

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test:14
+## Show that only unique symbols(with the same name and visibility) are exported. 
+# RUN: llvm-nm --export-symbols %t.o %t.o | FileCheck %s --implicit-check-not={{.}}
+
----------------
To rephrase my earlier comment, use two independent object files, rather than two identical ones as inputs. This will show the merging functionality better.

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test:24-25
+
+## Show that __rsrc symbols are not exported when using the "--no-rsrc" option.
+# RUN: llvm-nm --export-symbols --no-rsrc %t.o  | FileCheck --check-prefixes=NO_RSRC %s --implicit-check-not={{.}}
+# NO_RSRC:      export_protested_var export
----------------
I still think you'd be better off grouping this with the test case above. If you really don't want to though, I'd use `NO-RSRC`, not `NO_RSRC`.

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test:34
+## Show that weak symbols are not exported when using the "--no-weak" option. 
+# RUN: llvm-nm --export-symbols --no-weak %t.o | FileCheck --check-prefixes=NO-WEAK %s --implicit-check-not={{.}}
+# NO-WEAK:      __rsrc
----------------
Ditto.

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/export_sym_list_obj.test:57
+Symbols:
+  - Name:            export_protested_var
+    Section:         .data
----------------
Also below.

================
Comment at: llvm/test/tools/llvm-nm/XCOFF/not_export_shared_obj.test:1
+## Show that symbols in shared object files are not exported. 
+## Generate XCOFF shared object file.
----------------
I'd probably merge this test into the main test. You can then use the same YAML for all cases.

================
Comment at: llvm/tools/llvm-nm/Opts.td:36
 def undefined_only : FF<"undefined-only", "Show only undefined symbols">;
+def export_symbols : FF<"export-symbols", "Export symbol list for object files or archives.">;
 def version : FF<"version", "Display the version">;
----------------
These options are supposed to be in alphabetical order. Please fix.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:222
+  StringRef Visibility;
   // The Sym field above points to the native symbol in the object file,
   // for Mach-O when we are creating symbols from the dyld info the above
----------------
Restore the blank line before this comment.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:661

-static void sortAndPrintSymbolList(SymbolicFile &Obj, bool printName,
-                                   StringRef ArchiveName,
-                                   StringRef ArchitectureName) {
-  if (!NoSort) {
-    using Comparator = bool (*)(const NMSymbol &, const NMSymbol &);
-    Comparator Cmp;
-    if (NumericSort)
-      Cmp = &compareSymbolAddress;
-    else if (SizeSort)
-      Cmp = &compareSymbolSize;
-    else
-      Cmp = &compareSymbolName;
+using Comparator = bool (*)(const NMSymbol &, const NMSymbol &);

----------------

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:666-671
+  else if (SizeSort)
+    return &compareSymbolSize;
+  else if (ExportSymbols)
+    return &compareSymbolNameVisibility;
+  else
+    return &compareSymbolName;
----------------
No `else` after `return`.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:675
+// The function only work for a sorted SymbolList.
+static void removeDuplicateSortedSymbolList() {
+  if (SymbolList.size() == 0)
----------------
Perhaps `removeAdjacentDuplicates`, then you don't need the comment.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:679
+
+  Comparator lessThan = getSymbolComparator();
+  std::vector<NMSymbol>::const_iterator Iter = SymbolList.begin();
----------------
This is a variable at this point, so should be named as such.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:701
+
+static void printMergedSymbolList() {
+  for (const auto &Sym : SymbolList) {
----------------
Rather than having separate `printMergedSymbolList` and `printSymbolList` which both print the symbols, I think you just need one `printSymbolList` function. The `SymbolList` is cleared in the calling function, where it really belongs, rather than where it is currently cleared at the end of `printSymbolList`. There are then two calling functions, `printAllSymbols`, which builds up a single list of symbols, and is used for this new option you're implementing, and the existing `dumpSymbolNamesFromObject`.

Further, I realise that the format printed for the --export-symbols is actually just another `OutputFormat`, like `posix` or `sysv`. By making the small structural change above, you can then just implement it much like the differences in these formats is implemented, within `printSymbolList`.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:709
   }
+  SymbolList.clear();
+}
----------------
There's no need to clear the `SymbolList` here, as it's only used in one place.

More generally though, I think `SymbolList` should be a local variable passed around rather than a global variable (but that's probably another piece of work).

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1673
+static void exportSymbolsForXCOFF(XCOFFObjectFile *Obj, StringRef ArchiveName) {
+
+  // Skip Shared object file.
----------------
No blank lines at start of function.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1734
+
+    // if the symbol is not in text or data section, not be exported.
+    if (SecIter == Obj->section_end())
----------------

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1805
+      XCOFFObjectFile *XCOFFObj = dyn_cast<XCOFFObjectFile>(&Obj);
+      assert(XCOFFObj && "Not a XCOFF Object file.");
+      exportSymbolsForXCOFF(XCOFFObj, ArchiveName);
----------------

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1884

-  sortAndPrintSymbolList(Obj, printName, ArchiveName, ArchitectureName);
+  if (!NoSort)
+    sortSymbolList();
----------------
I think the if clause should be inside `sortSymbolList`.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:2388

+  if (ExportSymbols) {
+    sortSymbolList();
----------------
This looks like you're going to print both the regular list of symbol names, and the exported list, which isn't what I was expecting. Are you sure it's what you meant to do?

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:109-110
 static bool WithoutAliases;
+static bool ExportSymbols; // The option is xcoff specific option.
+static bool NoRsrc;        // The option is xcoff specific option.

----------------
jhenderson wrote:
> 
Add a blank line before the comment (as per the suggested edit)

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:667

+void printExportSymboList(raw_fd_ostream &OS) {
+  if (SymbolList.size() == 0)
----------------
DiggerLin wrote:
> DiggerLin wrote:
> > jhenderson wrote:
> > > It seems to me like you're doing several separate things in this function which are different to llvm-nm's defaults:
> > > 
> > > 1) Sorting based on the visibility
> > > 2) Filtering out duplicates
> > > 3) Filtering out symbols with certain visibilities
> > > 4) Printing name and visibility columns (and not other columns like type code).
> > > 5) Merging input file symbols into a single list.
> > > 
> > > I feel like you could achieve this in a more natural way, without needing to have a completely independent method for printing the symbols. Specifically, you could implement each of these as individual new functionality in the core logic (possibly with a user-facing option per new functionality), which "--export-symbols" then specifies automatically.
> > > 
> > > For example:
> > > 1) The symbol visibility sorting is just another sorting mode, like --size-sort, so could be added at the point where --size-sort does something.
> > > 2) Duplicate removal would be done based on all to-be-printed pieces of information, and could be done when printing there instead.
> > > 3) Visibility filtering could be done at the same time as e.g. weak filtering.
> > > 4) Column printing would be controlled by a series of flags (some on by default, others off by default), which control whether each column would be printed.
> > > 5) Input file merging could be common behaviour.
> > > 
> > > The advantage of dividing things up like this is that you can implement and test each of them independently of each other, which makes reviewing much easier. Additionally, by adding some or all of these as user-level switches, it allows users to get exactly the symbols they want in the format they want.
> > I think over your suggestion.  My opinion is that we should  adding new functionality or options only when it is required.
> > 
> > > Input file merging could be common behavior.
> > 
> >   I  added new function PrintMergedSymbolList() (which only print symbol name and visibility(if there is)) without adding the new option "--merge-output" .  If anyone or other object file format also need the functionality only, we can add a new patch for new option at that time  to on/off the function PrintMergedSymbolList()  and deciding which others fields need to be printed out at that time.
> > 
> > > Duplicate removal would be done based on all to-be-printed pieces of information, and could be done when printing there instead.
> > 
> >   if we add a new option  "--no-duplicate" to remove the duplication symbol, the option is no sense to a single xcoff object file(I think a single object file never has duplication symbol).  the option "--no-duplicate" only has sense on the merging output together.  I  added a new function removeDuplicateSortedSymbolList() and we can add a new option "--no-duplicate" in later patch to on/off removeDuplicateSortedSymbolList() if it is need.
> > 
> > > Column printing would be controlled by a series of flags 
> > 
> > I think this is a good idea, we can add the functionality in a separate and not urgent patch,  and current patch will not depend on it.
> > 
> > > The symbol visibility sorting is just another sorting mode, like --size-sort, so could be added at the point where --size-sort does something.
> > 
> >   I add a new functionality compareSymbolNameVisibility, it only enable on the --option "--export-symbols" . if elf object file also need the functionality,  we can add a new option to enable compareSymbolNameVisibility() at that time.
> > 
> > the option will  "--export-symbols" will enable the three functions at same times in current patch. 
> > I think other new options only need to be added when not all of the three functions need to be enabled at the same time. 
> > 
> > > Visibility filtering could be done at the same time as e.g. weak filtering. 
> >   Visibility only be printed at "export symbol list" and it is always need to be printed out if there is.  I will add new option "Visibility filtering"  in later patch only there is requirement.  
> > 
> > 
> sorry , it should be
> "My opinion is that we should add new functionality or options only when it is need."
> in above comment.
To be clear, I'm not suggesting you need the actual user flags, but if you wrote the code as if those flags existed, I think the logic would be much cleaner (I see this is more-or-less what you've done, which is great).

> I think this is a good idea, we can add the functionality in a separate and not urgent patch, and current patch will not depend on it.

See my comment below about `OutputFormat` actually (no need necessarily to implement a front-end switch, but I think the functionality of printing symbol name + visibility is just another kind of that.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112735/new/

https://reviews.llvm.org/D112735