[PATCH] D116787: [llvm-readobj][MachO] Add option to sort the symbol table before dumping (MachO only, for now).

Mon Mar 28 00:17:36 PDT 2022

jhenderson added inline comments.

================
Comment at: llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml:23-31
+# TYPE-NAME-NEXT:  Symbol {
+# TYPE-NAME-NEXT:    Name: _a (19)
+# TYPE-NAME-NEXT:    Type: Section (0xE)
+# TYPE-NAME-NEXT:    Section: __data (0x3)
+# TYPE-NAME-NEXT:    RefType: UndefinedNonLazy (0x0)
+# TYPE-NAME-NEXT:    Flags [ (0x0)
+# TYPE-NAME-NEXT:    ]
----------------
oontvoo wrote:
> jhenderson wrote:
> > Related to my above comment re. formatting, we don't need to show that the entire symbol printing is correct - that's the responsibility of a test that is testing the `--symbols` option rather than the `--sort-symbols` option. Instead, I'd just check the name and type fields. Something like this:
> > 
> > ```
> > TYPE-NAME:      Name: _a
> > TYPE-NAME-NEXT: Type: Section
> > TYPE-NAME:      Name: _d
> > TYPE-NAME-NEXT: Type: Section
> > ```
> i can drop this in the other test (the NAME-TYPE one) but this was needed here to ensure the whitespace padding was done correctly.
Whitespace padding isn't relevant to this test: the formatting should be tested in a basic symbol printing test, not in a test that is about the `--sort-symbols` option. Besides: without the `--strict-whitespace` and `--match-full-lines` FileCheck options, this doesn't actually verify the whitespace properly.

================
Comment at: llvm/tools/llvm-readobj/MachODumper.cpp:637
+                                         SymbolRange.end());
+    stable_sort(SortedSymbols, *SymComp);
+    for (SymbolRef Symbol : SortedSymbols)
----------------
We usually explicitly use `llvm::` prefix for `std::` algorithm variations like this, to show it's an LLVM extension we're making use of. It also facilitates find and replace, should a `std::` version ever be added in the future.

================
Comment at: llvm/tools/llvm-readobj/MachODumper.cpp:647

-void MachODumper::printDynamicSymbols() {
+void MachODumper::printDynamicSymbols(const SymbolComparator * /*unused*/) {
   ListScope Group(W, "DynamicSymbols");
----------------
I might be mistaken, but I believe LLVM usually doesn't bother commenting out unused parameter names.

================
Comment at: llvm/tools/llvm-readobj/MachODumper.cpp:623
+    const llvm::SmallVector<opts::SortSymbolKeyTy> *SortKeys) {
+
+  if (SortKeys && !SortKeys->empty()) {
----------------
oontvoo wrote:
> jhenderson wrote:
> > Nit: don't start functions with blank lines.
> > 
> > I don't this is the place to be adding the predicates, because much of this code will end up being duplicated when other format support is added. Instead, I recommend moving it to just after the code that creates the ObjDumper class in llvm-readobj.cpp, and use virtual functions for the predicates, using std::bind tto handle the fact that they're member functions.
> > 
> > If you do that, there's no need for the sort key enum. Instead, you could delay processing the command-line option until you need it to identify which predicates to add.
> > If you do that, there's no need for the sort key enum. Instead, you could delay processing the command-line option until you need it to identify which predicates to add.
> 
> Implemented the virtual func parts as suggested but still keepng the enum (albeit local/static now) because it's weird to move the opts processing part outside of the `parseOptions`  function.
> 
> 
> 
Yeah, I understand the concern. The downside with keeping it separate is that you have to effectively repeat the switch/case involved in option parsing in two places, which leads to ugly warts like the need for the `assert(false)` in the second one. Up to you though.

================
Comment at: llvm/tools/llvm-readobj/ObjDumper.h:99

+  // Symbols comparisons.
+  virtual bool canCompareSymbols() const { return false; }
----------------

================
Comment at: llvm/tools/llvm-readobj/llvm-readobj.cpp:90
+  TYPE = 1,
+  UNSPEC = 100,
+  // TODO: add ADDRESS, SIZE as needed.
----------------
`UNSUPPORTED` or `UNKNOWN` instead of `UNSPEC`.

================
Comment at: llvm/tools/llvm-readobj/llvm-readobj.cpp:359
   ObjDumper *Dumper;
+  std::unique_ptr<SymbolComparator> SymComp;
   Expected<std::unique_ptr<ObjDumper>> DumperOrErr = createDumper(Obj, Writer);
----------------
`llvm::Optional` would be the more expressive form here. You could then pass it directly, rather than via the pointer, and have a `None` check instead of a `nullptr` check.

================
Comment at: llvm/tools/llvm-readobj/llvm-readobj.cpp:371
+        case NAME:
+          SymComp->addPredicate([Dumper](SymbolRef LHS, SymbolRef RHS) -> bool {
+            return Dumper->compareSymbolsByName(LHS, RHS);
----------------
Here and below, I don't think you need the trailing return types.

================
Comment at: llvm/tools/llvm-readobj/llvm-readobj.cpp:380
+          break;
+        default:
+          assert(false && "unsupported sort key");
----------------
Use `case UNSPEC` (renamed according to my earlier comment) here, instead of `default`, to take advantage of compiler warnings about not all cases being filled. See also https://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations.

================
Comment at: llvm/tools/llvm-readobj/llvm-readobj.cpp:381
+        default:
+          assert(false && "unsupported sort key");
+        }
----------------
Use `llvm_unreachable` rather than `assert(false)`.

================
Comment at: llvm/tools/llvm-readobj/llvm-readobj.cpp:386
+    } else {
+      error("--sort-symbols is not supported yet for this format.");
+    }
----------------
1) Error and warning messages shouldn't end with a "."
2) I have concerns here: this will stop dumping as soon as an unsupported file type is encountered (even in an archive), yet that is unlikely what the user wants to happen. They more likely want to continue dumping and just not sort in this case (imagine if they had mutliple different file format objects in their input). This should be at most a warning (including the input file name), although I could even see an argument for not having it at all. It also needs testing.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116787/new/

https://reviews.llvm.org/D116787