[PATCH] D112735: export unique symbol list with llvm-nm new option "--export-symbols"

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 8 01:58:20 PST 2022


jhenderson added a comment.

In D112735#3297519 <https://reviews.llvm.org/D112735#3297519>, @DiggerLin wrote:

>   I created a refactor patch for above and rebase current patch based on it.

Thanks.

>>>> I think you should also do a refactor whereby SymbolList is no longer a global variable - having it as a global variable is confusing, at best, and at worst risks cases where it is still populated when it shouldn't be.
>>>
>>>   for the export list, I think we need a global variable , otherwise we need a variable in main() , and then pass into each function.  The llvm-nm is c style code, I do not think we can do all the refactor in this patch.   if you have have good idea, please let me know, I will create  a separate refactor patch after current patch land.
>>
>> My suggested design I already provided in my previous comment actively needs to avoid the global variable - the proposed `getSymbolsFromObject` function would return this list for use by the respective calling functions. Global variables make code re-use harder, and lead to temporal dependencies, both of which this patch is inflicted with, in part due to the global `SymbolList`. Doing the refactor up-front would significantly improve this patch.
>
>   Since we have not reach an agreement on the getSymbolsFromObject() , I am prefer to  do "removing the global variable SymbolList" in a later separate refactor patch. Our team is been blocking on the "export symbol list", I understand your concern about the quality of the code,  but I do not think "removing a global variable SymbolList" is in the scope of the patch. 



>>> 2. printExportSymbolList loops over all inputs and calls getSymbolsFromObject , I think it need to share the code the dumpSymbolNamesFromFile. and we need to added a additional  if(ExportSymbols) in the dumpSymbolNamesFromFile to go to different function .
>>
>> I'm not sure I follow. In my proposed design, `dumpSymbolNamesFromFile` should never be touched by the export symbols code route, only `getSymbolsFromObject`.
>
>   when printExportSymbolList loops over all inputs ,  we still need to get object files from archive or from  MachOUniversalBinary, TapiUniversal,  we need the functionality  in  "dumpSymbolNamesFromFile" ,  so we still need the dumpSymbolNamesFromFile() for printExportSymbolList().

If we need the object fetching code from `dumpSymbolNamesFromFile` then just pull that code into a separate function too... Here's what I'm thinking:

  // Class for storing the binary and passing around associated properties, in case it's an object.
  struct NMObject {
    std::unique_ptr<Binary> Bin; // Possibly any other members that might be necessary to store archive members.
    StringRef ArchiveName;
    StringRef ArchitectureName;
  };
  
  static std::vector<NMSymbol> getSymbolsFromDLInfoMachO(MachOObjectFile &MachO) {
    std::vector<NMSymbol> Symbols;
    // As original implementation of dumpSymbolsFromDLInfoMachO, but replace all references to SymbolList with Symbols.
    return Symbols;
  }
  
  static std::vector<NMSymbol> getSymbolsFromObject(NMObject &Obj) {
    std::vector<NMSymbol> Symbols;
    // Parts of original dumpSymbolNamesFromObject that get the symbols that are to be printed.
    // Filtering will be done later, rather than here, i.e. don't add the exportSymbolsForXCOFF call here.
    return Symbols;
  }
  
  static void dumpSymbolNamesFromObject(NMObject &Obj) {
    std::vector<NMSymbol> Symbols = getSymbolsFromObject(Obj);
    Symbols = sortSymbolList(std::move(Symbols)); // Or sort in place
    printSymbolList(Symbols, Obj, printName);
  }
  
  static std::vector<NMObject> getObjectsFromFile(StringRef InputFile) {
    std::vector<NMObject> Objects;
    /*
      Code from dumpSymbolNamesFromFile which retrieves the objects (and archive properties, if appropriate) inside a binary (may be a single object, or many).
      Emplace an NMObject in the vector for each such constructed object, instead of calling dumpSymbolNamesFromObject.
    */
    return Objects;
  }
  
  static void dumpSymbolNamesFromFile(StringRef InputFile) {
    std::vector<NMObject> Objects = getObjectsFromFile(InputFile);
    llvm::for_each(Objects, dumpSymbolNamesFromObject);
  }
  
  std::vector<NMSymbol> getExportSymbols(ArrayRef<NMSymbol> Symbols, NMObject &Obj) {
    std::vector<NMSymbol> ExportSymbols;
    if (auto *XCOFF = dyn_cast<XCOFFObjectFile>(Obj.Bin)) {
      // Do what's necessary to find the symbols appropriate for XCOFF export symbols.
    }
    // TODO: Add implementations for other objects here. Possibly warn otherwise.
    return ExportSymbols;
  }
  
  static void dumpExportSymbolList(ArrayRef<StringRef> InputFilenames) {
    std::vector<NMSymbol> Symbols;
    for(StringRef InputFile : InputFilenames) {
      std::vector<NMObject> FileObjects = getObjectsFromFile(InputFile);
      for(NMObject &Object : FileObjects) {
        std::vector<NMSymbol> ObjSymbols = getSymbolsFromObject(InputFile);
        ObjSymbols = getExportSymbols(ObjSymbols, Object);
        // Add ObjSymbols to the end of Symbols.
      }
    }
    Symbols = sortSymbolList(Symbols); // Or sort in place
    SymbolList.erase(std::unique(Symbols.begin(), Symbols.end()), Symbols.end());
    printExportSymbols(Symbols);
  }
  
  int main() {
    /* ... */
    if(ExportSymbols)
      dumpExportSymbolList(InputFilenames);
    else
      llvm::for_each(InputFilenames, dumpSymbolNamesFromFile);
  }

I've not tested this structure out, but I'm pretty confident that it, likely with some small tweaks is both a) not a massive departure from the current structure (meaning we don't need to rewrite large amounts of code), and b) will resolve the structural issues with this patch.



================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:243
 
+  bool setSymFlag(const SymbolicFile &Obj) {
+    Expected<uint32_t> SymFlagsOrErr = Sym.getFlags();
----------------
I'd rename this function, to clarify the intent - `setSymFlag` implies it's changing from one value to another, whereas `initializeFlags` shows that we're putting them into their proper initial state, and that it's bad if things go wrong.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:254
+
+  bool shouldSkipPrintingSymbol() const {
+    bool Undefined = SymFlags & SymbolRef::SF_Undefined;
----------------
It's easier to work with positives, so I'd suggest renaming this `shouldPrint`, and invert the logic.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:285
+bool operator==(const NMSymbol &A, const NMSymbol &B) {
+  return !(A < B) && !(A < B);
+}
----------------



================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:702
 
+static void printSymbolList() {
+  for (const NMSymbol &Sym : SymbolList) {
----------------
Maybe to clarify intent, I'd call this `printExportSymbolList`.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1716-1723
+      bool HasNonNum = false;
+      for (unsigned int i = 2; i < SymName.size() - 2; i++)
+        if (SymName[i] < '0' || SymName[i] > '9') {
+          HasNonNum = true;
+          break;
+        }
+      if (!HasNonNum)
----------------
This can be simplified, I believe as per the inline edit.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1718
+      for (unsigned int i = 2; i < SymName.size() - 2; i++)
+        if (SymName[i] < '0' || SymName[i] > '9') {
+          HasNonNum = true;
----------------
Use `llvm::isDigit`


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1743
+        S.Visibility = "protected";
+      if ((SymType & XCOFF::VISIBILITY_MASK) == XCOFF::SYM_V_EXPORTED)
+        S.Visibility = "export";
----------------
Consider making this `else if`, for clarity.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1784-1788
+    if (ExportSymbols && Obj.isXCOFF()) {
+      XCOFFObjectFile *XCOFFObj = cast<XCOFFObjectFile>(&Obj);
+      exportSymbolsForXCOFF(XCOFFObj, ArchiveName);
+      return;
+    }
----------------
This code doesn't really belong here, as it's just filtering out the export symbols. Later code will already retrieve XCOFf symbols without needing to do anything extra - we should filter that set of symbols after that point.

This block here is also a good example of why the current implementation in this patch is not a good structure. It looks like none of the code before this block is relevant. The `if (DynamicSyms)` block isn't relevant, becuase that's only for ELF. The `if (!SegSect.empty() && MachO)` block is also irrelevant, since an object cannot be both MachO and XCOFF. That also implies that `if (!(MachO && DyldInfoOnly))` is always true, if we are XCOFF. Nothing after this block is relevant either, since it returns unconditionally. This shows that this function is not the place for this piece of code.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1849
       S.Sym = Sym;
-      SymbolList.push_back(S);
+      if (S.setSymFlag(Obj))
+        SymbolList.push_back(S);
----------------
I don't understand the need for this addition.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112735/new/

https://reviews.llvm.org/D112735



More information about the llvm-commits mailing list