[PATCH] D112735: export unique symbol list with llvm-nm new option "--export-symbols"

Digger Lin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 8 12:41:01 PST 2022


DiggerLin added a comment.

In D112735#3303843 <https://reviews.llvm.org/D112735#3303843>, @jhenderson wrote:

> In D112735#3297519 <https://reviews.llvm.org/D112735#3297519>, @DiggerLin wrote:
>
>>   I created a refactor patch for above and rebase current patch based on it.
>
> Thanks.
>
>>>>> I think you should also do a refactor whereby SymbolList is no longer a global variable - having it as a global variable is confusing, at best, and at worst risks cases where it is still populated when it shouldn't be.
>>>>
>>>>   for the export list, I think we need a global variable , otherwise we need a variable in main() , and then pass into each function.  The llvm-nm is c style code, I do not think we can do all the refactor in this patch.   if you have have good idea, please let me know, I will create  a separate refactor patch after current patch land.
>>>
>>> My suggested design I already provided in my previous comment actively needs to avoid the global variable - the proposed `getSymbolsFromObject` function would return this list for use by the respective calling functions. Global variables make code re-use harder, and lead to temporal dependencies, both of which this patch is inflicted with, in part due to the global `SymbolList`. Doing the refactor up-front would significantly improve this patch.
>>
>>   Since we have not reach an agreement on the getSymbolsFromObject() , I am prefer to  do "removing the global variable SymbolList" in a later separate refactor patch. Our team is been blocking on the "export symbol list", I understand your concern about the quality of the code,  but I do not think "removing a global variable SymbolList" is in the scope of the patch. 
>
>
>
>>>> 2. printExportSymbolList loops over all inputs and calls getSymbolsFromObject , I think it need to share the code the dumpSymbolNamesFromFile. and we need to added a additional  if(ExportSymbols) in the dumpSymbolNamesFromFile to go to different function .
>>>
>>> I'm not sure I follow. In my proposed design, `dumpSymbolNamesFromFile` should never be touched by the export symbols code route, only `getSymbolsFromObject`.
>>
>>   when printExportSymbolList loops over all inputs ,  we still need to get object files from archive or from  MachOUniversalBinary, TapiUniversal,  we need the functionality  in  "dumpSymbolNamesFromFile" ,  so we still need the dumpSymbolNamesFromFile() for printExportSymbolList().
>
> If we need the object fetching code from `dumpSymbolNamesFromFile` then just pull that code into a separate function too... Here's what I'm thinking:
>
>   // Class for storing the binary and passing around associated properties, in case it's an object.
>   struct NMObject {
>     std::unique_ptr<Binary> Bin; // Possibly any other members that might be necessary to store archive members.
>     StringRef ArchiveName;
>     StringRef ArchitectureName;
>   };
>   
>   static std::vector<NMSymbol> getSymbolsFromDLInfoMachO(MachOObjectFile &MachO) {
>     std::vector<NMSymbol> Symbols;
>     // As original implementation of dumpSymbolsFromDLInfoMachO, but replace all references to SymbolList with Symbols.
>     return Symbols;
>   }
>   
>   static std::vector<NMSymbol> getSymbolsFromObject(NMObject &Obj) {
>     std::vector<NMSymbol> Symbols;
>     // Parts of original dumpSymbolNamesFromObject that get the symbols that are to be printed.
>     // Filtering will be done later, rather than here, i.e. don't add the exportSymbolsForXCOFF call here.
>     return Symbols;
>   }
>   
>   static void dumpSymbolNamesFromObject(NMObject &Obj) {
>     std::vector<NMSymbol> Symbols = getSymbolsFromObject(Obj);
>     Symbols = sortSymbolList(std::move(Symbols)); // Or sort in place
>     printSymbolList(Symbols, Obj, printName);
>   }
>   
>   static std::vector<NMObject> getObjectsFromFile(StringRef InputFile) {
>     std::vector<NMObject> Objects;
>     /*
>       Code from dumpSymbolNamesFromFile which retrieves the objects (and archive properties, if appropriate) inside a binary (may be a single object, or many).
>       Emplace an NMObject in the vector for each such constructed object, instead of calling dumpSymbolNamesFromObject.
>     */
>     return Objects;
>   }
>   
>   static void dumpSymbolNamesFromFile(StringRef InputFile) {
>     std::vector<NMObject> Objects = getObjectsFromFile(InputFile);
>     llvm::for_each(Objects, dumpSymbolNamesFromObject);
>   }
>   
>   std::vector<NMSymbol> getExportSymbols(ArrayRef<NMSymbol> Symbols, NMObject &Obj) {
>     std::vector<NMSymbol> ExportSymbols;
>     if (auto *XCOFF = dyn_cast<XCOFFObjectFile>(Obj.Bin)) {
>       // Do what's necessary to find the symbols appropriate for XCOFF export symbols.
>     }
>     // TODO: Add implementations for other objects here. Possibly warn otherwise.
>     return ExportSymbols;
>   }
>   
>   static void dumpExportSymbolList(ArrayRef<StringRef> InputFilenames) {
>     std::vector<NMSymbol> Symbols;
>     for(StringRef InputFile : InputFilenames) {
>       std::vector<NMObject> FileObjects = getObjectsFromFile(InputFile);
>       for(NMObject &Object : FileObjects) {
>         std::vector<NMSymbol> ObjSymbols = getSymbolsFromObject(InputFile);
>         ObjSymbols = getExportSymbols(ObjSymbols, Object);
>         // Add ObjSymbols to the end of Symbols.
>       }
>     }
>     Symbols = sortSymbolList(Symbols); // Or sort in place
>     SymbolList.erase(std::unique(Symbols.begin(), Symbols.end()), Symbols.end());
>     printExportSymbols(Symbols);
>   }
>   
>   int main() {
>     /* ... */
>     if(ExportSymbols)
>       dumpExportSymbolList(InputFilenames);
>     else
>       llvm::for_each(InputFilenames, dumpSymbolNamesFromFile);
>   }
>
> I've not tested this structure out, but I'm pretty confident that it, likely with some small tweaks is both a) not a massive departure from the current structure (meaning we don't need to rewrite large amounts of code), and b) will resolve the structural issues with this patch.

thanks for your sample code, I agree with you, the patch is quite large now. if I implement as your suggestion in the patch , I believe that we will need another refactor patch which split the dumpSymbolNamesFromFile function into several small function before the patch. Our patch https://reviews.llvm.org/D119147 is depend on the patch. My suggestion is that I will do refactor as your suggestion after the current patch land and at least let the change code in XCOFFObjectFile.cpp , test case, llvm-nm.rst etc landed, and we can focus on refactoring the llvm-nm.cpp.

if you strong insisted that we should do as your suggestion in the patch first, I can do it and postpones our schedule.   @hubert.reinterpretcast .



================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:243
 
+  bool setSymFlag(const SymbolicFile &Obj) {
+    Expected<uint32_t> SymFlagsOrErr = Sym.getFlags();
----------------
jhenderson wrote:
> I'd rename this function, to clarify the intent - `setSymFlag` implies it's changing from one value to another, whereas `initializeFlags` shows that we're putting them into their proper initial state, and that it's bad if things go wrong.
thanks


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:254
+
+  bool shouldSkipPrintingSymbol() const {
+    bool Undefined = SymFlags & SymbolRef::SF_Undefined;
----------------
jhenderson wrote:
> It's easier to work with positives, so I'd suggest renaming this `shouldPrint`, and invert the logic.
thanks


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:285
+bool operator==(const NMSymbol &A, const NMSymbol &B) {
+  return !(A < B) && !(A < B);
+}
----------------
jhenderson wrote:
> 
thanks


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1716-1723
+      bool HasNonNum = false;
+      for (unsigned int i = 2; i < SymName.size() - 2; i++)
+        if (SymName[i] < '0' || SymName[i] > '9') {
+          HasNonNum = true;
+          break;
+        }
+      if (!HasNonNum)
----------------
jhenderson wrote:
> This can be simplified, I believe as per the inline edit.
> This can be simplified, I believe as per the inline edit.




================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1716-1723
+      bool HasNonNum = false;
+      for (unsigned int i = 2; i < SymName.size() - 2; i++)
+        if (SymName[i] < '0' || SymName[i] > '9') {
+          HasNonNum = true;
+          break;
+        }
+      if (!HasNonNum)
----------------
DiggerLin wrote:
> jhenderson wrote:
> > This can be simplified, I believe as per the inline edit.
> > This can be simplified, I believe as per the inline edit.
> 
> 
thanks for simplify


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1784-1788
+    if (ExportSymbols && Obj.isXCOFF()) {
+      XCOFFObjectFile *XCOFFObj = cast<XCOFFObjectFile>(&Obj);
+      exportSymbolsForXCOFF(XCOFFObj, ArchiveName);
+      return;
+    }
----------------
jhenderson wrote:
> This code doesn't really belong here, as it's just filtering out the export symbols. Later code will already retrieve XCOFf symbols without needing to do anything extra - we should filter that set of symbols after that point.
> 
> This block here is also a good example of why the current implementation in this patch is not a good structure. It looks like none of the code before this block is relevant. The `if (DynamicSyms)` block isn't relevant, becuase that's only for ELF. The `if (!SegSect.empty() && MachO)` block is also irrelevant, since an object cannot be both MachO and XCOFF. That also implies that `if (!(MachO && DyldInfoOnly))` is always true, if we are XCOFF. Nothing after this block is relevant either, since it returns unconditionally. This shows that this function is not the place for this piece of code.
agree with you , but I do not think there is better choice in current structure.


================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1849
       S.Sym = Sym;
-      SymbolList.push_back(S);
+      if (S.setSymFlag(Obj))
+        SymbolList.push_back(S);
----------------
jhenderson wrote:
> I don't understand the need for this addition.
without setSymFlag(Obj).  the S do not have a value SymFlags, it will crash at function
 bool shouldSkipPrinting()


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112735/new/

https://reviews.llvm.org/D112735



More information about the llvm-commits mailing list