[PATCH] D120357: [llvm-nm]add helper function to print out the object file name, archive name, architecture name

Wed Mar 2 22:30:52 PST 2022

jhenderson added inline comments.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1745

-static void dumpSymbolNamesFromObject(SymbolicFile &Obj, bool printName,
-                                      StringRef ArchiveName = {},
-                                      StringRef ArchitectureName = {}) {
-  if (!shouldDump(Obj))
-    return;
-
-  if (ExportSymbols && Obj.isXCOFF()) {
-    XCOFFObjectFile *XCOFFObj = cast<XCOFFObjectFile>(&Obj);
-    getXCOFFExports(XCOFFObj, ArchiveName);
-    return;
-  }
-
+static bool getSymbolNamesFromObject(SymbolicFile &Obj, bool &IsSymbolsEmpty) {
   auto Symbols = Obj.symbols();
----------------
Add a comment to the function explaining the meaning of the return value.

================
Comment at: llvm/tools/llvm-nm/llvm-nm.cpp:1889
+  CurrentFilename = Obj.getFileName();
+  if (IsSymbolsEmpty && SymbolList.empty() && !Quiet) {
     writeFileName(errs(), ArchiveName, ArchitectureName);
----------------
DiggerLin wrote:
> jhenderson wrote:
> > DiggerLin wrote:
> > > jhenderson wrote:
> > > > Why have you just introduced this change? How is it better than the previous code?
> > > yes, in the original code of function getSymbolNamesFromObject( )
> > > 
> > > ```
> > >   auto Symbols = Obj.symbols();
> > >   .....
> > >  
> > >   if (DynamicSyms) {
> > >     const auto *E = dyn_cast<ELFObjectFileBase>(&Obj);
> > >     if (!E) {
> > >       error("File format has no dynamic symbol table", Obj.getFileName());
> > >       return;
> > >     }
> > >   Symbols = E->getDynamicSymbolIterators();
> > > ```
> > >  The value of Symbols be changed by the line 
> > > Symbols = E->getDynamicSymbolIterators();
> > >  Symbols  maybe not the equal to  Obj.symbols()
> > > 
> > > so we can not use the 
> > > if ( Obj.symbols().empty() && SymbolList.empty() && !Quiet) 
> > Thanks, I see. I think I'd prefer it if you pulled the logic on whether or not an object has symbols into a separate function (maybe `hasSymbols`) which takes `Obj` and returns the value based on the appropriate set of symbols for whether `DynamicSyms` is specified or not. You probably would then benefit from a separate helper function used by this new function and `getSymbolNamesFromObject` that returns the dynamic symbols list (or an empty list if there was an error). Something like the following:
> > 
> > ```
> > static bool hasSymbols(SymbolicFile &Obj) {
> >   if (DynamicSyms)
> >     return !getDynamicSyms(Obj).empty();
> >   return !Obj.symbols().empty();
> > }
> > 
> > static <insert return type here> getDynamicSyms(SymbolFile &Obj) {
> >   const auto *E = dyn_cast<ELFObjectFileBase>(&Obj);
> >   if (!E) {
> >     error("File format has no dynamic symbol table", Obj.getFileName());
> >     return {};
> >   }
> >   return E->getDynamicSymbolIterators();
> > }
> > ```
> as your suggestion,  the function getDynamicSyms() will be call twice .
> 1. One is in the function
> getSymbolNamesFromObject
> 
> 2 . the other is in hasSymbols() which will be called in the function
> dumpSymbolNamesFromObject() 
> 
> if there is error.
> 
> > if (!E) {
> >     error("File format has no dynamic symbol table", Obj.getFileName());
> >     return {};
> >   }
> 
> the error info will be print out twice.
> 
> if I keep the code in the 
> as 
> 
> ```
> static bool getSymbolNamesFromObject(SymbolicFile &Obj, bool &IsSymbolsEmpty) {
>   auto Symbols = Obj.symbols();
>   std::vector<VersionEntry> SymbolVersions;
>   if (DynamicSyms) {
>     const auto *E = dyn_cast<ELFObjectFileBase>(&Obj);
>     if (!E) {
>       error("File format has no dynamic symbol table", Obj.getFileName());
>       return false;
>     }
>     Symbols = E->getDynamicSymbolIterators();
> ```
>   
>  there is no advantage of adding a two helper function.
>     
> I do not think adding a new parameter "bool &IsSymbolsEmpty" is a big problem.
> 
> 
> the error info will be print out twice.

Fair point. My gut says the problem is that this error is reported at too low a level in the code, and that the right way to fix this would be to send errors back up the stack via `Expected` and `Error`, with it being reported near the top-level of the code, such that an error effectively aborts dumping of one object, but not all objects. I acknowledge that this is yet another change, although I think the cost of it wouuld be fairly small. If you don't want to go that far, I understand (I'd offer to do it myself, but don't really have the time). However, in that case, I think we still need a different solution, because the new input argument is yuck. A partial implementation of the above (using Expected/Error) might be to have `getDynamicSyms` return an `Expected`, and then in `getSymbolNamesFromObject`, if the `Expected` is in a failed state, call `error` there, while in `hasSymbols` simply do `consumeError`. If you do this, I'd add a TODO: comment in the latter case saying the error is already reported elsewhere, but really it should be propagated up the stack, to avoid duplicate errors.

> I do not think adding a new parameter "bool &IsSymbolsEmpty" is a big problem.

Sorry, but I disagree. (Related aside: I'm not really all that happy about the return type change in this patch either, but I understand it's a step towards a later goal). With your change, you're making this function do at least four different things, 1) get all the object's symbols 2) filter out the ones you care about, 3) tell the caller whether the symbols could be loaded from the object, and 4) tell the caller whether the object has any symbols. Yes, some of these were there before, but adding more to that set is not ideal, when there are alternative approaches. In clean code, it is generally preferable for each function to try to do only one thing: this makes it easier to understand what any given function is doing, making it more maintanable and easier to use by other developers in the future.

Additionally, the use of a reference argument purely for setting a secondary return type is not great for various reasons: 1) it forces calling code to provide an input boolean that is set, even if the caller doesn't care about the parameter; 2) related to 1), the caller has to define a boolean variable before the function is called, preventing the function from being used cleanly within e.g. if clauses (the boolean input argument has to be defined before the if check); 3) it raises a question of "why is one returned and the other an argument?"; 4) it raises another question: "does my boolean need to be initialised or not?"

Another side-advantage with the extra helper function is that this reduces the size of `getSymbolNamesFromObject`. Again, the simpler a function is, the easier it is to understand.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120357/new/

https://reviews.llvm.org/D120357