[PATCH] D131589: [llvm-objdump] Handle multiple syms at same addr in disassembly.

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 12 00:40:07 PDT 2022


jhenderson added inline comments.


================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:1505-1508
+        for (const SymbolInfoTy &Symbol : SymbolsHere)
+          DemangledSymNamesHere.push_back(demangle(Symbol.Name.str()));
+        for (const std::string &DemangledName : DemangledSymNamesHere)
+          SymNamesHere.push_back(DemangledName);
----------------
You can do this, right?


================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:1501
+      // Get the demangled names of all those symbols.
+      std::vector<std::string> SymNamesHere;
+      for (const SymbolInfoTy &Symbol : SymbolsHere) {
----------------
simon_tatham wrote:
> jhenderson wrote:
> > Can this be a vector of `StringRef` to save the copying?
> Not trivially, because in the case where we have to demangle names, `demangle()` returns a newly made string and we have to have somewhere to store it.
> 
> I've changed it so that we have the vector of `StringRef` you wanted, and //also// a vector of `std::string` which is left empty when not in demangling mode, and the StringRefs point at the local vector or the original name elsewhere, as appropriate.
I'd forgotten about the `demangle` aspect of this. As such, I'm not fussed whichever way you prefer to do it.


================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:1655
       if (Obj.isELF() && !DisassembleAll && Section.isText()) {
-        uint8_t SymTy = Symbols[SI].Type;
-        if (SymTy == ELF::STT_OBJECT || SymTy == ELF::STT_COMMON) {
-          dumpELFData(SectionAddr, Index, End, Bytes);
-          Index = End;
+        for (const SymbolInfoTy &Symbol : SymbolsHere) {
+          uint8_t SymTy = Symbol.Type;
----------------
simon_tatham wrote:
> jhenderson wrote:
> > Is there a subtle behaviour change here if you have multiple symbols at the same address but different types (i.e. one is STT_OBJECT and one isn't, e.g. STT_FUNC)?
> Potentially, yes. Previously, `llvm-objdump` would pick just one of the symbols defined at the address, and base its decision on that symbol alone. With this patch, it will go through all of them, and spots any STT_OBJECT even if it's not the symbol last in the sorted list.
> 
> This is just the sort of thing I hoped to have a useful discussion about in order to decide what the behaviour //should// be, to avoid the risk of writing oodles of code to implement a complicated policy that we had no consensus on :-) so thanks for flagging it up.
> 
> What do you think we //should// do if an STT_FUNC and an STT_OBJECT occur at the same address? llvm-objdump's existing policy  doesn't look particularly deliberate to me – it's an artefact of the code's previous lack of attention to collocated symbols. Perhaps it's nonetheless best to stick with the existing policy just for stability's sake, but if so, I'd prefer that we'd discussed other options before deciding that.
> 
> Other possibilities that spring to mind are to deliberately make STT_OBJECT highest priority (which is what's happening in this version of the code), or to make it lowest priority, or to choose based on some criterion like symbol index in the ELF file (go with whichever symbol was first/last in the actual object file's symtab). And maybe, whichever of those we do, emit a warning that flags up that we had to make an arbitrary decision that could have gone the other way.
> 
> What do you think?
> 
> (PS I hope you're not going to like the symtab index idea, because that information isn't preserved at all in `SymbolInfoTy` so it would take a load more plumbing :-)
Looking at the comment block, my instinct says we should treat STT_FUNC as higher priority (possibly assuming it hasn't got size 0) and do regular disassembly. Having an STT_OBJECT/STT_COMMON symbol at the same address as an STT_FUNC symbol sounds like it's unlikely to ever occur in practice ("this code represents both a function and some data??"). If it does, I think it's reasonable to pick one style somewhat arbitrarily. The user can use `--disassemble-symbol` of the STT_OBJECT symbol if they want to disassemble it as data in this case, I think.

I'm less clear on the second block below about ARM mapping symbols, because I'm not familiar with how ARM mapping symbols are used, and therefore don't think I can make an informed decision on the right approach there.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131589/new/

https://reviews.llvm.org/D131589



More information about the llvm-commits mailing list