[PATCH] D131589: [llvm-objdump] Handle multiple syms at same addr in disassembly.

Thu Aug 18 00:27:07 PDT 2022

jhenderson added inline comments.

================
Comment at: llvm/test/tools/llvm-objdump/ELF/data-vs-code-priority.test:17-20
+# CHECK: movw r0, #1
+# CHECK: movw r0, #2
+# CHECK: movw r0, #3
+# CHECK: movw r0, #4
----------------
simon_tatham wrote:
> jhenderson wrote:
> > Do you think it would be worth also showing which symbols are printed?
> Hmmm. Now I look more closely, there's still something a bit odd here.
> 
> The symbols that //are// printed are the ones beginning with `B`, in every case. (Exactly as the unmodified llvm-objdump would have done.) The presence of a non-data symbol at each location has caused the section contents to be disassembled as code, but the non-data symbol isn't winning the contest in every case to be the one printed. I wonder if that inconsistency might be confusing? If we think the code symbol is more important from a disassembly perspective, perhaps we should make it the one displayed, as well, for consistency? 
> 
> Otherwise you end up printing a data symbol name followed by code, which looks confusing. I feel as if we ought to print a data symbol with data, or a code symbol with code, but not a confusing mixture. I'll have a rethink.
I agree - if we are disassembling as code, we should be printing code symbols. If there are multiple symbols to print (due to --disassemble-symbol etc), then if any of them are code, I think we should still print as code. However, if none of the code symbols are "selected" for printing, we should print as data, in my opinion.

In case there's any ambiguity, I do think we should pick the code symbols above the data symbols, both in choosing which symbol to pick and therefore how to disassemble a block of bytes.

It is probably worth taking a look at what GNU objdump does, and see if you can identify any behaviour that makes sense and we can conform to. Disassembly is one area where we diverge somewhat, but I think it might still be a useful reference point.

================
Comment at: llvm/test/tools/llvm-objdump/multiple-symbols.test:21
+
+## The test input file contains an Arm and a Thumb function, each with two
+## function-type symbols defined at its entry point. Also, because it's Arm,
----------------
simon_tatham wrote:
> jhenderson wrote:
> > Nit: here and elsewhere, I think the canonical spelling is ARM.
> I do have to keep remembering that LLVM's canonical spelling isn't the same as Arm's canonical spelling. My fingers have a very strong habit of typing it the way //we// spell it, for obvious reasons!
I mean, I'm going off what wikipedia (and several other websites) tells me is the spelling :-) Strange that Arm's official spelling according to the company is different! I'd be happy to go back to what you had before then!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131589/new/

https://reviews.llvm.org/D131589