[PATCH] D84191: [llvm-objdump] Symbolize binary addresses for low-noisy asm diff.

Wed Jul 22 13:34:24 PDT 2020

hoyFB added inline comments.

================
Comment at: llvm/docs/CommandGuide/llvm-objdump.rst:206
+
+  Only works with --x86-asm-syntax=Intel and an executable binary.
+
----------------
jhenderson wrote:
> MaskRay wrote:
> > You probably want to mean a linked image (not a relocatable object file).
> > 
> > Capitalized `Intel` isn't supported. att is much more common and --x86-asm-syntax=intel is rare. As a related fact, LLVM's Intel syntax assembler gets less scrutiny and tends to have some problems.
> Is there anything fundamentally preventing this from working with the default assembly syntax or relocatable objects? Or is it just that you haven't implemented it yet?
Yes, a linked image is more accurate.

Changed Intel to lower case. 

The att syntax is now supported too. 

================
Comment at: llvm/docs/CommandGuide/llvm-objdump.rst:206
+
+  Only works with --x86-asm-syntax=Intel and an executable binary.
+
----------------
hoyFB wrote:
> jhenderson wrote:
> > MaskRay wrote:
> > > You probably want to mean a linked image (not a relocatable object file).
> > > 
> > > Capitalized `Intel` isn't supported. att is much more common and --x86-asm-syntax=intel is rare. As a related fact, LLVM's Intel syntax assembler gets less scrutiny and tends to have some problems.
> > Is there anything fundamentally preventing this from working with the default assembly syntax or relocatable objects? Or is it just that you haven't implemented it yet?
> Yes, a linked image is more accurate.
> 
> Changed Intel to lower case. 
> 
> The att syntax is now supported too. 
We need a tool to compare executable image since very often the immediate object files are not available. 

Regarding the assembly syntax, we are mostly comfortable reading the intel syntax. But since the att syntax is more common, I'll add support for that.

================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:1576

+static void collectLocalBranchTargets(
+    ArrayRef<uint8_t> Bytes, const MCInstrAnalysis *MIA, MCDisassembler *DisAsm,
----------------
MaskRay wrote:
> I wonder why you need this. For BOLT or similar post-link rewriting stuff?
Good question. This is needed when doing optimization work where we compare code quality by checking binary disassembly.  

================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:1580-1583
+  // So far only supports X86. Other target like Arm64 may have data embedded in
+  // the text section.
+  if (!STI->getTargetTriple().isX86())
+    return;
----------------
jhenderson wrote:
> I don't think X86 is exempt from potentially having data embedded in the text section. I think I've seen jump tables that are, for example.
> 
> It's not clear to me, why a target having potential data embedding is an issue?
Yeah, on X86 a jump table could be placed inside a function as a `.data` instruction. So literally it is still an instruction. I noticed that on Arm there might be non-instruction data encoded in the .text section and special handling is needed when disassembling instructions. I removed the comment to reduce the confusion. 

================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:1594
+  uint64_t Index = Start;
+  while (Index < End) {
+    // Disassemble a real instruction and record function-local branch labels.
----------------
jhenderson wrote:
> I'd be slightly concerned by this additional pass over the disassembly, due to performance reasons, but I guess it's behind a switch, so it's not that big a deal. Plus, I can't think how you'd identify where the labels should go otherwise, since they might appear before the reference.
Exactly, two-pass disassembling is needed to support backward jumps.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84191/new/

https://reviews.llvm.org/D84191