[PATCH] D84191: [llvm-objdump] Symbolize binary addresses for low-noisy asm diff.

Fri Jul 24 09:19:48 PDT 2020

hoyFB marked 7 inline comments as done.
hoyFB added a comment.

In D84191#2168738 <https://reviews.llvm.org/D84191#2168738>, @jhenderson wrote:

> It looks like you might still have a binary file in the patch, that is no longer needed?

The binary file should have been removed.

================
Comment at: llvm/docs/CommandGuide/llvm-objdump.rst:206
+
+  Only works with --x86-asm-syntax=Intel and an executable binary.
+
----------------
jhenderson wrote:
> hoyFB wrote:
> > hoyFB wrote:
> > > jhenderson wrote:
> > > > MaskRay wrote:
> > > > > You probably want to mean a linked image (not a relocatable object file).
> > > > > 
> > > > > Capitalized `Intel` isn't supported. att is much more common and --x86-asm-syntax=intel is rare. As a related fact, LLVM's Intel syntax assembler gets less scrutiny and tends to have some problems.
> > > > Is there anything fundamentally preventing this from working with the default assembly syntax or relocatable objects? Or is it just that you haven't implemented it yet?
> > > Yes, a linked image is more accurate.
> > > 
> > > Changed Intel to lower case. 
> > > 
> > > The att syntax is now supported too. 
> > We need a tool to compare executable image since very often the immediate object files are not available. 
> > 
> > Regarding the assembly syntax, we are mostly comfortable reading the intel syntax. But since the att syntax is more common, I'll add support for that.
> I'm not sure you didn't misunderstand my comment earlier. Let's say a user does have a single intermediate object file they want to disassemble, and it has plenty of references within itself. These addresses will be present, thus comparing it against a different version of the same object might produce a noisy diff. It sounds like this option would be helpful there, so why doesn't it work?
Thanks for the clarification. Yes, we have a fundamental issue with disassembling object files because the symbol resolution unit (in llvm-objdump.cpp) doesn't handle relocations. For local jump targets this is probably fine since the jump distance is computed from the start of the current function. For global jump targets like a function call target or a global variable access, the symbolization will not work without considering the relocations. This is true even without symbolization where a call target will be disassembled as an offset from the current function.

================
Comment at: llvm/test/tools/llvm-objdump/X86/elf-disassemble-symbololize-operands.yaml:9
+# ATT-NEXT: <L1>:
+# ATT-NEXT: cmpl	, %eax <g>
+# ATT-NEXT: jge <L0>
----------------
jhenderson wrote:
> What's with the spaces before the ',' here.
It is for the alignment of the first operand which isn't printed with the symbolization.

================
Comment at: llvm/test/tools/llvm-objdump/X86/elf-disassemble-symbololize-operands.yaml:9
+# ATT-NEXT: <L1>:
+# ATT-NEXT: cmpl	, %eax <g>
+# ATT-NEXT: jge <L0>
----------------
jhenderson wrote:
> hoyFB wrote:
> > This is less than ideal because of the current less intrusive implementation. It's probably fine from asm diffing standpoint. To fix this we may want to pass the symbol/label resolution result from the common code into the per-target inst printer and print the symbols right in place for a memory operand. But that may cause changes to the core `MCInstPrinter` interface and the callbacks to initialize target inst printers.
> It's not clear to me what is less than ideal here? What would you prefer in an ideal situation?
Sorry, I should have commented on this line : `# ATT-NEXT: cmpl , %eax <g> `. Ideally I'd like something like `cmpl <g>, %eax`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84191/new/

https://reviews.llvm.org/D84191