[PATCH] D78702: [RFC][RISCV][MC/Objdump] Extend llvm-objdump output to support more instruction patterns

Simon Cook via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 23 04:49:44 PDT 2020


simoncook created this revision.
simoncook added reviewers: asb, lenary, edward-jones.
Herald added subscribers: llvm-commits, evandro, luismarques, apazos, sameer.abuasal, pzheng, s.egerton, Jim, benna, psnobl, jocewei, rupprecht, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, zzheng, MaskRay, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, johnrusso, rbar, hiraditya.
Herald added a reviewer: jhenderson.
Herald added a project: LLVM.

This extends llvm-objdump and MCInstrAnalysis to be able to do more in
depth analysis of instructions which cannot be done by looking at
just a single instruction. I would like some feedback on the approach
taken so far, whether this is suitable or if some parts should be
moved or changed into more generic interfaces.

Currently MCInstrAnalysis has the evaluateBranch method which is given
a single MCInst that is known to be a branch will return if possible
the target address of that instruction, with no context other than the
current PC.

In GNU objdump, the RISC-V disassembler can determine the address of
more instructions, for example calculating branch/call targets built
over a number of instructions, and the addresses of targets of load
and store instruction, both immediates and those which are GP-relative.

To add such functionality, I have extended MCInstrAnalysis with a new
evaluateInst function, which provides the same functionality as
evaluateBranch, but has the ability to store information across multiple
instructions allowing these patterns to be picked up. The default
implementation does the same check llvm-objdump does before calling
evaluateInst, so is a NFC change for other targets, but for RISC-V
does a more detailed analysis. This allows evaluateInst to be safe to
call on all instructions and thus is called for each instruction.

In the RISC-V implementation there is a cache of the current state of
all known registers, and upon evaluating an instruction, updates the
known GPR state in case a future instruction needs that value. To avoid
mis-identifying addresses, upon a change of control flow the cache is
invalidated, except for if GP is defined (by the __global_pointer$
symbol, in which case it is reset to the known value).

For this to work there have had to be a couple of other changes to
llvm-objdump that are target specific, but would be good to make
generic. These are:

- The MCInstrAnalysis object is no longer const, should this analysis

belong in this class (which given the evaluateBranch call, I think
makes sense)

- There is a resetAnalysis hook which provides an additional signal

that control flow has changed, I think as a "new symbol/file"
indicator this should be fine.

- I've added a address printout after the instruction just for RISC-V;

this matches GNU objdump's behaviour, but does not necessarily have
to land with the rest of the patch.

- I do an explicit check for __global_pointer$ and pass this through

to MCInstrAnalysis since it has no concept of anything more than
MCInsts, there may be scope for making this more generic.

Since I haven't written tests yet, I can give an example of before/
after with this patch:

  00010074 <main>:
     10074: 63 15 05 00                   bne     a0, zero, 10 <main+0xa>
     10078: 03 a5 c1 8f                   lw      a0, -1796(gp)
     1007c: 82 80                         c.jr    ra
     1007e: 23 ae a1 8e                   sw      a0, -1796(gp)
     10082: 7d 15                         c.addi  a0, -1
     10084: c5 bf                         c.j     -16 <main>
  
  00010090 <_start>:
     10090: 97 21 00 00                   auipc   gp, 2
     10094: 93 81 41 ba                   addi    gp, gp, -1116
     10098: 17 15 00 00                   auipc   a0, 1
     1009c: 13 05 85 49                   addi    a0, a0, 1176
     100a0: 17 16 00 00                   auipc   a2, 1
     100a4: 13 06 06 4b                   addi    a2, a2, 1200
     100a8: 09 8e                         c.sub   a2, a0
     100aa: 81 45                         c.li    a1, 0
     100ac: 97 00 00 00                   auipc   ra, 0
     100b0: e7 80 20 1e                   jalr    ra, 482(ra)
     100b4: 17 05 00 00                   auipc   a0, 0
     100b8: 13 05 85 13                   addi    a0, a0, 312
     100bc: 97 00 00 00                   auipc   ra, 0
     100c0: e7 80 40 0f                   jalr    ra, 244(ra)
     100c4: 97 00 00 00                   auipc   ra, 0
     100c8: e7 80 40 16                   jalr    ra, 356(ra)
     100cc: 02 45                         c.lwsp  a0, 0(sp)
     100ce: 4c 00                         c.addi4spn      a1, sp, 4
     100d0: 01 46                         c.li    a2, 0
     100d2: 97 00 00 00                   auipc   ra, 0
     100d6: e7 80 20 fa                   jalr    ra, -94(ra)
     100da: 17 03 00 00                   auipc   t1, 0
     100de: 67 00 63 0e                   jalr    zero, 230(t1)



  00010074 <main>:
     10074: 63 15 05 00                   bne     a0, zero, 10 #1007e <main+0xa>
     10078: 03 a5 c1 8f                   lw      a0, -1796(gp) #11490 <__bss_start>
     1007c: 82 80                         c.jr    ra
     1007e: 23 ae a1 8e                   sw      a0, -1796(gp) #11490 <__bss_start>
     10082: 7d 15                         c.addi  a0, -1
     10084: c5 bf                         c.j     -16 #10074 <main>
  
  00010090 <_start>:
     10090: 97 21 00 00                   auipc   gp, 2
     10094: 93 81 41 ba                   addi    gp, gp, -1116 #11c34 <_end+0x6e4>
     10098: 17 15 00 00                   auipc   a0, 1
     1009c: 13 05 85 49                   addi    a0, a0, 1176 #11530 <__bss_start>
     100a0: 17 16 00 00                   auipc   a2, 1
     100a4: 13 06 06 4b                   addi    a2, a2, 1200 #11550 <_end>
     100a8: 09 8e                         c.sub   a2, a0
     100aa: 81 45                         c.li    a1, 0
     100ac: 97 00 00 00                   auipc   ra, 0
     100b0: e7 80 20 1e                   jalr    ra, 482(ra) #1028e <memset>
     100b4: 17 05 00 00                   auipc   a0, 0
     100b8: 13 05 85 13                   addi    a0, a0, 312 #101ec <__libc_fini_array>
     100bc: 97 00 00 00                   auipc   ra, 0
     100c0: e7 80 40 0f                   jalr    ra, 244(ra) #101b0 <atexit>
     100c4: 97 00 00 00                   auipc   ra, 0
     100c8: e7 80 40 16                   jalr    ra, 356(ra) #10228 <__libc_init_array>
     100cc: 02 45                         c.lwsp  a0, 0(sp)
     100ce: 4c 00                         c.addi4spn      a1, sp, 4
     100d0: 01 46                         c.li    a2, 0
     100d2: 97 00 00 00                   auipc   ra, 0
     100d6: e7 80 20 fa                   jalr    ra, -94(ra) #10074 <main>
     100da: 17 03 00 00                   auipc   t1, 0
     100de: 67 00 63 0e                   jalr    zero, 230(t1) #101c0 <exit>

Any other comments/suggestions on how to implement this are welcome.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D78702

Files:
  llvm/include/llvm/MC/MCInstrAnalysis.h
  llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
  llvm/test/MC/Disassembler/RISCV/branch-targets.txt
  llvm/test/MC/RISCV/rv64-relax-all.s
  llvm/tools/llvm-objdump/llvm-objdump.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D78702.259524.patch
Type: text/x-patch
Size: 12093 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200423/f2647973/attachment.bin>


More information about the llvm-commits mailing list