[llvm] [feature][riscv] handle target address calculation in llvm-objdump disassembly for riscv (PR #109914)

Arjun Patel via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 24 08:25:10 PST 2024


arjunUpatel wrote:

I am consistently failing one of the tests and have racked my brain over it quite a bit. Here is what's going on and where I need guidance:

The test I am failing is the 5th test in llvm-project/llvm/test/tools/llvm-objdump/X86/disassemble-same-section-addr.test. This particular test ensures that the absolute symbol is used for target address resolution if no symbol is found in the candidate section. Here is the structure of the test ELF file:

Sections:
  - Name:    .caller
    Type:    SHT_PROGBITS
    Flags:   [SHF_ALLOC, SHF_EXECINSTR]
    Address: 0x0
    Content: e800000000 ## Call instruction to next address.
  - Name:    .first
    Type:    SHT_PROGBITS
    Flags:   [SHF_ALLOC, SHF_EXECINSTR]
    Address: 0x5
    Size:    [[SIZE1]]
  - Name:    .second
    Type:    SHT_PROGBITS
    Flags:   [SHF_ALLOC, SHF_EXECINSTR]
    Address: 0x5
    Size:    [[SIZE2]]

Symbols:
  - Name:    target
    Section: [[SECTION]]
    Value:   0x5
  - Name:    other
    Index:   [[INDEX]]
    Value:   0x0

Here is the disassembly of the file when the parameters are set in the following way (according to test 5):
SIZE1=1, SIZE2=0, SECTION=.caller, INDEX=SHN_ABS

Disassembly of section .caller:
0000000000000000 <.caller>:
       0: e8 00 00 00 00               	callq	0x5

Disassembly of section .first:
0000000000000005 <.first>:
       5: 00                           	<unknown>

Target address resolution occurs for the callq instruction in the .caller section. Currently, the expected resolution is <other+0x5>. This happens because the current implementation only checks the set of sections that share the start address closest to, and less than or equal to, the target. Let's call this set of sections $A$. In the current implementation, if none of the sections in $A$ contains a symbol, the absolute symbol is used. But what about symbols in sections before $A$? Shouldn't the sections before $A$ also be checked for valid symbols, with the address resolved relative to one of those symbols? Under that scheme, the resolution in this case would switch to <target>, since the address of target is 0x5.

Checking sections before $A$ is exactly what binutils seems to be doing, which leads to discrepancies and failing tests when trying to fully mimic its behavior. Which scheme should I proceed with?
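
To make the two schemes concrete, here is a rough Python model of them applied to the test 5 layout. This is only a sketch and not the llvm-objdump code: the Section and Symbol types, the helper names, and the choice of "nearest symbol at or below the target" as the tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Section:
    name: str
    address: int
    size: int

@dataclass(frozen=True)
class Symbol:
    name: str
    value: int
    section: Optional[str]  # None models an absolute (SHN_ABS) symbol

def candidate_sections(sections, target):
    """The set A: sections sharing the highest start address <= target."""
    starts = [s.address for s in sections if s.address <= target]
    if not starts:
        return []
    best = max(starts)
    return [s for s in sections if s.address == best]

def best_symbol(symbols, section_names, target):
    """Nearest symbol at or below target among the named sections."""
    found = [sym for sym in symbols
             if sym.section in section_names and sym.value <= target]
    return max(found, key=lambda sym: sym.value, default=None)

def label(sym, target):
    off = target - sym.value
    return f"<{sym.name}>" if off == 0 else f"<{sym.name}+{off:#x}>"

def resolve_strict(sections, symbols, target):
    """Scheme 1: look only in A, then fall back to absolute symbols."""
    names = {s.name for s in candidate_sections(sections, target)}
    sym = best_symbol(symbols, names, target) or best_symbol(symbols, {None}, target)
    return label(sym, target) if sym else None

def resolve_with_earlier_sections(sections, symbols, target):
    """Scheme 2: look in A, then in sections before A, then absolute symbols."""
    a = candidate_sections(sections, target)
    sym = best_symbol(symbols, {s.name for s in a}, target)
    if sym is None and a:
        earlier = {s.name for s in sections if s.address < a[0].address}
        sym = best_symbol(symbols, earlier, target)
    if sym is None:
        sym = best_symbol(symbols, {None}, target)
    return label(sym, target) if sym else None

# Test 5 layout: SIZE1=1, SIZE2=0, SECTION=.caller, INDEX=SHN_ABS
sections = [Section(".caller", 0x0, 5), Section(".first", 0x5, 1), Section(".second", 0x5, 0)]
symbols = [Symbol("target", 0x5, ".caller"), Symbol("other", 0x0, None)]

print(resolve_strict(sections, symbols, 0x5))                 # <other+0x5>
print(resolve_with_earlier_sections(sections, symbols, 0x5))  # <target>
```

In this model $A$ is {.first, .second}, neither of which holds a symbol, so the first scheme falls through to the absolute symbol, while the second finds target in .caller first.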

https://github.com/llvm/llvm-project/pull/109914

