[llvm-dev] llvm-mc assembler, GNU as, and pc-relative branches for Arm/AArch64/Mips
Alex Bradbury via llvm-dev
llvm-dev at lists.llvm.org
Wed Jan 10 02:48:53 PST 2018
# Summary
As a consequence of comparing the RISC-V LLVM MC assembler to the RISC-V GNU
assembler I've noticed that a number of targets have quite different handling
for pc-relative jumps/branches with immediate integer operands in llvm-mc vs
GNU as. I'll admit that this isn't likely to occur in hand-written code (as
you'd almost always prefer to use a label), but thought it was worth slightly
wider discussion. See below for full details, but really it boils down to
whether you treat an immediate offset to a pc-relative branch as an absolute
target or a pc-relative offset to be directly encoded in the instruction.
1) Is this an intentional difference in behaviour and just something assembly
authors should live with?
2) If not, is there any interest in resolving it? Obviously I can file bugs on
bugzilla.
3) Is anyone interested in collaborating on better automated tooling for
comparing the LLVM MC assembler and GNU as? Or, even better, does anyone
already have tooling for this that might be open sourced? Automatically finding
problems such as the assembly parsing issue described in Google's recent
LLVMDevMeeting keynote <https://youtu.be/6l4DtR5exwo?t=10m44s> would be great.
Please note: it's possible some of the differences I'm seeing are due to
different default ASM variants or default target options across tools. Do let
me know if it seems that's the case.
# Comparing Mips behaviour
$ cat test-mips.s
lab:
beq $6, $7, 128
bne $4, $5, 64
beq $6, $7, 128
bne $4, $5, 64
Assemble with llvm-mc: `llvm-mc -triple=mipsel-unknown-linux test-mips.s
-filetype=obj > foo.o` and then disassemble with `llvm-objdump -d -r`:
foo.o: file format ELF32-mips
Disassembly of section .text:
lab:
0: 20 00 c7 10 beq $6, $7, 132 <lab+0x84>
4: 00 00 00 00 nop
8: 10 00 85 14 bne $4, $5, 68 <lab+0x4c>
c: 00 00 00 00 nop
10: 20 00 c7 10 beq $6, $7, 132 <lab+0x94>
14: 00 00 00 00 nop
18: 10 00 85 14 bne $4, $5, 68 <lab+0x5c>
1c: 00 00 00 00 nop
We can see that no relocations are generated, the immediate offsets for the
beq and bne pairs remain identical, and are interpreted as a PC-relative
offset.
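As a sanity check on the listing above, here is a minimal Python sketch that decodes the four branch words and recomputes the targets. It assumes the usual MIPS32 rule that the 16-bit immediate is a signed word offset relative to the delay-slot address (pc + 4); the function name `mips_branch_target` is only illustrative.

```python
def mips_branch_target(word: int, pc: int) -> int:
    """Target of a MIPS32 beq/bne, assuming offset is relative to pc + 4."""
    imm16 = word & 0xFFFF
    if imm16 & 0x8000:          # sign-extend the 16-bit field
        imm16 -= 0x10000
    return pc + 4 + (imm16 << 2)

# (address, raw bytes as printed by llvm-objdump for the little-endian object)
listing = [
    (0x00, "20 00 c7 10"),
    (0x08, "10 00 85 14"),
    (0x10, "20 00 c7 10"),
    (0x18, "10 00 85 14"),
]

for pc, raw in listing:
    word = int.from_bytes(bytes.fromhex(raw), "little")
    target = mips_branch_target(word, pc)
    print(f"{pc:#04x}: word={word:#010x} offset field={word & 0xFFFF:#x} "
          f"target=lab+{target:#x}")
```

The offset field is the same for both beq instructions (0x20) and both bne instructions (0x10), while the computed targets move with the instruction address, which is what a pc-relative reading of the operand predicts.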
Assembling the same input with GNU as (no arguments), then dumping with GNU
objdump (from the Mips 2016.05-03 precompiled SDK):
a.out: file format elf32-tradbigmips
Disassembly of section .text:
00000000 <lab>:
0: 10c70020 beq a2,a3,84 <lab+0x84>
0: R_MIPS_PC16 *ABS*
4: 00000000 nop
8: 14850010 bne a0,a1,4c <lab+0x4c>
8: R_MIPS_PC16 *ABS*
c: 00000000 nop
10: 10c70020 beq a2,a3,94 <lab+0x94>
10: R_MIPS_PC16 *ABS*
14: 00000000 nop
18: 14850010 bne a0,a1,5c <lab+0x5c>
18: R_MIPS_PC16 *ABS*
1c: 00000000 nop
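We note that the encoded instructions are identical and the pretty-printed
target matches LLVM. However, GNU as emits an R_MIPS_PC16 relocation against
*ABS* for each branch, and GNU objdump prints the absolute target as the
operand (84, 4c, 94, 5c), so the printed operand differs across the beq/beq
and bne/bne pairs.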
# Comparing Arm behaviour
$ cat test-arm.s
lab:
beq 128
bne 64
beq 128
bne 64
Assemble with llvm-mc: `llvm-mc -triple=armv7-unknown-none test-arm.s
-filetype=obj > foo.o` and then disassemble with `llvm-objdump -d -r`:
foo.o: file format ELF32-arm-little
Disassembly of section .text:
lab:
0: 20 00 00 0a beq #128 <lab+0x88>
4: 10 00 00 1a bne #64 <lab+0x4c>
8: 20 00 00 0a beq #128 <lab+0x90>
c: 10 00 00 1a bne #64 <lab+0x54>
No relocations are produced and the immediate argument is clearly interpreted
as a pc-relative offset.
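The same check can be done for the Arm listing. This is a small sketch, assuming the standard A32 rule that the 24-bit immediate is a signed word offset relative to pc + 8; `arm_branch_target` is an illustrative name, not anything from LLVM.

```python
def arm_branch_target(word: int, pc: int) -> int:
    """Target of an A32 B<cond>, assuming offset is relative to pc + 8."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:        # sign-extend the 24-bit field
        imm24 -= 0x1000000
    return pc + 8 + (imm24 << 2)

# (address, raw bytes as printed by llvm-objdump for the little-endian object)
listing = [
    (0x0, "20 00 00 0a"),
    (0x4, "10 00 00 1a"),
    (0x8, "20 00 00 0a"),
    (0xC, "10 00 00 1a"),
]

for pc, raw in listing:
    word = int.from_bytes(bytes.fromhex(raw), "little")
    print(f"{pc:#x}: word={word:#010x} target=lab+{arm_branch_target(word, pc):#x}")
```

The immediates 128 and 64 go straight into the offset field (0x20 and 0x10 words), so the targets come out as lab+0x88, lab+0x4c, lab+0x90 and lab+0x54, matching the annotations in the llvm-objdump output.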
Assembling and objdumping the same program with the
gcc-arm-none-eabi-7-2017-q4-major toolchain (no arguments to as):
a.out: file format elf32-littlearm
Disassembly of section .text:
00000000 <lab>:
0: 0afffffe beq 80 <*ABS*0x80>
0: R_ARM_JUMP24 *ABS*0x80
4: 1afffffe bne 40 <*ABS*0x40>
4: R_ARM_JUMP24 *ABS*0x40
8: 0afffffe beq 80 <*ABS*0x80>
8: R_ARM_JUMP24 *ABS*0x80
c: 1afffffe bne 40 <*ABS*0x40>
c: R_ARM_JUMP24 *ABS*0x40
In this case, relocations are generated and the argument appears to be
interpreted as an absolute target. It's worth noting that `beq #128`
and so on aren't recognised by the GNU assembler, but I might be
missing an option that enables that syntax?
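To make the "absolute target" reading concrete, here is a hedged sketch of what resolving those relocations would produce. It assumes the documented R_ARM_JUMP24 computation ((S + A) | T) - P with T = 0 (Arm-state target), a REL-style addend taken from the instruction's own offset field, and, purely for illustration, that .text ends up at address 0. The function names are hypothetical, and a real linker may do more (for example inserting veneers).

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def apply_r_arm_jump24(word: int, place: int, symbol: int) -> int:
    """Patch a branch word using (S + A) - P, assuming T = 0."""
    addend = sign_extend(word, 24) << 2      # 0xfffffe encodes an addend of -8
    offset = (symbol + addend) - place
    return (word & 0xFF000000) | ((offset >> 2) & 0xFFFFFF)

def arm_branch_target(word: int, pc: int) -> int:
    """Target of an A32 B<cond>, assuming offset is relative to pc + 8."""
    return pc + 8 + (sign_extend(word, 24) << 2)

# (place, unrelocated word from the GNU listing, value of the *ABS* symbol)
relocs = [
    (0x0, 0x0AFFFFFE, 0x80),
    (0x4, 0x1AFFFFFE, 0x40),
    (0x8, 0x0AFFFFFE, 0x80),
    (0xC, 0x1AFFFFFE, 0x40),
]

for place, word, symbol in relocs:
    patched = apply_r_arm_jump24(word, place, symbol)
    print(f"{place:#x}: {word:#010x} -> {patched:#010x} "
          f"target={arm_branch_target(patched, place):#x}")
```

Under these assumptions both beq instructions resolve to address 0x80 and both bne instructions to 0x40, regardless of where the branch sits, and the two instructions in each pair end up with different encodings. That is the opposite of the llvm-mc result, where the encodings match and the targets differ.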
# Comparing AArch64 behaviour
$ cat test-arm.s
lab:
beq 128
bne 64
beq 128
bne 64
Assemble with llvm-mc: `llvm-mc -triple=aarch64-unknown-none test-arm.s
-filetype=obj > foo.o` and then disassemble with `llvm-objdump -d -r`:
foo.o: file format ELF64-aarch64-little
Disassembly of section .text:
lab:
0: 00 04 00 54 b.eq #128
4: 01 02 00 54 b.ne #64
8: 00 04 00 54 b.eq #128
c: 01 02 00 54 b.ne #64
No relocations are produced, and because the pairs of b.eq and b.ne have
identical encoding we can conclude the immediate argument is interpreted as a
pc-relative offset.
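A short sketch of the AArch64 decoding, assuming the standard B.cond layout where imm19 sits in bits [23:5] and is a signed word offset from the address of the branch itself (no pipeline bias). The name `aarch64_bcond_target` is illustrative only.

```python
def aarch64_bcond_target(word: int, pc: int) -> int:
    """Target of an AArch64 B.cond, assuming imm19 in bits [23:5]."""
    imm19 = (word >> 5) & 0x7FFFF
    if imm19 & 0x40000:         # sign-extend the 19-bit field
        imm19 -= 0x80000
    return pc + (imm19 << 2)

# (address, raw bytes as printed by llvm-objdump for the little-endian object)
listing = [
    (0x0, "00 04 00 54"),
    (0x4, "01 02 00 54"),
    (0x8, "00 04 00 54"),
    (0xC, "01 02 00 54"),
]

for pc, raw in listing:
    word = int.from_bytes(bytes.fromhex(raw), "little")
    print(f"{pc:#x}: word={word:#010x} target=lab+{aarch64_bcond_target(word, pc):#x}")
```

The decoded byte offsets are 128 and 64, and the targets work out to lab+0x80, lab+0x44, lab+0x88 and lab+0x4c.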
Assembling (no arguments to as) and objdumping the same input with the Linaro
gcc-linaro-7.2.1-2017.11-i686-aarch64-elf toolchain gives:
a.out: file format elf64-littleaarch64
Disassembly of section .text:
0000000000000000 <lab>:
0: 54000400 b.eq 80 <lab+0x80> // b.none
4: 54000201 b.ne 44 <lab+0x44> // b.any
8: 54000400 b.eq 88 <lab+0x88> // b.none
c: 54000201 b.ne 4c <lab+0x4c> // b.any
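This seems to match the LLVM interpretation: the encodings are identical and
no relocations are emitted. The only difference is in the disassembly, where
GNU objdump prints the resolved target in hex and llvm-objdump prints the
offset in decimal.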
Thanks,
Alex