[llvm-dev] llvm-mc assembler, GNU as, and pc-relative branches for Arm/AArch64/Mips

Daniel Sanders via llvm-dev llvm-dev at lists.llvm.org
Wed Jan 10 10:23:06 PST 2018


On the Mips side of things, I expect this difference was missed because there's little reason to use immediates in branches. The Mips assembler has numeric local labels that can be redefined and don't create entries in the symbol table:
1:
	...
	beq $6, $7, 1f  # The next label 1
	beq $6, $7, 1b  # The previous label 1
	...
1:

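As a rough illustration of how the `1f`/`1b` references above get resolved, here is a minimal Python sketch. It is only a model of the lookup rule, not how any real assembler implements it; `resolve_local` and the use of statement indices as positions are assumptions made for the example.

```python
import bisect

def resolve_local(ref, ref_pos, definitions):
    """Resolve a numeric local label reference such as '1f' or '1b'.

    definitions maps a label number (as a string) to the sorted list of
    positions where that label is defined.
    """
    name, direction = ref[:-1], ref[-1]
    positions = definitions[name]
    # Index of the first definition located after the reference.
    i = bisect.bisect_right(positions, ref_pos)
    if direction == "f":
        if i == len(positions):
            raise ValueError("no later definition of label " + name)
        return positions[i]
    if direction == "b":
        if i == 0:
            raise ValueError("no earlier definition of label " + name)
        return positions[i - 1]
    raise ValueError("expected a reference ending in 'f' or 'b'")

# Label 1 is defined at statements 0 and 5; the branches sit at 2 and 3.
definitions = {"1": [0, 5]}
forward = resolve_local("1f", 2, definitions)
backward = resolve_local("1b", 3, definitions)
print(forward, backward)  # 5 0
```

Because the lookup depends only on the position of the reference, the same number can be reused as often as needed without a symbol table entry.
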
Generally speaking, I think LLVM's assembler output should match GNU's but I'm not sure this particular case occurs in real code.

> On 10 Jan 2018, at 02:48, Alex Bradbury via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> # Summary
> 
> As a consequence of comparing the RISC-V LLVM MC assembler to the RISC-V GNU
> assembler I've noticed that a number of targets have quite different handling
> for pc-relative jumps/branches with immediate integer operands in llvm-mc vs
> GNU as. I'll admit that this isn't likely to occur in hand-written code (as
> you'd almost always prefer to use a label), but thought it was worth slightly
> wider discussion. See below for full details, but really it boils down to
> whether you treat an immediate offset to a pc-relative branch as an absolute
> target or a pc-relative offset to be directly encoded in the instruction.
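To make the two readings concrete, here is a small Python sketch. It is only a model: `encoded_offset` and `pc_bias` are hypothetical names, and the bias of 4 is borrowed from the Mips case purely as an example.

```python
def encoded_offset(imm, insn_addr, pc_bias, absolute):
    """Byte offset that ends up in the branch under each interpretation."""
    if absolute:
        # The immediate names a target address, so the encoded offset
        # depends on where the branch itself is placed.
        return imm - (insn_addr + pc_bias)
    # The immediate is already the offset and is encoded as written.
    return imm

# The same operand (128) on branches at addresses 0x0 and 0x10.
as_offset = [encoded_offset(128, a, 4, absolute=False) for a in (0x0, 0x10)]
as_target = [encoded_offset(128, a, 4, absolute=True) for a in (0x0, 0x10)]
print(as_offset)  # [128, 128]
print(as_target)  # [124, 108]
```

Under the pc-relative reading, identical source lines always produce identical encodings; under the absolute reading they generally do not once everything is resolved.
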
> 
> 1) Is this an intentional difference in behaviour and just something assembly
> authors should live with?
> 2) If not, is there any interest in resolving it? Obviously I can file bugs on
> bugzilla.
> 3) Is anyone interested in collaborating on better automated tooling for
> comparing the LLVM MC assembler and GNU as? Or, even better, does anyone
> already have tooling to help with this that might be open sourced?
> Automatically finding problems such as the assembly parsing issue described
> in Google's recent LLVMDevMeeting keynote
> <https://youtu.be/6l4DtR5exwo?t=10m44s> would be great.
> 
> Please note: it's possible some of the differences I'm seeing are due to
> different default ASM variants or default target options across tools; do
> let me know if it seems that's the case.
> 
> # Comparing Mips behaviour
> 
>    $ cat test-mips.s
>    lab:
>    beq $6, $7, 128
>    bne $4, $5, 64
>    beq $6, $7, 128
>    bne $4, $5, 64
> 
> Assemble with llvm-mc: `llvm-mc -triple=mipsel-unknown-linux test-mips.s
> -filetype=obj > foo.o` and then disassemble with `llvm-objdump -d -r`:
> 
>    foo.o:  file format ELF32-mips
> 
>    Disassembly of section .text:
>    lab:
>           0: 20 00 c7 10   beq $6, $7, 132 <lab+0x84>
>           4: 00 00 00 00   nop
>           8: 10 00 85 14   bne $4, $5, 68 <lab+0x4c>
>           c: 00 00 00 00   nop
>          10: 20 00 c7 10   beq $6, $7, 132 <lab+0x94>
>          14: 00 00 00 00   nop
>          18: 10 00 85 14   bne $4, $5, 68 <lab+0x5c>
>          1c: 00 00 00 00   nop
> 
> We can see that no relocations are generated, the immediate offsets for the
> beq and bne pairs remain identical, and are interpreted as a PC-relative
> offset.
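The targets in the listing above can be checked by decoding the words by hand. The Python sketch below assumes the usual Mips I-type branch layout (a signed 16-bit word offset, relative to the delay slot at pc + 4); `mips_branch_target` is an illustrative name, not part of any tool.

```python
def mips_branch_target(word, pc):
    """Return the byte address targeted by a Mips I-type branch."""
    imm16 = word & 0xFFFF
    # Sign-extend the 16-bit offset field.
    if imm16 & 0x8000:
        imm16 -= 0x10000
    # The field counts 4-byte words from the delay slot (pc + 4).
    return pc + 4 + (imm16 << 2)

# (address, word) pairs from the llvm-objdump output; bytes are little-endian.
listing = [(0x00, 0x10C70020), (0x08, 0x14850010),
           (0x10, 0x10C70020), (0x18, 0x14850010)]
targets = [mips_branch_target(word, pc) for pc, word in listing]
print([hex(t) for t in targets])  # ['0x84', '0x4c', '0x94', '0x5c']
```

The offset field is the same within each pair, so the targets move along with the branch, which is what a pc-relative reading predicts.
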
> 
> Assembling the same input with GNU as (no arguments), then dumping with GNU
> objdump (from the Mips 2016.05-03 precompiled SDK):
> 
>    a.out:     file format elf32-tradbigmips
> 
> 
>    Disassembly of section .text:
> 
>    00000000 <lab>:
>       0: 10c70020  beq a2,a3,84 <lab+0x84>
>          0: R_MIPS_PC16  *ABS*
>       4: 00000000  nop
>       8: 14850010  bne a0,a1,4c <lab+0x4c>
>          8: R_MIPS_PC16  *ABS*
>       c: 00000000  nop
>      10: 10c70020  beq a2,a3,94 <lab+0x94>
>          10: R_MIPS_PC16 *ABS*
>      14: 00000000  nop
>      18: 14850010  bne a0,a1,5c <lab+0x5c>
>          18: R_MIPS_PC16 *ABS*
>      1c: 00000000  nop
> 
> We note that the encoded instructions are identical and the pretty-printed
> target matches LLVM. However the printed immediate is changed across the
> beq/beq and bne/bne pairs so it matches the absolute target.
> 
> 
> # Comparing Arm behaviour
> 
>    $ cat test-arm.s
>    lab:
>    beq 128
>    bne 64
>    beq 128
>    bne 64
> 
> Assemble with llvm-mc: `llvm-mc -triple=armv7-unknown-none test-arm.s
> -filetype=obj > foo.o` and then disassemble with `llvm-objdump -d -r`:
> 
>    foo.o:  file format ELF32-arm-little
> 
>    Disassembly of section .text:
>    lab:
>           0: 20 00 00 0a   beq #128 <lab+0x88>
>           4: 10 00 00 1a   bne #64 <lab+0x4c>
>           8: 20 00 00 0a   beq #128 <lab+0x90>
>           c: 10 00 00 1a   bne #64 <lab+0x54>
> 
> No relocations are produced and the immediate argument is clearly interpreted
> as a pc-relative offset.
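The same check can be done for the Arm words above. This Python sketch assumes the A32 branch layout (a signed 24-bit word offset, relative to the instruction address plus 8); `arm_branch_target` is an illustrative name.

```python
def arm_branch_target(word, pc):
    """Return the byte address targeted by an A32 B<cond> instruction."""
    imm24 = word & 0x00FFFFFF
    # Sign-extend the 24-bit offset field.
    if imm24 & 0x800000:
        imm24 -= 0x1000000
    # In Arm state the PC reads as the instruction address plus 8.
    return pc + 8 + (imm24 << 2)

# (address, word) pairs from the llvm-objdump output; bytes are little-endian.
listing = [(0x0, 0x0A000020), (0x4, 0x1A000010),
           (0x8, 0x0A000020), (0xC, 0x1A000010)]
targets = [arm_branch_target(word, pc) for pc, word in listing]
print([hex(t) for t in targets])  # ['0x88', '0x4c', '0x90', '0x54']
```

The encoded fields are 0x20 and 0x10 words, i.e. exactly the 128 and 64 bytes written in the source, with no adjustment for the branch address.
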
> 
> Assembling and objdumping the same program with the
> gcc-arm-none-eabi-7-2017-q4-major toolchain (no arguments to as):
> 
>    a.out:     file format elf32-littlearm
> 
> 
>    Disassembly of section .text:
> 
>    00000000 <lab>:
>       0: 0afffffe  beq 80 <*ABS*0x80>
>          0: R_ARM_JUMP24 *ABS*0x80
>       4: 1afffffe  bne 40 <*ABS*0x40>
>          4: R_ARM_JUMP24 *ABS*0x40
>       8: 0afffffe  beq 80 <*ABS*0x80>
>          8: R_ARM_JUMP24 *ABS*0x80
>       c: 1afffffe  bne 40 <*ABS*0x40>
>          c: R_ARM_JUMP24 *ABS*0x40
> 
> In this case, relocations are generated and the argument appears to be
> interpreted as an absolute target. It's worth noting that `beq #128`
> and so on aren't recognised by the GNU assembler, but I might be
> missing an option that enables that syntax?
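One way to read the `0afffffe`/`1afffffe` words above: the offset field appears to hold an addend of -8, which would cancel the PC + 8 bias once the relocation is applied. The Python sketch below rests on two assumptions: that the object uses REL-style relocations (addend stored in the instruction), and that R_ARM_JUMP24 resolves as S + A - P with the Thumb bit ignored. The function names are illustrative only.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def jump24_addend(word):
    """Byte addend stored in the 24-bit field of an unrelocated branch."""
    return sign_extend(word & 0x00FFFFFF, 24) << 2

def apply_jump24(symbol, addend, place):
    """Byte offset after relocation: S + A - P (Thumb bit ignored)."""
    return symbol + addend - place

addend = jump24_addend(0x0AFFFFFE)
resolved = []
for place, symbol in [(0x0, 0x80), (0x4, 0x40), (0x8, 0x80), (0xC, 0x40)]:
    offset = apply_jump24(symbol, addend, place)
    # The branch lands at place + 8 + offset once executed.
    resolved.append(place + 8 + offset)
print(addend, [hex(t) for t in resolved])  # -8 ['0x80', '0x40', '0x80', '0x40']
```

Under those assumptions every branch ends up at the literal address given in the source, regardless of where the branch sits.
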
> 
> # Comparing AArch64 behaviour
> 
>    $ cat test-arm.s
>    lab:
>    beq 128
>    bne 64
>    beq 128
>    bne 64
> 
> Assemble with llvm-mc: `llvm-mc -triple=aarch64-unknown-none test-arm.s
> -filetype=obj > foo.o` and then disassemble with `llvm-objdump -d -r`:
> 
>    foo.o:  file format ELF64-aarch64-little
> 
>    Disassembly of section .text:
>    lab:
>           0: 00 04 00 54   b.eq  #128
>           4: 01 02 00 54   b.ne  #64
>           8: 00 04 00 54   b.eq  #128
>           c: 01 02 00 54   b.ne  #64
> 
> No relocations are produced, and because the pairs of b.eq and b.ne have
> identical encoding we can conclude the immediate argument is interpreted as a
> pc-relative offset.
> 
> Assembling (no arguments to as) and objdumping the same input with the Linaro
> gcc-linaro-7.2.1-2017.11-i686-aarch64-elf toolchain gives:
> 
>    a.out:     file format elf64-littleaarch64
> 
> 
>    Disassembly of section .text:
> 
>    0000000000000000 <lab>:
>       0: 54000400  b.eq  80 <lab+0x80>  // b.none
>       4: 54000201  b.ne  44 <lab+0x44>  // b.any
>       8: 54000400  b.eq  88 <lab+0x88>  // b.none
>       c: 54000201  b.ne  4c <lab+0x4c>  // b.any
> 
> This seems to match the LLVM interpretation, other than different choices
> about printing immediates in hex vs decimal.
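Decoding the AArch64 words by hand supports this. The Python sketch below assumes the B.cond layout (a signed 19-bit word offset in bits 23:5, relative to the address of the branch itself); `aarch64_bcond_target` is an illustrative name.

```python
def aarch64_bcond_target(word, pc):
    """Return the byte address targeted by an AArch64 B.cond instruction."""
    imm19 = (word >> 5) & 0x7FFFF
    # Sign-extend the 19-bit offset field.
    if imm19 & 0x40000:
        imm19 -= 0x80000
    # The offset counts 4-byte words from the branch instruction itself.
    return pc + (imm19 << 2)

# (address, word) pairs as printed by GNU objdump.
listing = [(0x0, 0x54000400), (0x4, 0x54000201),
           (0x8, 0x54000400), (0xC, 0x54000201)]
targets = [aarch64_bcond_target(word, pc) for pc, word in listing]
print([hex(t) for t in targets])  # ['0x80', '0x44', '0x88', '0x4c']
```

These match the targets GNU objdump prints, and the fields decode to the 128 and 64 bytes that llvm-objdump shows as immediates.
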
> 
> 
> Thanks,
> 
> Alex