[llvm-dev] MASM & RIP-relative addressing

Eric Astor via llvm-dev llvm-dev at lists.llvm.org
Tue Jan 21 14:41:31 PST 2020


Apologies - I apparently remembered part of the issue incorrectly, so this
ended up quite confusing. The problem comes when referencing labels in a
different section of the binary. To clarify, if I assemble the code:

.data
foo BYTE 5
.code
mov eax, foo

with Microsoft's ml64.exe, it emits an object file disassembling to:

       0:       8b 05 00 00 00 00       mov     eax, dword ptr [rip]
                000000000000000b:  IMAGE_REL_AMD64_REL32        foo

On the other hand, if I use my current local draft of llvm-ml, I get a
different result. I actually get the same result as I do for llvm-mc, using
the corresponding code:

.data
foo:
.byte 5
.text
.intel_syntax
mov eax, foo

Either way, LLVM emits an object file with disassembly (and relocation) as
follows:

       0:       8b 04 25 00 00 00 00    mov     eax, dword ptr [0]
                0000000000000003:  IMAGE_REL_AMD64_ADDR32       foo

To replicate the results from ml64.exe with LLVM, I instead need to use

mov eax, [foo + rip]

in place of mov eax, foo. At least when building with llvm-ml, we need to
mimic ml64.exe's approach: a reference to a symbol in another section should
use RIP-relative addressing.
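To make the difference between the two encodings concrete, here is a minimal sketch that classifies the memory operand from the ModRM (and SIB) bytes in 64-bit mode. The function name is hypothetical and not part of LLVM; it assumes a one-byte opcode with no prefixes, which is all the two listings here need.

```python
def decode_modrm_operand(encoding: bytes) -> str:
    """Classify the memory operand of a prefix-less, one-byte-opcode
    instruction in 64-bit mode (hypothetical helper, illustration only)."""
    modrm = encoding[1]          # byte 0 is the opcode (0x8b = mov r32, r/m32)
    mod = modrm >> 6
    rm = modrm & 0b111
    if mod == 0b00 and rm == 0b101:
        # In 64-bit mode, mod=00 rm=101 means disp32 relative to RIP.
        return "rip-relative disp32"
    if mod == 0b00 and rm == 0b100:
        # rm=100 means a SIB byte follows the ModRM byte.
        sib = encoding[2]
        index = (sib >> 3) & 0b111
        base = sib & 0b111
        if base == 0b101 and index == 0b100:
            # No base, no index: a plain 32-bit absolute displacement.
            return "absolute disp32"
    return "other"


# ml64.exe output: 8b 05 <disp32>
print(decode_modrm_operand(bytes.fromhex("8b0500000000")))
# llvm-mc / llvm-ml draft output: 8b 04 25 <disp32>
print(decode_modrm_operand(bytes.fromhex("8b042500000000")))
```

The first form is what pairs with IMAGE_REL_AMD64_REL32, and the second with IMAGE_REL_AMD64_ADDR32, matching the relocations shown in the two listings.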

My first attempt to fix this was very clumsy - when in MASM mode, I forced
all expressions without a base register to presume RIP. Unfortunately, that
breaks any attempt to use "jcc", since it turns label references into
memory operands with a base register (and the "jcc" family accepts only a
relative offset, not a memory operand). Any suggestions for how I can fix
the issue described here without breaking "jcc"?
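For illustration, a small sketch of why the two cases look so similar and yet need different operand kinds: both a near "jcc" and a RIP-relative memory operand store a 32-bit displacement measured from the end of the instruction, but the former is an immediate and the latter lives in a ModRM memory operand. The helper name and the addresses are hypothetical, and the sketch assumes the displacement is the last field of the instruction.

```python
def rel32_from(instr_addr: int, instr_len: int, target: int) -> bytes:
    """Return the little-endian 32-bit displacement from the end of the
    instruction to `target` (hypothetical helper, illustration only)."""
    next_ip = instr_addr + instr_len      # RIP already points past the instruction
    disp = target - next_ip
    if not -2**31 <= disp < 2**31:
        raise ValueError("target out of rel32 range")
    return (disp & 0xFFFFFFFF).to_bytes(4, "little")


# je <label>: 0f 84 <rel32>, 6 bytes; the displacement is an immediate.
je = bytes.fromhex("0f84") + rel32_from(0x10, 6, 0x40)
# mov eax, [rip + foo]: 8b 05 <disp32>, 6 bytes; the displacement is part
# of a memory operand selected by the ModRM byte.
mov = bytes.fromhex("8b05") + rel32_from(0x10, 6, 0x40)
print(je.hex(), mov.hex())
```

The arithmetic is identical in both cases; only the operand class differs, which is why adding a RIP base register to a "jcc" target stops it from matching.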

On Tue, Jan 21, 2020 at 3:43 PM Eli Friedman <efriedma at quicinc.com> wrote:

> All immediate jump instructions on x86 (call/jmp/jcc) have a relative
> offset operand.  The destination is, in some sense, “rip-relative”, but we
> don’t represent it like that in LLVM.  If you look at the TableGen
> descriptions, jumps use brtarget32, and calls use i32imm_pcrel.  In both
> Microsoft and GNU assembly syntax, this is something like “call baz”.
>
>
>
> “call”/“jmp” also have a register/memory form, for indirect calls.  In
> 64-bit, this allows rip-relative references, to call a function pointer
> stored in a global variable.  In Microsoft assembly syntax, this is “call
> QWORD PTR baz”. In GNU assembly syntax, this is “call *baz(%rip)”.
>
>
>
> For 64-bit x86, any reference to a global has to be a rip-relative address
> (since all 64-bit programs are position-independent), but on 32-bit x86,
> it’s also possible to refer to the address of a variable using something
> like “add eax, OFFSET baz”.
>
>
>
> For globals which are explicitly labeled “PTR” or “OFFSET”, the correct
> representation should be unambiguous, and it should be easy to print
> appropriate error messages.  For other cases, I’m not sure what the
> inference rules are.  It might vary depending on the opcode.
>
>
>
> -Eli
>
>
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Eric
> Astor via llvm-dev
> *Sent:* Monday, January 20, 2020 6:26 PM
> *To:* LLVM-dev <llvm-dev at lists.llvm.org>
> *Subject:* [EXT] [llvm-dev] MASM & RIP-relative addressing
>
>
>
> Hi all,
>
>
>
> Continuing work on llvm-ml (a MASM assembler)... and my latest obstacle is
> in enabling MASM's convention that (unless specified) all memory location
> references should be RIP-relative. Without it, we emit the wrong
> instructions for "call", "jmp", etc., and anything we build fails at the
> linking stage.
>
>
>
> My best attempt at this so far is a small patch to X86AsmParser.cpp - just
> taking any Intel expression with no specified base register and switching
> it to use RIP - and this works alright. There's at least one exception: it
> breaks the "jcc" instructions, at least "jcc <label>". The issue seems to
> be that the "jcc" family exclusively takes a relative offset, never an
> absolute reference... so adding a base register causes the operand not to
> match. ("jcc" is always RIP-relative anyway.)
>
>
>
> I'm not very familiar with the operand-matching logic, and am still pretty
> new to LLVM as a whole. Are there more X86 instructions this will interact
> badly with? Any thoughts on how this could be handled better?
>
>
>
> If this is mostly a valid approach, might there be a way to change the
> operand type of "jcc" to accept offset(base) operands, as long as base ==
> X86::RIP, then ignore the RIP bit?
>
>
>
> Thanks,
>
> - Eric
>