[llvm-dev] MASM & RIP-relative addressing
Eric Astor via llvm-dev
llvm-dev at lists.llvm.org
Tue Jan 21 14:41:31 PST 2020
Apologies - I misremembered part of the issue, so my first message ended up
quite confusing. The problem comes when referencing labels in a different
section of the binary. To clarify, if I assemble the code:
.data
foo BYTE 5
.code
mov eax, foo
with Microsoft's ml64.exe, it emits an object file disassembling to:
0: 8b 05 00 00 00 00 mov eax, dword ptr [rip]
000000000000000b: IMAGE_REL_AMD64_REL32 foo
With my current local draft of llvm-ml, on the other hand, I get a different
result: the same one llvm-mc produces for the corresponding code:
.data
foo:
.byte 5
.text
.intel_syntax
mov eax, foo
Either way, LLVM emits an object file with disassembly (and relocation) as
follows:
0: 8b 04 25 00 00 00 00 mov eax, dword ptr [0]
0000000000000003: IMAGE_REL_AMD64_ADDR32 foo
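The two encodings differ only in the ModRM (and SIB) bytes after the 0x8b opcode. As an illustration, here is a minimal Python sketch that classifies those bytes using the x86-64 ModRM/SIB rules. `classify_modrm` is a hypothetical helper written for this note; it ignores REX prefixes and recognises only the mod=00 forms that appear above, so it is not a general decoder.

```python
def classify_modrm(operand: bytes) -> str:
    """Classify a 64-bit-mode memory operand, given the bytes starting at ModRM.

    Hypothetical illustration: only mod == 00 forms are recognised and REX
    prefixes are ignored.
    """
    modrm = operand[0]
    mod = modrm >> 6       # bits 7..6
    rm = modrm & 0b111     # bits 2..0
    if mod != 0b00:
        return "other"
    if rm == 0b101:
        # mod=00, rm=101: [rip + disp32] in 64-bit mode
        return "rip-relative disp32"
    if rm == 0b100:
        # mod=00, rm=100: a SIB byte follows the ModRM byte
        sib = operand[1]
        index = (sib >> 3) & 0b111
        base = sib & 0b111
        if index == 0b100 and base == 0b101:
            # no index and (because mod=00) no base: absolute [disp32]
            return "absolute disp32"
    return "other"


ml64 = bytes.fromhex("8b0500000000")    # output of ml64.exe
llvm = bytes.fromhex("8b042500000000")  # output of llvm-ml / llvm-mc
# Skip the 0x8b opcode (mov r32, r/m32) and classify the operand bytes.
print(classify_modrm(ml64[1:]))  # rip-relative disp32
print(classify_modrm(llvm[1:]))  # absolute disp32
```

In LLVM's encoding the extra SIB byte pushes the disp32 field to offset 3, which matches the IMAGE_REL_AMD64_ADDR32 relocation offset shown above.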
To replicate the results from ml64.exe with LLVM, I instead need to use
mov eax, [foo + rip]
in place of mov eax, foo. At least when building with llvm-ml, we need to
mimic ml64.exe's approach: a reference to a symbol in another section should
use the RIP-relative addressing mode.
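The choice matters at link time because the two relocation types are resolved differently. Per the PE/COFF specification, IMAGE_REL_AMD64_ADDR32 stores the target's 32-bit VA, while IMAGE_REL_AMD64_REL32 stores the distance from the byte following the 32-bit field. The sketch below uses a hypothetical helper, `apply_coff_reloc`, simplified to the two cases in this thread and assuming the addend is stored in place as COFF does. The addresses are also assumptions: .text at 0x140001000 and foo at 0x140003000, i.e. an image near the common x64 EXE base of 0x140000000. With such a base, an ADDR32 target cannot fit in 32 bits.

```python
import struct


def apply_coff_reloc(kind: str, section: bytearray, offset: int,
                     section_va: int, symbol_va: int) -> None:
    """Patch a 32-bit field the way a linker resolves two AMD64 COFF relocations.

    Hypothetical, simplified sketch: the existing field value is treated as
    the addend (COFF keeps addends in place); only a range check is done.
    """
    addend = struct.unpack_from("<i", section, offset)[0]
    p = section_va + offset                      # VA of the field being patched
    if kind == "ADDR32":
        value = symbol_va + addend               # the target's absolute VA
        if not 0 <= value <= 0xFFFFFFFF:
            raise OverflowError("ADDR32 target does not fit in 32 bits")
        struct.pack_into("<I", section, offset, value)
    elif kind == "REL32":
        value = symbol_va + addend - (p + 4)     # relative to the byte after the field
        if not -2**31 <= value < 2**31:
            raise OverflowError("REL32 displacement does not fit in 32 bits")
        struct.pack_into("<i", section, offset, value)
    else:
        raise ValueError(f"unsupported relocation kind: {kind}")


# ml64.exe-style mov eax, [rip+disp32]: disp32 starts at byte 2.
rip_form = bytearray.fromhex("8b0500000000")
apply_coff_reloc("REL32", rip_form, 2, 0x140001000, 0x140003000)
print(rip_form.hex())  # 8b05fa1f0000

# LLVM-style mov eax, [disp32]: disp32 starts at byte 3.
abs_form = bytearray.fromhex("8b042500000000")
try:
    apply_coff_reloc("ADDR32", abs_form, 3, 0x140001000, 0x140003000)
except OverflowError as err:
    print(err)  # ADDR32 target does not fit in 32 bits
```

For REL32, `p + 4` is exactly the address of the next instruction, i.e. the RIP value the CPU adds the displacement to, so the same bytes work wherever the image is loaded.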
My first attempt to fix this was very clumsy: in MASM mode, I forced all
expressions without a base register to use RIP as the base. Unfortunately,
that breaks "jcc", since it turns label references into memory operands
with a RIP base register, and the "jcc" family only accepts a relative
branch target, never a memory operand. Any suggestions for how I can fix
the issue described here without breaking "jcc"?
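For comparison, the near form of jcc has no ModRM byte at all: its operand is a signed rel32 measured from the end of the instruction, which is why a RIP-based memory operand cannot match it. A minimal Python sketch of that encoding follows. `encode_jcc_rel32` is a hypothetical helper; it covers only the 6-byte 0F 80+cc form, not the 2-byte 70+cc rel8 form an assembler may choose for nearby targets.

```python
import struct

# Condition-code nibbles for a few jcc mnemonics (near form: 0F 80+cc rel32).
CC = {"jo": 0x0, "jb": 0x2, "je": 0x4, "jne": 0x5, "jl": 0xC, "jg": 0xF}


def encode_jcc_rel32(mnemonic: str, insn_va: int, target_va: int) -> bytes:
    """Encode the 6-byte near conditional jump 0F 80+cc rel32.

    The operand is a signed 32-bit displacement from the end of the
    instruction (the next RIP), not a ModRM memory operand, so there is
    no base register for RIP to be attached to.
    """
    length = 6                              # 0F, 80+cc, then a 4-byte rel32
    rel = target_va - (insn_va + length)
    if not -2**31 <= rel < 2**31:
        raise OverflowError("jcc target out of rel32 range")
    return bytes([0x0F, 0x80 | CC[mnemonic]]) + struct.pack("<i", rel)


print(encode_jcc_rel32("je", 0x1000, 0x1100).hex())   # 0f84fa000000
print(encode_jcc_rel32("jne", 0x2000, 0x1000).hex())  # 0f85faefffff
```

The displacement is computed from `insn_va + 6`, the same "next RIP" a RIP-relative memory operand is measured from; the difference between the two cases is purely in how the operand is encoded and matched.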
On Tue, Jan 21, 2020 at 3:43 PM Eli Friedman <efriedma at quicinc.com> wrote:
> All immediate jump instructions on x86 (call/jmp/jcc) have a relative
> offset operand. The destination is, in some sense, “rip-relative”, but we
> don’t represent it like that in LLVM. If you look at the TableGen
> descriptions, jumps use brtarget32, and calls use i32imm_pcrel. In both
> Microsoft and GNU assembly syntax, this is something like “call baz”.
>
> “call”/”jmp” also have a register/memory form, for indirect calls. In
> 64-bit, this allows rip-relative references, to call a function pointer
> stored in a global variable. In Microsoft assembly syntax, this is “call
> QWORD PTR baz”. In GNU assembly syntax, this is “call *baz(%rip)”.
>
> For 64-bit x86, any reference to a global has to be a rip-relative address
> (since all 64-bit programs are position-independent), but on 32-bit x86,
> it’s also possible to refer to the address of a variable using something
> like “add eax, OFFSET baz”.
>
> For globals which are explicitly labeled “PTR” or “OFFSET”, the correct
> representation should be unambiguous, and it should be easy to print
> appropriate error messages. For other cases, I’m not sure what the
> inference rules are. It might vary depending on the opcode.
>
> -Eli
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Eric
> Astor via llvm-dev
> *Sent:* Monday, January 20, 2020 6:26 PM
> *To:* LLVM-dev <llvm-dev at lists.llvm.org>
> *Subject:* [EXT] [llvm-dev] MASM & RIP-relative addressing
>
> Hi all,
>
> Continuing work on llvm-ml (a MASM assembler)... and my latest obstacle is
> in enabling MASM's convention that (unless specified) all memory location
> references should be RIP-relative. Without it, we emit the wrong
> instructions for "call", "jmp", etc., and anything we build fails at the
> linking stage.
>
> My best attempt at this so far is a small patch to X86AsmParser.cpp - just
> taking any Intel expression with no specified base register and switching
> it to use RIP - and this works alright. There's at least one exception: it
> breaks the "jcc" instructions, at least "jcc <label>". The issue seems to
> be that the "jcc" family exclusively takes a relative offset, never an
> absolute reference... so adding a base register causes the operand not to
> match. ("jcc" is always RIP-relative anyway.)
>
> I'm not very familiar with the operand-matching logic, and am still pretty
> new to LLVM as a whole. Are there more X86 instructions this will interact
> badly with? Any thoughts on how this could be handled better?
>
> If this is mostly a valid approach, might there be a way to change the
> operand type of "jcc" to accept offset(base) operands, as long as base ==
> X86::RIP, then ignore the RIP bit?
>
> Thanks,
>
> - Eric
>