[LLVMbugs] [Bug 22511] New: [x86asm intel syntax] `mov` with a symbol from a .set directive not handled correctly (?)

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Sun Feb 8 16:39:15 PST 2015


http://llvm.org/bugs/show_bug.cgi?id=22511

            Bug ID: 22511
           Summary: [x86asm intel syntax] `mov` with a symbol from a .set
                    directive not handled correctly (?)
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: nicolasweber at gmx.de
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Consider this asm hello world (on OS X):

    .intel_syntax

    str:
      .ascii "Hello, ASM.\n"
      .set mylen, .-str

    .global start
    start:
      mov rdi, 1
      lea rsi, qword ptr [rip+str]  // [rip+str@GOTPCREL] for GOT instead of rip-rel
      mov rdx, mylen  // doesn't work. clang -cc1as bug?
      mov rax, 0x2000004  # SYSCALL_WRITE
      syscall

      mov rdi, 42
      mov rax, 0x2000001  # SYSCALL_EXIT
      syscall

$ clang -c -o hello.o hello.asm && ld -o hello hello.o
$ ./hello
Segmentation fault: 11

This crashes because `mov rdx, mylen` is assembled as `mov rdx, [12]`. mylen is
correctly evaluated to 12, but clang treats it as a memory operand and
dereferences it:

$ r2 hello
[0x00001fd1]> px 10
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00001fd1  48c7 c701 0000 0048 8d35                 H......H.5      
[0x00001fd1]> pd 10
           ;-- entry0:
           0x00001fd1    48c7c701000. mov rdi, 1
           0x00001fd8    488d35e6fff. lea rsi, qword [rip - 0x1a]
           0x00001fdf    488b14250c0. mov rdx, qword [0xc]
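
To make the difference concrete, here is a minimal Python sketch that decodes the two encodings involved. It is an illustration only, not part of clang or r2: `decode_mov` is a hypothetical helper that covers just these two `mov` forms, and the full byte strings are reconstructed from standard x86-64 encoding rules because the r2 listing above truncates them.

```python
import struct

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_mov(code: bytes) -> str:
    """Decode the two REX.W mov forms relevant to this report (hypothetical helper)."""
    if code[0] != 0x48:
        raise ValueError("expected a REX.W (0x48) prefix")
    opcode, modrm = code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

    # C7 /0 with mod=11: mov r64, imm32 (immediate, no memory access)
    if opcode == 0xC7 and mod == 3 and reg == 0:
        (imm,) = struct.unpack_from("<i", code, 3)
        return f"mov {REG64[rm]}, {imm}"

    # 8B /r with mod=00, rm=100, SIB=0x25: mov r64, [disp32] (absolute load)
    if opcode == 0x8B and mod == 0 and rm == 4 and code[3] == 0x25:
        (disp,) = struct.unpack_from("<i", code, 4)
        return f"mov {REG64[reg]}, qword [{disp:#x}]"

    raise ValueError("encoding not covered by this sketch")

# .set mylen, .-str evaluates to the length of the string.
mylen = len(b"Hello, ASM.\n")

# What was emitted (bytes completed from the truncated r2 output).
emitted = bytes.fromhex("488b1425") + struct.pack("<i", mylen)
# What an immediate move would look like (assumed intended encoding).
intended = bytes.fromhex("48c7c2") + struct.pack("<i", mylen)

print(decode_mov(emitted))   # mov rdx, qword [0xc]
print(decode_mov(intended))  # mov rdx, 12
```

The emitted form loads 8 bytes from absolute address 0xc, which is presumably unmapped and would explain the segfault. The immediate form just places 12 in rdx.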



This doesn't look right to me. In AT&T syntax, I have to say `mov $mylen, %rdx`
(with a $) to not dereference mylen, but that's consistent with other immediates.

The same program in AT&T syntax works fine (compiled with the same commands):

    str:
      .ascii "Hello world!\n"
      .set mylen, .-str

    .globl start
    start:
      movl $0x2000004, %eax
      movl $1, %edi
      movq str@GOTPCREL(%rip), %rsi
      mov $mylen, %rdx
      syscall

      movl $42, %edi
      movl $0x2000001, %eax           # SYSCALL_EXIT
      syscall



An equivalent program in Intel syntax works fine with gas on Linux:

    .intel_syntax noprefix

    str:
      .ascii "Hello, ASM.\n"
      .set len, .-str

    .global _start
    _start:
      movq rdi, 1
      movq rsi, OFFSET FLAT:str
      movq rdx, len
      movq rax, 1  # sys_write
      syscall

      movq rdi, 42
      movq rax, 60  # sys_exit
      syscall

$ gcc -c test.s && ld test.o
$ ./a.out 
Hello, ASM.


So I think the behavior of clang's integrated assembler might be incorrect for
.set directives.
