            Bug ID: 22511
           Summary: [x86asm intel syntax] `mov` with a symbol from a .set
                    directive not handled correctly (?)
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: nicolasweber at gmx.de
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Consider this asm hello world (on OS X):


      .ascii "Hello, ASM.\n"
      .set mylen, .-str

    .global start
      mov rdi, 1
      lea rsi, qword ptr [rip+str]  // [rip+str@GOTPCREL] for GOT instead of
      mov rdx, mylen  // doesn't. clang -cc1as bug?
      mov rax, 0x2000004  # SYSCALL_WRITE

      mov rdi, 42
      mov rax, 0x2000001  # SYSCALL_EXIT

$ clang -c -o hello.o hello.asm && ld -o hello hello.o
$ ./hello
Segmentation fault: 11

This crashes because `mov rdx, mylen` is assembled as `mov rdx, [12]` -- mylen
is correctly evaluated to 12, but clang treats it as a memory address to be
dereferenced instead of as an immediate:

$ r2 hello
[0x00001fd1]> px 10
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00001fd1  48c7 c701 0000 0048 8d35                 H......H.5      
[0x00001fd1]> pd 10
           ;-- entry0:
           0x00001fd1    48c7c701000. mov rdi, 1
           0x00001fd8    488d35e6fff. lea rsi, qword [rip - 0x1a]
           0x00001fdf    488b14250c0. mov rdx, qword [0xc]

This doesn't look right to me. In AT&T syntax, I have to write
`mov $mylen, %rdx` (with a $) to avoid dereferencing mylen, but that's
consistent with other immediates.
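
To make the difference between the two encodings concrete, here is a minimal sketch that decodes the ModRM byte of the instruction clang emitted and of the immediate form one would expect. The helper names are hypothetical, and the last three bytes of the emitted instruction are an assumption: the r2 listing truncates them, so they are filled in as the rest of a little-endian disp32 of 0xc, matching the disassembly.

```python
# Sketch: classify the operand form of two x86-64 `mov` encodings by
# decoding their ModRM byte. Helper names are hypothetical.

def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def classify_mov(code):
    """Return 'load', 'immediate', or 'other' for a REX.W-prefixed mov."""
    rex, opcode, modrm = code[0], code[1], code[2]
    assert rex == 0x48, "this sketch only handles a plain REX.W prefix"
    mod, reg, rm = decode_modrm(modrm)
    if opcode == 0x8B and mod != 0b11:
        return "load"        # 8B /r: mov r64, r/m64 with a memory operand
    if opcode == 0xC7 and mod == 0b11 and reg == 0:
        return "immediate"   # C7 /0: mov r/m64, imm32 (sign-extended)
    return "other"


# Bytes clang emitted for `mov rdx, mylen`. The dump shows 48 8b 14 25 0c;
# the trailing 00 00 00 is assumed (rest of the little-endian disp32).
emitted = bytes.fromhex("488b14250c000000")

# Bytes for the immediate form, `mov rdx, 12`.
expected = bytes.fromhex("48c7c20c000000")

print(classify_mov(emitted))    # memory operand: [disp32]
print(classify_mov(expected))   # immediate operand
```

In the emitted bytes, ModRM 0x14 has mod=00 and rm=100, so a SIB byte follows; SIB 0x25 means no index and no base, which is an absolute [disp32] address. The immediate form has mod=11 instead, so the destination is the register itself and the 12 is stored as an imm32.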

The same program in AT&T syntax works fine (compiled with the same commands):

      .ascii "Hello world!\n"
      .set mylen, .-str

    .globl start
      movl $0x2000004, %eax
      movl $1, %edi
      movq str@GOTPCREL(%rip), %rsi
      mov $mylen, %rdx

      movl $42, %ebx
      movl $0x2000001, %eax           # exit 0

An equivalent program in Intel syntax works fine with gas on Linux:

    .intel_syntax noprefix

      .ascii "Hello, ASM.\n"
      .set len, .-str

    .global _start
      movq rdi, 1
      movq rsi, OFFSET FLAT:str
      movq rdx, len
      movq rax, 1  # sys_write

      movq rdi, 42
      movq rax, 60  # sys_exit

$ gcc -c test.s && ld test.o
$ ./a.out 
Hello, ASM.

So I think the behavior of clang's integrated assembler might be incorrect for
symbols defined with .set when they are used as a `mov` source in Intel syntax.

