[llvm-commits] [llvm] r105303 - /llvm/trunk/lib/Target/X86/README-X86-64.txt

Sat Jun 12 18:46:44 PDT 2010

On Sat, Jun 12, 2010 at 4:34 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Jun 1, 2010, at 5:10 PM, Eli Friedman wrote:
>
>> Author: efriedma
>> Date: Tue Jun  1 19:10:36 2010
>> New Revision: 105303
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=105303&view=rev
>> Log:
>> Remove outdated README entries.
>
> Hi Eli, are you sure these are outdated?
>
> These still look relevant:
>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/README-X86-64.txt (original)
>> +++ llvm/trunk/lib/Target/X86/README-X86-64.txt Tue Jun  1 19:10:36 2010
>> @@ -98,124 +76,6 @@
>>
>> //===---------------------------------------------------------------------===//
>>
>> -Vararg function prologue can be further optimized. Currently all XMM registers
>> -are stored into the register save area. Most of them can be eliminated since the
>> -upper bound of the number of XMM registers used is passed in %al. gcc produces
>> -something like the following:
>> -
>> -     movzbl  %al, %edx
>> -     leaq    0(,%rdx,4), %rax
>> -     leaq    4+L2(%rip), %rdx
>> -     leaq    239(%rsp), %rax
>> -             jmp     *%rdx
>> -     movaps  %xmm7, -15(%rax)
>> -     movaps  %xmm6, -31(%rax)
>> -     movaps  %xmm5, -47(%rax)
>> -     movaps  %xmm4, -63(%rax)
>> -     movaps  %xmm3, -79(%rax)
>> -     movaps  %xmm2, -95(%rax)
>> -     movaps  %xmm1, -111(%rax)
>> -     movaps  %xmm0, -127(%rax)
>> -L2:
>> -
>> -It jumps over the movaps that do not need to be stored. Hard to see this being
>> -significant as it adds 5 instructions (including an indirect branch) to avoid
>> -executing 0 to 8 stores in the function prologue.
>> -
>> -Perhaps we can optimize for the common case where no XMM registers are used for
>> -parameter passing, i.e. if %al == 0, jump over all stores. Or in the case of a
>> -leaf function where we can determine that no XMM input parameter is needed, avoid
>> -emitting the stores at all.

We have a jump over the stores if %al == 0.  I guess we don't try to
detect the case where no floats are passed to va_arg, but that's
practically impossible without large changes to the way we lower
va_arg to IR.
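
To make that case concrete, here is a minimal sketch of a vararg callee that
only ever reads integers through va_arg. The function name sum_ints is
hypothetical and not from the README. In a function like this the XMM stores
in the prologue are never needed, but proving that requires looking at every
va_arg use, which is the part that is hard after lowering. The %al == 0 check
only helps at run time, when the caller passes no floating-point arguments.

```c
#include <stdarg.h>

/* Hypothetical example: a vararg function that never reads a
   floating-point argument, so the XMM part of the register save
   area is never consulted. */
int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* Only integer-class arguments are fetched here. */
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```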

>> -//===---------------------------------------------------------------------===//
>> -
>> -AMD64 has a complex calling convention for aggregate passing by value:
>> -
>> -1. If the size of an object is larger than two eightbytes, or in C++, is a non-
>> -   POD structure or union type, or contains unaligned fields, it has class
>> -   MEMORY.
>> -2. Both eightbytes get initialized to class NO_CLASS.
>> -3. Each field of an object is classified recursively so that always two fields
>> -   are considered. The resulting class is calculated according to the classes
>> -   of the fields in the eightbyte:
>> -   (a) If both classes are equal, this is the resulting class.
>> -   (b) If one of the classes is NO_CLASS, the resulting class is the other
>> -       class.
>> -   (c) If one of the classes is MEMORY, the result is the MEMORY class.
>> -   (d) If one of the classes is INTEGER, the result is the INTEGER.
>> -   (e) If one of the classes is X87, X87UP, COMPLEX_X87 class, MEMORY is used as
>> -      class.
>> -   (f) Otherwise class SSE is used.
>> -4. Then a post merger cleanup is done:
>> -   (a) If one of the classes is MEMORY, the whole argument is passed in memory.
>> -   (b) If SSEUP is not preceded by SSE, it is converted to SSE.
>> -
>> -Currently llvm frontend does not handle this correctly.
>> -
>> -Problem 1:
>> -    typedef struct { int i; double d; } QuadWordS;
>> -It is currently passed in two i64 integer registers. However, gcc compiled
>> -callee expects the second element 'd' to be passed in XMM0.
>> -
>> -Problem 2:
>> -    typedef struct { int32_t i; float j; double d; } QuadWordS;
>> -The size of the first two fields == i64 so they will be combined and passed in
>> -an integer register RDI. The third field is still passed in XMM0.
>> -
>> -Problem 3:
>> -    typedef struct { int64_t i; int8_t j; int64_t d; } S;
>> -    void test(S s)
>> -The size of this aggregate is greater than two i64 so it should be passed in
>> -memory. Currently llvm breaks this down and passes it in three integer
>> -registers.
>> -
>> -Problem 4:
>> -Taking problem 3 one step ahead where a function expects an aggregate value
>> -in memory followed by more parameter(s) passed in register(s).
>> -    void test(S s, int b)
>> -
>> -LLVM IR does not allow parameter passing by aggregates, therefore it must break
>> -the aggregates value (in problem 3 and 4) into a number of scalar values:
>> -    void %test(long %s.i, byte %s.j, long %s.d);
>> -
>> -However, if the backend were to lower this code literally it would pass the 3
>> -values in integer registers. To force it to be passed in memory, the frontend
>> -should change the function signature to:
>> -    void %test(long %undef1, long %undef2, long %undef3, long %undef4,
>> -               long %undef5, long %undef6,
>> -               long %s.i, byte %s.j, long %s.d);
>> -And the call site would look something like this:
>> -    call void %test( undef, undef, undef, undef, undef, undef,
>> -                     %tmp.s.i, %tmp.s.j, %tmp.s.d );
>> -The first 6 undef parameters would exhaust the 6 integer registers used for
>> -parameter passing. The following three integer values would then be forced into
>> -memory.
>> -
>> -For problem 4, the parameter 'b' would be moved to the front of the parameter
>> -list so it will be passed in register:
>> -    void %test(int %b,
>> -               long %undef1, long %undef2, long %undef3, long %undef4,
>> -               long %undef5, long %undef6,
>> -               long %s.i, byte %s.j, long %s.d);
>> -

I'm pretty sure argument passing on x86-64 works. :)  And we have a
bug on adding an ABI-lowering library to LLVM.
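
For reference, the per-eightbyte merge step from rule 3 of the removed entry
can be sketched in C roughly as below. This is only an illustration of the
rules as quoted above; the enum and the function names is_x87 and
merge_classes are made up for the example and are not LLVM or clang APIs.

```c
/* Hypothetical names; the classes mirror the ones listed in the README. */
enum arg_class {
    NO_CLASS, INTEGER, SSE, SSEUP, X87, X87UP, COMPLEX_X87, MEMORY
};

/* True for the three x87-related classes from rule 3(e). */
int is_x87(enum arg_class c) {
    return c == X87 || c == X87UP || c == COMPLEX_X87;
}

/* Merge the classes of two fields that share an eightbyte,
   following rules 3(a) through 3(f) in order. */
enum arg_class merge_classes(enum arg_class a, enum arg_class b) {
    if (a == b) return a;                              /* (a) */
    if (a == NO_CLASS) return b;                       /* (b) */
    if (b == NO_CLASS) return a;                       /* (b) */
    if (a == MEMORY || b == MEMORY) return MEMORY;     /* (c) */
    if (a == INTEGER || b == INTEGER) return INTEGER;  /* (d) */
    if (is_x87(a) || is_x87(b)) return MEMORY;         /* (e) */
    return SSE;                                        /* (f) */
}
```

Applied to Problem 2, the int32_t and float fields share the first eightbyte,
so merge_classes(INTEGER, SSE) gives INTEGER for it, while the double in the
second eightbyte stays SSE.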

-Eli