[LLVMdev] Scheduling quirks
Andrew Trick
atrick at apple.com
Tue Jan 21 14:48:51 PST 2014
On Jan 18, 2014, at 5:13 AM, Jasper Neumann <jn at sirrida.de> wrote:
> Hello all!
>
> When I compile the following more or less stupid functions with
> clang++ -O3 -S test.cpp
> ===>
> int test_register(int x) {
> x ^= (x >> 2);
> x ^= (x >> 3);
> x = x ^ (x >> 4);
> int y = x; x >>= 5; x ^= y; // almost the same but explicit
> return x;
> }
>
> int test_scheduler(int x) {
> return ((x>>2) & 15) ^ ((x>>3) & 31);
> }
> <===
> ...I get the following result:
> ===>
> .file "test.cpp"
> .text
> .globl _Z13test_registeri
> .align 16, 0x90
> .type _Z13test_registeri, at function
> _Z13test_registeri: # @_Z13test_registeri
> .cfi_startproc
> # BB#0: # %entry
> movl %edi, %eax
> sarl $2, %eax
> xorl %edi, %eax
> movl %eax, %ecx
> sarl $3, %ecx
> xorl %eax, %ecx
> movl %ecx, %edx
> sarl $4, %edx
> xorl %ecx, %edx
> movl %edx, %eax
> sarl $5, %eax
> xorl %edx, %eax
> retq
> .Ltmp0:
> .size _Z13test_registeri, .Ltmp0-_Z13test_registeri
> .cfi_endproc
>
> .globl _Z14test_scheduleri
> .align 16, 0x90
> .type _Z14test_scheduleri, at function
> _Z14test_scheduleri: # @_Z14test_scheduleri
> .cfi_startproc
> # BB#0: # %entry
> movl %edi, %eax
> shrl $2, %eax
> andl $15, %eax
> shrl $3, %edi
> andl $31, %edi
> xorl %eax, %edi
> movl %edi, %eax
> retq
> .Ltmp1:
> .size _Z14test_scheduleri, .Ltmp1-_Z14test_scheduleri
> .cfi_endproc
>
>
> .ident "clang version 3.5 (trunk 199507)"
> .section ".note.GNU-stack","", at progbits
> <===
>
> Now once more in detail.
>
> The lines
> x ^= (x >> 2);
> and
> x = x ^ (x >> 4);
> and (!)
> int y = x; x >>= 8; x ^= y; // almost the same but explicit
> are compiled into code like
> movl %edi, %eax
> sarl $2, %eax
> xorl %edi, %eax
> As far as I know optimal for all x86 but the very latest 4th generation Intel Core processors the following variant is better (2 instead of 3 cycles; I proved this for e.g. Intel i7 920) because the first two lines can be executed simultaneously:
> movl %edi, %eax
> sarl $2, %edi # modify source instead of copy
> xorl %edi, %eax
> Is there a special reason to do that this way?
> Interestingly most compilers including ICC and GCC show this strange behavior. I had reported this in an Intel forum as well as for GCC a long time ago but there has been no real reaction...
> Also, why are 4 registers used whereas 2 are sufficient?
>
>
> In the second function the line
> return ((x>>2) & 15) ^ ((x>>3) & 31);
> is compiled into
> movl %edi, %eax
> shrl $2, %eax
> andl $15, %eax
> shrl $3, %edi
> andl $31, %edi
> xorl %eax, %edi
> movl %edi, %eax
> I would have expected that the scheduler interleaves the subexpressions and would be able to get rid of the final move like this:
> movl %edi, %eax
> shrl $3, %edi # modify source instead of copy, see above
> shrl $2, %eax
> andl $31, %edi
> andl $15, %eax
> xorl %edi, %eax # we need %eax here
>
>
> I think this is independent of the used high level language.
> Is this known to the LLVM community?
> May I help to correct this?
In your example, we're copying from/to argument/return registers, hence the extra copies. Normally, it is the register coalescer's job to remove copies but
- our register coalescer is not always very smart
- in this case, it intentionally leaves the physreg (arg/return) copies in place to give the register allocator the most freedom.
The register allocator attempts to remove physreg copies by biasing allocation, but the decisions are subject to the order of allocation which really amounts to luck in cases like this.
We tend to not introduce complexity to optimize special cases involving multiple physreg copies because
- we care more about code within loops
- the performance impact of register copies is very small
In these cases, I think it comes down to luck given the allocation order that we choose within a block (I’m not sure how scheduling could affect it). Do you still see the extra copies on LLVM trunk?
-Andy
>
> Best regards
> Jasper
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
More information about the llvm-dev
mailing list