[LLVMdev] Scheduling quirks

Wed Jan 22 10:03:37 PST 2014

Hello Andrew!

Referring to this example:
int test_scheduler(int x) {
  return ((x>>2) & 15) ^ ((x>>3) & 31);
}
=>
	movl	%edi, %eax
	shrl	$2, %eax
	andl	$15, %eax
	shrl	$3, %edi
	andl	$31, %edi
	xorl	%eax, %edi
	movl	%edi, %eax
instead of
	movl	%edi, %eax
	shrl	$3, %edi    # modify source instead of copy
	shrl	$2, %eax    # modifications interlaced
	andl	$31, %edi
	andl	$15, %eax
	xorl	%edi, %eax    # we need %eax here
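
As a sanity check that the two sequences are interchangeable, here is a
minimal C sketch that models each listing one instruction at a time and
compares both against the source expression. The function names
(reference, emitted, proposed, sequences_agree) are my own and purely
illustrative. It uses unsigned values to mirror shrl, which is safe here
because the masks discard every bit that a sign-extending shift could
change. It checks semantics only and says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>

/* The expression from test_scheduler, on unsigned values to mirror shrl. */
static uint32_t reference(uint32_t x) {
    return ((x >> 2) & 15u) ^ ((x >> 3) & 31u);
}

/* Model of the sequence currently emitted (first listing). */
static uint32_t emitted(uint32_t edi) {
    uint32_t eax = edi;  /* movl %edi, %eax */
    eax >>= 2;           /* shrl $2, %eax   */
    eax &= 15u;          /* andl $15, %eax  */
    edi >>= 3;           /* shrl $3, %edi   */
    edi &= 31u;          /* andl $31, %edi  */
    edi ^= eax;          /* xorl %eax, %edi */
    eax = edi;           /* movl %edi, %eax */
    return eax;
}

/* Model of the suggested sequence (second listing). */
static uint32_t proposed(uint32_t edi) {
    uint32_t eax = edi;  /* movl %edi, %eax */
    edi >>= 3;           /* shrl $3, %edi   */
    eax >>= 2;           /* shrl $2, %eax   */
    edi &= 31u;          /* andl $31, %edi  */
    eax &= 15u;          /* andl $15, %eax  */
    eax ^= edi;          /* xorl %edi, %eax */
    return eax;
}

/* Returns 1 if both models match the reference on all tested inputs.
   Only bits 2..7 of x influence the result, so 0..255 covers every case;
   two extra values exercise the high bits as well. */
static int sequences_agree(void) {
    const uint32_t extra[] = { 0x80000000u, 0xFFFFFFFFu };
    for (uint32_t x = 0; x < 256u; ++x) {
        if (emitted(x) != reference(x) || proposed(x) != reference(x))
            return 0;
    }
    for (int i = 0; i < 2; ++i) {
        if (emitted(extra[i]) != reference(extra[i]) ||
            proposed(extra[i]) != reference(extra[i]))
            return 0;
    }
    return 1;
}
```

Calling sequences_agree() from a small main() and checking that it
returns 1 is enough to exercise all three functions.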

 > In your example, we're copying from/to argument/return registers,
 > hence the extra copies.

That may well be the reason for the final move.
However, a simple peephole optimization could catch this case.

 > [...] Do you still see the extra copies on LLVM trunk?

I retested this with a fresh checkout and build of trunk (svn 199769)
and got the same result.

BTW: Passing e.g. -march=atom helps a bit, as it produces the expected
interleaving for this example, but IMHO this ought to be the case even
without explicitly specifying a processor type.

The other missed optimization, modifying the copy first rather than the
source register (so the first shift has to wait for the move), occurs
quite often and costs a cycle on almost all x86 processors.

Best regards
Jasper