[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

Jakob Stoklund Olesen stoklund at 2pi.dk
Thu Jul 5 23:39:08 PDT 2012


On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:

> I've noticed that LLVM tends to generate suboptimal code and spill an
> excessive number of registers in large functions, such as those
> that are automatically generated by FFTW.

One problem might be that we're forcing the 16 stores to the out array to happen in source order, which constrains the schedule. The stores are clearly non-aliasing, so reordering them would be safe.
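
To illustrate the point about store order, here is a minimal sketch in C. It is not FFTW code and not the original test case: the two function names and the 4-point arithmetic are made up for illustration. Both functions compute the same values and write them to distinct elements of out, so the order of the stores cannot change the result. A scheduler that is free to move each store next to the computation that feeds it may keep fewer values live at once; one that must keep the stores in source order has less room to do so.

```c
/* Hypothetical 4-point kernel in the style of generated code:
   every result is computed first, then stored in index order.
   The restrict qualifiers state that in and out do not overlap. */
void kernel_stores_last(const float *restrict in, float *restrict out)
{
    float t0 = in[0] + in[2];
    float t1 = in[0] - in[2];
    float t2 = in[1] + in[3];
    float t3 = in[1] - in[3];
    float r0 = t0 + t2;   /* r0..r3 are all live until the stores below */
    float r1 = t1 + t3;
    float r2 = t0 - t2;
    float r3 = t1 - t3;
    out[0] = r0;
    out[1] = r1;
    out[2] = r2;
    out[3] = r3;
}

/* Same arithmetic, but each store is issued as soon as its value is
   ready, so the stores no longer follow index order. Each store writes
   a different element of out, so the final contents are identical. */
void kernel_stores_early(const float *restrict in, float *restrict out)
{
    float t0 = in[0] + in[2];
    float t2 = in[1] + in[3];
    out[0] = t0 + t2;
    out[2] = t0 - t2;
    float t1 = in[0] - in[2];
    float t3 = in[1] - in[3];
    out[1] = t1 + t3;
    out[3] = t1 - t3;
}
```

Whether the second form actually reduces spilling depends on the compiler and target; the sketch only shows that the reordering is legal when the stores do not alias.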

> LLVM generates good code for a function that computes an 8-point
> complex FFT, but from 16-point upwards, icc or gcc generates much
> better code. Here is an example of a sequence of instructions from a
> 32-point FFT, compiled with clang/LLVM 3.1 for x86_64 with SSE:
> 
>        [...]
> 	movaps	32(%rdi), %xmm3
> 	movaps	48(%rdi), %xmm2
> 	movaps	%xmm3, %xmm1     ### <-- xmm3 mov'ed into xmm1
> 	movaps	%xmm3, %xmm4     ### <-- xmm3 mov'ed into xmm4
> 	addps	%xmm0, %xmm1
> 	movaps	%xmm1, -16(%rbp)        ## 16-byte Spill
> 	movaps	144(%rdi), %xmm3   ### <-- new data mov'ed into xmm3
>        [...]
> 
> xmm3 is loaded, duplicated into two registers, and then discarded as
> other data is loaded into it. Can anyone shed some light on why this
> might be happening?

I'm not actually seeing this behavior on trunk.

/jakob




