[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
Anthony Blake
amb33 at cs.waikato.ac.nz
Fri Jul 6 05:25:58 PDT 2012
On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>
>> I've noticed that LLVM tends to generate suboptimal code and spill an
>> excessive amount of registers in large functions, such as in those
>> that are automatically generated by FFTW.
>
> One problem might be that we're forcing the 16 stores to the out array to happen in source order, which constrains the schedule. The stores are clearly non-aliasing.
>
>> LLVM generates good code for a function that computes an 8-point
>> complex FFT, but from 16-point upwards, icc or gcc generates much
>> better code. Here is an example of a sequence of instructions from a
>> 32-point FFT, compiled with clang/LLVM 3.1 for x86_64 with SSE:
>>
>> [...]
>> movaps 32(%rdi), %xmm3
>> movaps 48(%rdi), %xmm2
>> movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1
>> movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4
>> addps %xmm0, %xmm1
>> movaps %xmm1, -16(%rbp) ## 16-byte Spill
>> movaps 144(%rdi), %xmm3 ### <-- new data mov'ed into xmm3
>> [...]
>>
>> xmm3 loaded, duplicated into 2 registers, and then discarded as other
>> data is loaded into it. Can anyone shed some light on why this might
>> be happening?
>
> I'm not actually seeing this behavior on trunk.
>
I've just tried trunk, and although behavior like the above isn't
immediately obvious, trunk generates more instructions and spills more
registers than 3.1.
amb
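The following is a minimal sketch of the kind of straight-line code discussed above. It is a hypothetical, hand-written 4-point complex DFT in the FFTW codelet style, not actual FFTW output. The name `dft4` and the interleaved re/im float layout are assumptions. Every intermediate value lives in a local, so the compiler has to keep them in registers. At 4 points this fits easily in the 16 XMM registers x86_64 provides. A 16- or 32-point codelet with the same structure has far more simultaneously live temporaries, which is where scheduling and spilling decisions start to matter. The stores go to distinct constant offsets of `out`, so they cannot alias one another. That is the property the reply points to when it says the stores need not be kept in source order. The `restrict` qualifiers additionally promise that `in` and `out` do not overlap.

```c
#include <assert.h>

/* Hypothetical FFTW-style straight-line codelet (not real FFTW output):
 * forward 4-point complex DFT on interleaved (re, im) floats.
 * `restrict` promises the compiler that in and out do not overlap. */
static void dft4(const float *restrict in, float *restrict out)
{
    assert(in != 0 && out != 0);

    /* Load every input into a local up front. */
    float r0 = in[0], i0 = in[1];
    float r1 = in[2], i1 = in[3];
    float r2 = in[4], i2 = in[5];
    float r3 = in[6], i3 = in[7];

    /* First stage: butterflies on elements (0, 2) and (1, 3). */
    float ar = r0 + r2, ai = i0 + i2;
    float br = r0 - r2, bi = i0 - i2;
    float cr = r1 + r3, ci = i1 + i3;
    float dr = r1 - r3, di = i1 - i3;

    /* Second stage: multiplying d by the twiddle -i swaps its
     * real/imaginary parts and flips one sign. Each store targets a
     * distinct constant offset, so no two stores can alias. */
    out[0] = ar + cr; out[1] = ai + ci;   /* X0 = a + c      */
    out[2] = br + di; out[3] = bi - dr;   /* X1 = b - i*d    */
    out[4] = ar - cr; out[5] = ai - ci;   /* X2 = a - c      */
    out[6] = br - di; out[7] = bi + dr;   /* X3 = b + i*d    */
}
```

For the real input {1, 2, 3, 4}, `dft4` produces 10, -2+2i, -2, -2-2i. Because the stores depend only on already-computed locals, a scheduler that understands they are independent may emit them in any order. It could, for example, store `X0` as soon as `a` and `c` are ready, which frees those registers earlier.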