[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

Anthony Blake amb33 at cs.waikato.ac.nz
Fri Jul 6 06:00:06 PDT 2012


On Sat, Jul 7, 2012 at 12:40 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
> On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>> On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>>> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>>>>        [...]
>>>>       movaps  32(%rdi), %xmm3
>>>>       movaps  48(%rdi), %xmm2
>>>>       movaps  %xmm3, %xmm1     ### <-- xmm3 mov'ed into xmm1
>>>>       movaps  %xmm3, %xmm4     ### <-- xmm3 mov'ed into xmm4
>>>>       addps   %xmm0, %xmm1
>>>>       movaps  %xmm1, -16(%rbp)        ## 16-byte Spill
>>>>       movaps  144(%rdi), %xmm3   ### <-- new data mov'ed into xmm3
>>>>        [...]
>>>>
>>>> xmm3 loaded, duplicated into 2 registers, and then discarded as other
>>>> data is loaded into it. Can anyone shed some light on why this might
>>>> be happening?
>>>
>>> I'm not actually seeing this behavior on trunk.
>>>
>>
>> I've just tried trunk, and although behavior like above isn't
>> immediately obvious, trunk generates more instructions and spills more
>> registers compared to 3.1.
>>
>
> Actually, here is an occurrence of that behavior when compiling the
> code with trunk:
>
>         [...]
>         movaps  %xmm1, %xmm0      ###  xmm1 mov'ed to xmm0
>         movaps  %xmm1, %xmm14    ###  xmm1 mov'ed to xmm14
>         addps   %xmm7, %xmm0
>         movaps  %xmm7, %xmm13
>         movaps  %xmm0, %xmm1      ###  and now other data is mov'ed into xmm1,
>                                   ###  making one of the above movaps superfluous
>         [...]

In addition to many occurrences of the form above, a similar form also appears:

        [...]
        movaps  %xmm5, %xmm7
        movaps  %xmm7, %xmm3
        movaps  -96(%rsp), %xmm0        ## 16-byte Reload
        subps   %xmm0, %xmm3
        addps   %xmm0, %xmm7
        movaps  240(%rsp), %xmm0        ## 16-byte Reload
        movaps  -128(%rsp), %xmm1       ## 16-byte Reload
        movlhps %xmm0, %xmm1            ## xmm1 = xmm1[0],xmm0[0]
        movaps  %xmm8, %xmm4
        movaps  160(%rsp), %xmm5        ## 16-byte Reload
        [...]

Here the problem shows up with xmm3, xmm5 and xmm7, but in contrast to
the case above, there is now a data dependence between the first pair of
instructions: the second movaps reads xmm7, which the first has just
written.
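
One rough way to count occurrences of both forms in the generated
assembly is to look for register-to-register copies whose source is
overwritten before it is read again. The sketch below does that under
several assumptions: the function names are hypothetical, the input is
straight-line AT&T-syntax output, only the listed full-register moves
count as overwriting a destination, and every other instruction is
conservatively treated as also reading its destination. A flagged copy is
only a candidate for elimination, since two-operand SSE instructions
overwrite their destination and some copies are genuinely needed.

```python
import re

# Moves assumed to overwrite the whole destination without reading it.
FULL_MOVES = {"movaps", "movups", "movapd", "movupd", "movdqa", "movdqu"}
XMM = re.compile(r"%xmm\d+")
# Split operands on commas that are not inside a memory operand's parentheses.
OPERAND_SEP = re.compile(r",\s*(?![^()]*\))")


def parse(line):
    """Return (mnemonic, operands) for one line, or None if it has no instruction."""
    code = line.split("#", 1)[0].strip()      # drop trailing comments
    if not code or code.startswith("."):      # blank line or directive
        return None
    if code.endswith(":"):                    # label: treated as a barrier
        return ("label", [])
    parts = code.split(None, 1)
    operands = OPERAND_SEP.split(parts[1]) if len(parts) > 1 else []
    return parts[0], [op.strip() for op in operands]


def is_barrier(mnemonic):
    """Control flow ends the scan, because liveness is no longer obvious."""
    return (mnemonic == "label" or mnemonic.startswith("j")
            or mnemonic in {"call", "callq", "ret", "retq"})


def reads_and_writes(mnemonic, operands):
    """Sets of xmm registers an instruction reads and fully overwrites."""
    if not operands:
        return set(), set()
    *sources, dest = operands                 # AT&T syntax: destination is last
    reads = {reg for op in sources for reg in XMM.findall(op)}
    writes = set()
    if XMM.fullmatch(dest):
        if mnemonic in FULL_MOVES:
            writes.add(dest)
        else:
            reads.add(dest)                   # conservatively an input as well
    return reads, writes


def find_dead_source_copies(asm_text):
    """List (line_number, src, dst) for copies whose source is clobbered unread."""
    insns = []
    for number, line in enumerate(asm_text.splitlines(), 1):
        parsed = parse(line)
        if parsed:
            insns.append((number, parsed))
    hits = []
    for i, (number, (mnemonic, operands)) in enumerate(insns):
        if mnemonic not in FULL_MOVES or len(operands) != 2:
            continue
        src, dst = operands
        if not (XMM.fullmatch(src) and XMM.fullmatch(dst)) or src == dst:
            continue
        for j in range(i + 1, len(insns)):
            later_mnemonic, later_operands = insns[j][1]
            if is_barrier(later_mnemonic):
                break
            reads, writes = reads_and_writes(later_mnemonic, later_operands)
            if src in reads:                  # source still live: copy was needed
                break
            if src in writes:                 # source clobbered without a read
                hits.append((number, src, dst))
                break
    return hits
```

Running find_dead_source_copies() on the text of a .s file gives a list
whose length can be compared between 3.1 and trunk. On the excerpts above
it should flag the second copy out of xmm3 in the first form, and the
copy from xmm5 into xmm7 in the second form.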

amb


