[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
Anthony Blake
amb33 at cs.waikato.ac.nz
Fri Jul 6 06:00:06 PDT 2012
On Sat, Jul 7, 2012 at 12:40 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
> On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>> On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>>> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>>>> [...]
>>>> movaps 32(%rdi), %xmm3
>>>> movaps 48(%rdi), %xmm2
>>>> movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1
>>>> movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4
>>>> addps %xmm0, %xmm1
>>>> movaps %xmm1, -16(%rbp) ## 16-byte Spill
>>>> movaps 144(%rdi), %xmm3 ### <-- new data mov'ed into xmm3
>>>> [...]
>>>>
>>>> xmm3 loaded, duplicated into 2 registers, and then discarded as other
>>>> data is loaded into it. Can anyone shed some light on why this might
>>>> be happening?
>>>
>>> I'm not actually seeing this behavior on trunk.
>>>
>>
>> I've just tried trunk, and although behavior like above isn't
>> immediately obvious, trunk generates more instructions and spills more
>> registers compared to 3.1.
>>
>
> Actually, here is an occurrence of that behavior when compiling the
> code with trunk:
>
> [...]
> movaps %xmm1, %xmm0 ### xmm1 mov'ed to xmm0
> movaps %xmm1, %xmm14 ### xmm1 mov'ed to xmm14
> addps %xmm7, %xmm0
> movaps %xmm7, %xmm13
> movaps %xmm0, %xmm1 ### and now other data is mov'ed into xmm1,
>                     ### making one of the above movaps superfluous
> [...]
As well as many occurrences of the above form, a similar form also appears:
[...]
movaps %xmm5, %xmm7
movaps %xmm7, %xmm3
movaps -96(%rsp), %xmm0 ## 16-byte Reload
subps %xmm0, %xmm3
addps %xmm0, %xmm7
movaps 240(%rsp), %xmm0 ## 16-byte Reload
movaps -128(%rsp), %xmm1 ## 16-byte Reload
movlhps %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[0]
movaps %xmm8, %xmm4
movaps 160(%rsp), %xmm5 ## 16-byte Reload
[...]
Here the problem manifests with xmm3, xmm5 and xmm7, but in contrast to
the above case, there is now a data dependence between the first pair
of instructions: the second movaps reads the register that the first
one has just written.
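
A minimal sketch of how the first form (a register copied into two
other registers and then overwritten before it is read again) might be
located mechanically in a listing. Everything here is an assumption
made for illustration: the helper names are hypothetical, the input is
a hand-written list of (opcode, src, dst) tuples rather than real
compiler output, and the check is a heuristic that only flags
candidates. It does not detect the chained form shown just above.

```python
from collections import defaultdict

def is_reg(operand):
    # AT&T syntax: register operands start with '%'; memory operands
    # such as 32(%rdi) or -16(%rbp) do not.
    return operand.startswith("%")

def find_dup_then_clobber(insns):
    """Return (register, copy_indices, clobber_index) tuples.

    insns is a list of (opcode, src, dst) in AT&T operand order.
    Hypothetical helper, not part of LLVM.
    """
    pending = defaultdict(list)  # register -> indices of copies out of it
    hits = []
    for i, (op, src, dst) in enumerate(insns):
        if op == "movaps" and is_reg(src) and is_reg(dst):
            # Plain register-to-register copy: remember it.
            pending[src].append(i)
        else:
            # Any other read counts as a real use and resets the count.
            if is_reg(src):
                pending[src] = []
            if op != "movaps" and is_reg(dst):
                # Two-operand SSE arithmetic also reads its destination.
                pending[dst] = []
        if is_reg(dst):
            # dst is overwritten here; flag it if it was only duplicated.
            if len(pending[dst]) >= 2:
                hits.append((dst, pending[dst], i))
            pending[dst] = []
    return hits

# The sequence reported with 3.1, transcribed by hand.
SNIPPET_31 = [
    ("movaps", "32(%rdi)", "%xmm3"),
    ("movaps", "48(%rdi)", "%xmm2"),
    ("movaps", "%xmm3", "%xmm1"),
    ("movaps", "%xmm3", "%xmm4"),
    ("addps", "%xmm0", "%xmm1"),
    ("movaps", "%xmm1", "-16(%rbp)"),
    ("movaps", "144(%rdi)", "%xmm3"),
]

# The sequence reported with trunk, transcribed by hand.
SNIPPET_TRUNK = [
    ("movaps", "%xmm1", "%xmm0"),
    ("movaps", "%xmm1", "%xmm14"),
    ("addps", "%xmm7", "%xmm0"),
    ("movaps", "%xmm7", "%xmm13"),
    ("movaps", "%xmm0", "%xmm1"),
]
```

Calling find_dup_then_clobber on either list reports the duplicated
register together with the indices of the two copies and of the
overwriting instruction, which are the lines marked with ### in the
listings above.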
amb