[LLVMdev] Macro-op fusion experiment
geek4civic at gmail.com
Fri Apr 8 09:56:12 PDT 2011
>> 8B C3 mov eax, ebx
>> 03 C1 add eax, ecx
>> 8B C3 03 C1 add eax, ebx, ecx
As I understand it, the twoaddr pass tends to emit such a sequence.
I don't have a Sandy Bridge machine, so I have not measured this.
Earlier processors (Intel and AMD) might spend one ALU operation to execute the "mov",
and the "add" then has a dependency on the result of the "mov".
In contrast, the sequence below might be executed in parallel, because the "add" no longer reads the register written by the "mov":
mov %ebx, %eax
add %ecx, %ebx
(I understand this might not be applicable in all cases, since the sum ends up in %ebx rather than %eax.)
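
To make the dependency argument concrete, here is a rough sketch (not a measurement) that counts the longest read-after-write chain in each sequence. It assumes every instruction takes one cycle and that register renaming removes write-after-read hazards; the function name `critical_path` and the tuple encoding of instructions are made up for illustration.

```python
def critical_path(seq):
    """Longest read-after-write chain, assuming one cycle per instruction.

    Each instruction is (destination register, list of source registers).
    Write-after-read hazards are ignored (assumption: the core renames registers).
    """
    ready = {}  # register -> depth at which its latest value becomes available
    longest = 0
    for dst, srcs in seq:
        # An instruction can start once all of its sources have been produced.
        depth = 1 + max((ready.get(r, 0) for r in srcs), default=0)
        ready[dst] = depth
        longest = max(longest, depth)
    return longest


# mov %ebx, %eax ; add %ecx, %eax   (add reads the eax written by mov)
dependent = [("eax", ["ebx"]), ("eax", ["eax", "ecx"])]

# mov %ebx, %eax ; add %ecx, %ebx   (add reads only the original ebx and ecx)
independent = [("eax", ["ebx"]), ("ebx", ["ebx", "ecx"])]

print(critical_path(dependent), critical_path(independent))
```

Under these assumptions the first sequence has a chain of two instructions and the second a chain of one, which is the sense in which the second pair might issue in parallel.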