[LLVMdev] Macro-op fusion experiment

NAKAMURA Takumi geek4civic at gmail.com
Fri Apr 8 09:56:12 PDT 2011

>>                 8B C3 mov eax, ebx
>>                 03 C1 add eax, ecx
>> becomes
>>                 8B C3 03 C1 add eax, ebx, ecx

In my understanding, the twoaddr pass tends to emit such a sequence.

I don't have a Sandy Bridge machine, so I have not measured this.
Earlier processors (Intel and AMD) might spend one ALU slot to execute the "mov",
and the add must then wait for it, because the add depends on the mov's result.

In contrast, the sequence below might be executed in parallel, since the add
does not read the register that the mov writes:
mov %ebx, %eax
add %ecx, %ebx
(I understand this might not be applicable in all cases.)
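
To make the dependency argument concrete, here is a rough sketch in Python. It is only a toy model of my own, not how LLVM or any real processor tracks dependencies: the tuple encoding and the names reads_writes and has_raw_dependency are made up for illustration, and it only knows "mov" and "add" in AT&T operand order (source first, destination second). It checks whether the second instruction reads a register that the first one writes.

```python
# Toy model (hypothetical, for illustration only): an instruction is a
# (mnemonic, src, dst) tuple in AT&T operand order.

def reads_writes(insn):
    """Return (registers read, registers written) for one instruction."""
    op, src, dst = insn
    if op == "mov":
        # mov copies src into dst: reads src, writes dst.
        return {src}, {dst}
    if op == "add":
        # add computes dst = dst + src: reads both, writes dst.
        return {src, dst}, {dst}
    raise ValueError(f"unsupported mnemonic: {op}")

def has_raw_dependency(first, second):
    """True if `second` reads a register that `first` writes."""
    _, first_writes = reads_writes(first)
    second_reads, _ = reads_writes(second)
    return bool(first_writes & second_reads)

# Quoted sequence: mov %ebx, %eax ; add %ecx, %eax
quoted = [("mov", "ebx", "eax"), ("add", "ecx", "eax")]

# Alternative sequence: mov %ebx, %eax ; add %ecx, %ebx
alternative = [("mov", "ebx", "eax"), ("add", "ecx", "ebx")]

print(has_raw_dependency(*quoted))       # add reads eax, which mov writes
print(has_raw_dependency(*alternative))  # add reads only ecx and ebx
```

The alternative still has the add writing %ebx while the mov reads it (write-after-read). I assume register renaming in an out-of-order core hides that, so this sketch ignores it.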

