[LLVMdev] Macro-op fusion experiment
Jakob Stoklund Olesen
stoklund at 2pi.dk
Fri Apr 8 09:25:34 PDT 2011
On Apr 8, 2011, at 3:29 AM, Nicolas Capens wrote:
> x86 processors use macro-op fusion to merge together two instructions and execute them as one. So it's beneficial for the compiler to emit them as a pair.
> Currently only compare and jump instructions get fused though. And I was wondering whether it also makes sense to fuse move and arithmetic instructions together, to form non-destructive instructions (which x86 lacks for regular instructions). For instance:
> 8B C3 mov eax, ebx
> 03 C1 add eax, ecx
> 8B C3 03 C1 add eax, ebx, ecx
> There's no difference in the binary encoding; it's just considered one instruction at a logical level and inside the hardware (I'm assuming x86's RISC internals actually use non-destructive micro-operations).
Most x86 implementations use register renaming these days, so micro-operations are non-destructive, but they don't refer to architectural registers; they refer to a larger pool of physical registers.
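To make the renaming point concrete, here is a toy model in Python. It assumes a simple rename table that gives every result a fresh physical register. The `rename` function and the `p0`, `p1`, etc. names are hypothetical and are not modeled on any particular processor. Applied to the `mov`/`add` pair from the quoted example, the destructive two-operand `add eax, ecx` becomes a micro-op that reads two physical registers and writes a third.

```python
from itertools import count

def rename(instrs, arch_regs):
    """Toy register renamer (hypothetical model, not any real CPU's design)."""
    phys = count()
    # Initial mapping: each architectural register lives in its own physical register.
    rat = {r: f"p{next(phys)}" for r in arch_regs}
    uops = []
    for op, dst, src in instrs:
        if op == "mov":
            srcs = [rat[src]]            # mov dst, src reads only src
        else:
            srcs = [rat[dst], rat[src]]  # two-operand x86 form: dst = dst op src
        new = f"p{next(phys)}"           # every result gets a fresh physical register
        rat[dst] = new                   # later readers of dst see the new register
        uops.append((op, new, *srcs))
    return uops, rat

uops, rat = rename([("mov", "eax", "ebx"), ("add", "eax", "ecx")],
                   ["eax", "ebx", "ecx"])
for u in uops:
    print(u)
```

In this model the `add` writes `p4` while reading `p3` and `p2`, and the old value of `eax` in `p0` is never overwritten, so the micro-op is already non-destructive without any change to the instruction encoding.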
Register copies are mostly free to execute, apart from increasing code size and consuming decoder resources. To my knowledge, they are not fused with the following instruction in the way you describe.
Intel's optimization reference manual describes which instructions can be fused. The Sandy Bridge processors fuse more pairs than previous generations, but the second instruction is always a conditional branch.
There is no need to define pseudo-instructions to support this. If you want to experiment, you could add a late pass that tries to form fusable pairs by pushing instructions down to the conditional branch. This should happen after register allocation, because the register allocator often inserts code before a branch.
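A minimal sketch of such a late pass, written as a Python model rather than against LLVM's machine-instruction classes. It assumes each instruction is described by the registers it defines and uses, with `FLAGS` and a `mem` pseudo-register standing in for the condition codes and memory. `Instr`, `can_sink_past`, and `sink_to_branch` are hypothetical names. The sketch only moves the instruction that produces the branch's flags down to sit directly before the branch; it does not check which pairs a given CPU can actually fuse, which would have to come from Intel's optimization manual.

```python
from dataclasses import dataclass, field

FLAGS = "FLAGS"

@dataclass
class Instr:
    text: str
    defs: set = field(default_factory=set)  # registers (or FLAGS/"mem") written
    uses: set = field(default_factory=set)  # registers (or FLAGS/"mem") read
    is_cond_branch: bool = False

def can_sink_past(flag_def, other):
    """True if flag_def may be moved below `other` without changing semantics."""
    return (not (other.defs & flag_def.uses)       # other must not change flag_def's inputs
            and not (other.uses & flag_def.defs)   # other must not read what flag_def writes
            and not (other.defs & flag_def.defs))  # both must not write the same location

def sink_to_branch(block):
    """Move the last FLAGS-defining instruction down to sit right before the
    terminating conditional branch, if that is legal. Returns a new list."""
    if not block or not block[-1].is_cond_branch:
        return list(block)
    br = len(block) - 1
    # Find the instruction that produces the flags the branch reads.
    src = next((i for i in range(br - 1, -1, -1) if FLAGS in block[i].defs), None)
    if src is None or src == br - 1:
        return list(block)  # no flag producer, or it is already adjacent
    between = block[src + 1:br]
    if not all(can_sink_past(block[src], other) for other in between):
        return list(block)
    return block[:src] + between + [block[src], block[br]]

block = [
    Instr("cmp eax, ebx", defs={FLAGS}, uses={"eax", "ebx"}),
    Instr("mov ecx, [esp+8]", defs={"ecx"}, uses={"esp", "mem"}),  # e.g. a reload
    Instr("jne L1", uses={FLAGS}, is_cond_branch=True),
]
for ins in sink_to_branch(block):
    print(ins.text)
```

Here the reload between the `cmp` and the `jne` is independent of the compare, so the compare is sunk and the pair becomes adjacent. If the intervening instruction wrote `eax` instead, the dependence checks would reject the move and the block would be left unchanged.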
I would be interested to see the performance impact of such a pass.