[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

Sun Dec 21 23:21:13 PST 2014

The official Intel guide is not fine-grained enough for this comparison.
Note that the MOV entry in Table C-21 covers a form that uses only the ALU execution unit, not memory, so it is not the store to the stack that we would be replacing.

In Agner Fog's tables for Haswell, the relevant form of the MOV instruction (m,r) has a latency of 3 cycles and a reciprocal throughput of 1, the same as a register PUSH.
The same applies to other recent x86 processors.
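
For reference, here is a minimal sketch of the two argument-passing sequences being compared, written as a toy stack model in Python. It only checks that both sequences leave the stack in the same state; it says nothing about latency or throughput. The class and function names (Stack, args_via_mov, args_via_push) and the starting ESP value are made up for illustration and are not part of LLVM.

```python
# Toy model of a 32-bit stack: memory is a dict from address to value.
# All names here are hypothetical and exist only for this illustration.

class Stack:
    def __init__(self, esp=0x1000):
        self.esp = esp   # arbitrary starting stack pointer (assumption)
        self.mem = {}

    def push(self, value):
        # PUSH r32: decrement ESP by 4, then store the value at [ESP].
        self.esp -= 4
        self.mem[self.esp] = value

    def reserve(self, nbytes):
        # SUB ESP, nbytes: make room for outgoing arguments.
        self.esp -= nbytes

    def store(self, offset, value):
        # MOV [ESP + offset], r32: store into a pre-reserved slot.
        self.mem[self.esp + offset] = value


def args_via_mov(args):
    # MOV form: reserve all slots first, then store each argument.
    s = Stack()
    s.reserve(4 * len(args))
    for i, a in enumerate(args):
        s.store(4 * i, a)
    return s


def args_via_push(args):
    # PUSH form: push arguments last-to-first so the first one ends up at [ESP].
    s = Stack()
    for a in reversed(args):
        s.push(a)
    return s


mov_form = args_via_mov([1, 2, 3])
push_form = args_via_push([1, 2, 3])
print(mov_form.esp == push_form.esp and mov_form.mem == push_form.mem)
```

Both forms end with the first argument at [ESP] and the same final ESP, which is why the transformation is a question of cost rather than correctness.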

-----Original Message-----
From: Caldarale, Charles R [mailto:Chuck.Caldarale at unisys.com] 
Sent: Monday, December 22, 2014 04:56
To: Herbie Robinson; Kuperstein, Michael M; LLVMdev at cs.uiuc.edu
Subject: RE: [LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Herbie Robinson
> Subject: Re: [LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

> > On 12/21/14 4:27 AM, Kuperstein, Michael M wrote:
> > Which performance guidelines are you referring to?

> Table C-21 in "Intel(r) 64 and IA-32 Architectures Optimization Reference Manual", September 2014.

> It hasn't changed.  It still lists push and pop instructions as 2-3 times more expensive than mov.

And verified by Agner Fog's independent measurements: 
http://www.agner.org/optimize/instruction_tables.pdf

The relevant Haswell numbers are on pages 186 - 187.

 -Chuck
