[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

Mon Dec 22 22:34:37 PST 2014

Going by Agner's tables, the register form of pop, on Haswell, has a reciprocal throughput of 0.5, just like the equivalent mov.
The memory form of pop has a reciprocal throughput of 1, but that has no equivalent mov instruction, since x86 can't do a MOVmm. It's equivalent to a MOVrm followed by a MOVmr.
In any case, that's not really relevant since I'm planning on only adding pushes, not pops.

I'm not saying this can't result in any performance issues (as David noted there may be more complex microarchitectural issues here), just that it's not expected to be a problem on an instruction-per-instruction basis.  
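
To make the instruction-per-instruction comparison concrete, here is a minimal sketch in Python that uses only the Haswell reciprocal throughputs quoted in this thread from Agner's tables. The three-argument call, the dictionary keys and the helper name throughput_estimate are assumptions for illustration; the model just sums per-instruction figures and ignores dependencies, port sharing and the stack engine, so it is a rough estimate rather than a measurement.

```python
# Haswell reciprocal throughputs (cycles per instruction) as quoted in this
# thread from Agner Fog's instruction tables. The keys are illustrative labels.
RECIP_THROUGHPUT = {
    "mov r,m": 0.5,  # load from memory into a register
    "mov m,r": 1.0,  # store a register to memory
    "push r": 1.0,   # push a register
    "pop r": 0.5,    # pop into a register
    "pop m": 1.0,    # pop into memory
}


def throughput_estimate(sequence):
    """Sum the reciprocal throughputs of a list of instruction forms.

    This is a crude per-instruction estimate only: it assumes the
    instructions are independent and does not model port contention,
    stack-pointer dependencies or the stack engine.
    """
    return sum(RECIP_THROUGHPUT[form] for form in sequence)


# Hypothetical call with three register arguments passed on the stack.
mov_sequence = ["mov m,r"] * 3   # stores into pre-allocated stack slots
push_sequence = ["push r"] * 3   # the proposed push-based sequence

print("mov-based :", throughput_estimate(mov_sequence))
print("push-based:", throughput_estimate(push_sequence))
```

Under these assumptions the two sequences come out the same, which is all the per-instruction argument claims; any difference would have to come from the microarchitectural effects this model leaves out.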

-----Original Message-----
From: Herbie Robinson [mailto:HerbieRobinson at verizon.net] 
Sent: Monday, December 22, 2014 20:41
To: Kuperstein, Michael M; Caldarale, Charles R; LLVMdev at cs.uiuc.edu
Subject: Re: [LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

But the r,m move has a reciprocal throughput of 0.5 while the pop is 1.

It sounds like this should be optional as you have already proposed.

On 12/22/14 2:21 AM, Kuperstein, Michael M wrote:
> The official Intel guide has less resolution here than we need.
> Consider that the MOV instruction described in C-21 only uses the ALU execution unit, not memory.
>
> If we look at Agner's table for Haswell, the relevant form of the MOV instruction (m,r) has a latency of 3 cycles and a reciprocal throughput of 1, same as a register PUSH.
> The same applies for other recent x86 processors.
>
> -----Original Message-----
> From: Caldarale, Charles R [mailto:Chuck.Caldarale at unisys.com]
> Sent: Monday, December 22, 2014 04:56
> To: Herbie Robinson; Kuperstein, Michael M; LLVMdev at cs.uiuc.edu
> Subject: RE: [LLVMdev] [RFC] [X86] Mov to push transformation in 
> x86-32 call sequences
>
>> From: llvmdev-bounces at cs.uiuc.edu 
>> [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Herbie Robinson
>> Subject: Re: [LLVMdev] [RFC] [X86] Mov to push transformation in 
>> x86-32 call sequences
>>> On 12/21/14 4:27 AM, Kuperstein, Michael M wrote:
>>> Which performance guidelines are you referring to?
>> Table C-21 in "Intel(r) 64 and IA-32 Architectures Optimization Reference Manual", September 2014.
>> It hasn't changed.  It still lists push and pop instructions as 2-3 times as expensive as mov.
> And verified by Agner Fog's independent measurements:
> http://www.agner.org/optimize/instruction_tables.pdf
>
> The relevant Haswell numbers are on pages 186 - 187.
>
>   -Chuck
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). Any review or distribution 
> by others is strictly prohibited. If you are not the intended 
> recipient, please contact the sender and delete all copies.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>