[PATCH] D14901: [X86][SSE] Improve i16 splatting shuffles

Wed Dec 16 14:12:58 PST 2015

> On Dec 16, 2015, at 12:31 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> 
> RKSimon added a comment.
> 
> In http://reviews.llvm.org/D14901#312358, @escha wrote:
> 
>> Just a side note, but Agner claims pshufb is 1 cycle latency on Wolfdale, Nehalem, and Ivy Bridge as well.
> 
> 
> But not any recent Atom or AMD targets - it still doesn't account for the cost of loading the shuffle mask either unfortunately. On the whole I think the 3 op threshold is about right.

I think we usually optimize for throughput rather than latency, in which case pshufb is the better choice.
On the other hand, since we know pshufb will load its mask from the constant pool, it will be broken down into two uops, so the trade-off may be acceptable.
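
To make the comparison concrete, here is a rough scalar model in C++ of the two ways to splat one i16 lane: a single pshufb with a 16-byte mask, versus a pshuflw/pshufhw followed by a pshufd. This is only a sketch of the instruction semantics, not real intrinsics and not necessarily the sequence D14901 emits. The helper names (splatViaPshufb, splatViaTwoShuffles, checkAllLanes) and the V8i16/V16i8 typedefs are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8i16 = std::array<std::uint16_t, 8>;
using V16i8 = std::array<std::uint8_t, 16>;

// Model of PSHUFLW: permute the low four words by imm, high words pass through.
V8i16 pshuflw(const V8i16 &src, unsigned imm) {
  V8i16 out = src;
  for (unsigned i = 0; i < 4; ++i)
    out[i] = src[(imm >> (2 * i)) & 3];
  return out;
}

// Model of PSHUFHW: permute the high four words by imm, low words pass through.
V8i16 pshufhw(const V8i16 &src, unsigned imm) {
  V8i16 out = src;
  for (unsigned i = 0; i < 4; ++i)
    out[4 + i] = src[4 + ((imm >> (2 * i)) & 3)];
  return out;
}

// Model of PSHUFD: permute the four dwords (pairs of words) by imm.
V8i16 pshufd(const V8i16 &src, unsigned imm) {
  V8i16 out{};
  for (unsigned i = 0; i < 4; ++i) {
    unsigned sel = (imm >> (2 * i)) & 3;
    out[2 * i] = src[2 * sel];
    out[2 * i + 1] = src[2 * sel + 1];
  }
  return out;
}

// Model of PSHUFB: each result byte is selected by the low 4 bits of the
// matching mask byte, or zeroed if the mask byte has its top bit set.
V8i16 pshufb(const V8i16 &src, const V16i8 &mask) {
  V16i8 bytes{};
  for (unsigned i = 0; i < 8; ++i) {
    bytes[2 * i] = static_cast<std::uint8_t>(src[i] & 0xFF);
    bytes[2 * i + 1] = static_cast<std::uint8_t>(src[i] >> 8);
  }
  auto pick = [&](std::uint8_t m) -> unsigned {
    return (m & 0x80) ? 0u : bytes[m & 0x0F];
  };
  V8i16 out{};
  for (unsigned i = 0; i < 8; ++i)
    out[i] = static_cast<std::uint16_t>(pick(mask[2 * i]) |
                                        (pick(mask[2 * i + 1]) << 8));
  return out;
}

// One instruction, but it needs a 16-byte mask (a memory operand in practice).
V8i16 splatViaPshufb(const V8i16 &v, unsigned lane) {
  V16i8 mask{};
  for (unsigned i = 0; i < 8; ++i) {
    mask[2 * i] = static_cast<std::uint8_t>(2 * lane);
    mask[2 * i + 1] = static_cast<std::uint8_t>(2 * lane + 1);
  }
  return pshufb(v, mask);
}

// Two instructions with immediates only: replicate the lane within its
// 64-bit half, then broadcast a dword taken from that half.
V8i16 splatViaTwoShuffles(const V8i16 &v, unsigned lane) {
  unsigned rep = (lane & 3) * 0x55; // same 2-bit selector in all four fields
  if (lane < 4)
    return pshufd(pshuflw(v, rep), 0x00); // broadcast dword 0
  return pshufd(pshufhw(v, rep), 0xFF);   // broadcast dword 3
}

// Check that both models agree and really produce a splat of each lane.
void checkAllLanes(const V8i16 &v) {
  for (unsigned lane = 0; lane < 8; ++lane) {
    V8i16 a = splatViaPshufb(v, lane);
    V8i16 b = splatViaTwoShuffles(v, lane);
    assert(a == b);
    for (std::uint16_t w : a)
      assert(w == v[lane]);
  }
}
```

The model says nothing about latency or throughput. It only shows that the short immediate-only sequence and the pshufb form compute the same result, so the choice between them comes down to op count versus the extra mask load.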

What are the performance numbers for this patch?

Thanks,
Q. 

> 
> 
> Repository:
>  rL LLVM
> 
> http://reviews.llvm.org/D14901