[PATCH] D14901: [X86][SSE] Improve i16 splatting shuffles

Quentin Colombet via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 17 09:24:23 PST 2015


Thanks for the numbers, Simon.

> On Dec 17, 2015, at 3:08 AM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> 
> RKSimon added a comment.
> 
> Tested on Jaguar CPU:
> 
> Throughput: 
> Old 3op shuffle: 4cy
> New 2op shuffle: 2cy

I am confused.
I would expect an improvement like this when the code sequence gets shorter, but these numbers are not for the case we were discussing, i.e., when the pshufb is replaced by two shuffles (pshuflw/pshufhw, pshufd, or whatever), right?
If they are, I am missing something, because it should be 2 uops in both cases.
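
To spell out the case I have in mind, here is a minimal lane-level sketch in Python of an i16 splat done with a single pshufb versus a pshuflw/pshufhw followed by a pshufd. It is only a model of the instruction semantics as I understand them, not code from the patch. All the function names are made up for illustration, and it says nothing about uop counts or timings.

```python
# Lane-level model of a few SSE shuffles. A 128-bit register is a list of
# eight 16-bit words (lane 0 first). All names here are hypothetical helpers.

def words_to_bytes(words):
    """Split eight 16-bit words into sixteen little-endian bytes."""
    out = []
    for w in words:
        out += [w & 0xFF, (w >> 8) & 0xFF]
    return out

def bytes_to_words(b):
    """Rebuild eight 16-bit words from sixteen little-endian bytes."""
    return [b[2 * i] | (b[2 * i + 1] << 8) for i in range(8)]

def pshufb(src_bytes, mask_bytes):
    """Byte shuffle: a mask byte with bit 7 set zeroes the result byte."""
    return [0 if m & 0x80 else src_bytes[m & 0x0F] for m in mask_bytes]

def pshuflw(words, imm):
    """Shuffle the low four words by a 2-bit-per-lane immediate."""
    low = [words[(imm >> (2 * i)) & 3] for i in range(4)]
    return low + words[4:]

def pshufhw(words, imm):
    """Shuffle the high four words by a 2-bit-per-lane immediate."""
    high = [words[4 + ((imm >> (2 * i)) & 3)] for i in range(4)]
    return words[:4] + high

def pshufd(words, imm):
    """Shuffle four 32-bit dwords, each modelled as a pair of words."""
    dwords = [words[2 * i:2 * i + 2] for i in range(4)]
    out = []
    for i in range(4):
        out += dwords[(imm >> (2 * i)) & 3]
    return out

def splat_one_op(words, lane):
    """Splat word `lane` with a single pshufb and a constant byte mask."""
    mask = [2 * lane, 2 * lane + 1] * 8
    return bytes_to_words(pshufb(words_to_bytes(words), mask))

def splat_two_op(words, lane):
    """Splat word `lane` with pshuflw/pshufhw followed by pshufd."""
    imm_w = (lane & 3) * 0x55          # same 2-bit selector in all four fields
    if lane < 4:
        tmp, dword = pshuflw(words, imm_w), 0
    else:
        tmp, dword = pshufhw(words, imm_w), 2
    return pshufd(tmp, dword * 0x55)   # broadcast the dword holding the splat
```

In this model both routes produce the same splat for every lane, so the only open question is what each sequence costs on the hardware.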

Thanks,
-Quentin

> pshufb_rr        3cy
> pshufb_rm        3cy
> 
> Latency:
> Old 3op shuffle: 4cy
> New 2op shuffle: 3cy
> pshufb_rr        3cy
> pshufb_rm        4cy
> 
> 
> Repository:
>  rL LLVM
> 
> http://reviews.llvm.org/D14901
> 
> 
> 


