[PATCH] D14901: [X86][SSE] Improve i16 splatting shuffles
Quentin Colombet via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 17 09:24:23 PST 2015
Thanks for the numbers, Simon.
> On Dec 17, 2015, at 3:08 AM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
>
> RKSimon added a comment.
>
> Tested on Jaguar CPU:
>
> Throughput:
> Old 3op shuffle: 4cy
> New 2op shuffle: 2cy
I am confused.
I would expect this improvement when the code sequence is shorter, but this number is not for the case we were discussing, i.e., when the pshufb is replaced by 2 shuf(w|hd, whatever), right?
If it is, I am missing something, because it should be 2 uops in both cases.
Thanks,
-Quentin
> pshufb_rr 3cy
> pshufb_rm 3cy
>
> Latency:
> Old 3op shuffle: 4cy
> New 2op shuffle: 3cy
> pshufb_rr 3cy
> pshufb_rm 4cy
>
>
> Repository:
> rL LLVM
>
> http://reviews.llvm.org/D14901
>
>
>