[PATCH] D14901: [X86][SSE] Improve i16 splatting shuffles

Simon Pilgrim via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 21 13:07:54 PST 2015

RKSimon added a comment.

Sorry Quentin - I missed your follow-up email to the list, so I've copied it here:

> > Tested on Jaguar CPU:
> >
> > Throughput:
> >  Old 3op shuffle: 4cy
> >  New 2op shuffle: 2cy
> >  pshufb_rr        3cy
> >  pshufb_rm        3cy

> I am confused.
> When the code sequence is shorter, I was expecting this, but this number is not for the problem we were discussing, i.e., when the shufb is replaced by 2 shuf(w|hd, whatever), right?
> If it is, I am missing something because it should be 2 uops in both cases.

I'm confused too - I'm not certain what outstanding problem with my patch you think I should be addressing.

What it does is improve vXi16 shuffles so that more patterns can be performed in 2 uops instead of 3 uops. A side effect is that a later combine stage in PerformShuffleCombine (combineX86ShufflesRecursively) no longer merges these into a single PSHUFB, as its threshold for combining is 3 uops. The timing tests I did demonstrated that this threshold is probably about right, although I accept that more recent targets can perform PSHUFB faster.
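
As a rough illustration of the threshold behaviour described above, here is a toy model. It is not the actual code in combineX86ShufflesRecursively: the function name and the constant are hypothetical, and the value 3 is simply the uop figure quoted in this thread.

```cpp
// Toy model of the combine threshold discussed in this thread.
// NOT LLVM's implementation: every name here is hypothetical.

// Assumed minimum number of shuffle ops in a chain before merging the
// chain into a single variable-mask byte shuffle (PSHUFB) is attempted.
constexpr int kPshufbCombineThreshold = 3;

// Returns true if a chain of `chainLength` shuffle ops would be merged
// into one PSHUFB under the assumed threshold.
constexpr bool wouldMergeToPshufb(int chainLength) {
  return chainLength >= kPshufbCombineThreshold;
}

// The old 3-op lowering reaches the threshold and gets merged.
static_assert(wouldMergeToPshufb(3), "3-op chain is merged");
// The new 2-op lowering stays below the threshold and is left alone.
static_assert(!wouldMergeToPshufb(2), "2-op chain is not merged");
```

Under this model, shortening a lowering from 3 ops to 2 ops is enough on its own to stop the later PSHUFB merge, which is the side effect described above.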

What am I missing?


