[PATCH] D27692: [x86] use a single shufps when it can save instructions
Michael Kuperstein via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 13 13:01:04 PST 2016
mkuper added a comment.
In https://reviews.llvm.org/D27692#621431, @sroland wrote:
> In https://reviews.llvm.org/D27692#621298, @mkuper wrote:
> > Just wanted to point out the other direction for this also exists.
> > Because we don't even try to match a pshufd in the float domain, even though we could do something like:
> That is quite true however it is imho much less severe. Because shufps can do everything pshufd can do, minus the destructive 2-op syntax. Hence this is only a) an issue with pre-avx targets (whereas missing shufps is definitely an issue with both avx-128 and avx-256 too, albeit this patch in question doesn't address the v8i32 cases yet, but it works all the same). And b) even then, it is only one additional mov, the most simple instruction, subject to never reaching execution units and being handled in renaming stage on some cpus even (with 0 latency, albeit of course there's still some cost with having additional instructions). And last d) it is unclear how often that additional mov is actually needed - I don't think it's even possible the shuffle lowering code knows about additional movs needed to preserve regs?
> In any case, not arguing this wouldn't be worth looking at, but the benefits look much smaller to me, it only makes a difference with pre-avx target, and you really want to make sure you do this only if there's zero domain transition penalties for using pshufd in a float sequence.
I'm not sure I completely agree on the details (In particular, I'm not certain the older non-AVX arches where the mov is needed have zero-latency movs), but I mostly agree. Especially since it looks like the same platforms that need the mov are the ones that *do* have transition penalty. So using a shufps may be worth it only with something like -march=nehalem -mtune=ivb, where we generate SSE code, but expect to run it on a newer platform.
So this is definitely lower priority, just wanted to point out it exists.
More information about the llvm-commits