[PATCH] D27692: [x86] use a single shufps when it can save instructions
Roland Scheidegger via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 13 12:42:31 PST 2016
sroland added a comment.
In https://reviews.llvm.org/D27692#621298, @mkuper wrote:
> Just wanted to point out the other direction for this also exists.
> Because we don't even try to match a pshufd in the float domain, even though we could do something like:
That is quite true however it is imho much less severe. Because shufps can do everything pshufd can do, minus the destructive 2-op syntax. Hence this is only a) an issue with pre-avx targets (whereas missing shufps is definitely an issue with both avx-128 and avx-256 too, albeit this patch in question doesn't address the v8i32 cases yet, but it works all the same). And b) even then, it is only one additional mov, the most simple instruction, subject to never reaching execution units and being handled in renaming stage on some cpus even (with 0 latency, albeit of course there's still some cost with having additional instructions). And last d) it is unclear how often that additional mov is actually needed - I don't think it's even possible the shuffle lowering code knows about additional movs needed to preserve regs?
In any case, not arguing this wouldn't be worth looking at, but the benefits look much smaller to me, it only makes a difference with pre-avx target, and you really want to make sure you do this only if there's zero domain transition penalties for using pshufd in a float sequence.
More information about the llvm-commits