[PATCH] D27692: [x86] use a single shufps when it can save instructions

Roland Scheidegger via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 13 18:48:32 PST 2016

sroland added a comment.

In https://reviews.llvm.org/D27692#621550, @RKSimon wrote:

> I'd like to propose the following:
> 1 - we get this patch and https://reviews.llvm.org/D27684 approved and committed, providing v4i32 lowering to shufps and avoiding some of the more unnecessary domain switches.
> 2 - get shufps lowering added to target shuffle combining. I added shufpd recently, and it's just been the domain issues that I wanted to tidy up before adding shufps as well.
> 3 - add support for v8i32 (and v16i32?) lowering to shufps
> 4 - other missing domain switch patterns (scalar stores and vpermilps/vpshufd come to mind)
> 5 - add support for domain switching to target shuffle combine when the shuffle depth is 3 or more. This will allow pshufd use on pre-AVX targets and seems to introduce some good uses of insertps as well.
> That seems within scope for 4.0 and doesn't involve anything too exotic. After 4.0 we should be in a better position to begin moving some of this work to MC combines to make better use of specific scheduler models.

Sounds like a good plan to me. As for 3), it is pretty trivial (as seen in my patch), albeit I only did it for v8i32, not v16i32. I think the latter can always use another native permute shuffle instead, though that might be more expensive: it will certainly need a memory operand for the shuffle mask, but beyond that I have no idea of the cost on either KNL or SKL-E. For that matter, I have absolutely no idea whether KNL has domain transition penalties at all.
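
To illustrate why the wider case follows almost directly from v4i32, here is a rough standalone sketch of the matching condition. It is not the code from the patch or from LLVM: `matchSingleShufps` is a hypothetical helper, and it ignores commuted operands and unary shuffles. It assumes the usual shuffle-mask convention (0..N-1 selects from the first operand, N..2N-1 from the second, -1 is undef) and relies only on the fact that shufps takes result elements 0-1 from the first source and 2-3 from the second, with the same 8-bit immediate applied to every 128-bit lane.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (an illustration, not LLVM's implementation).
// Returns the shufps immediate if `mask` can be done with one shufps
// of (first operand, second operand), otherwise std::nullopt.
std::optional<uint8_t> matchSingleShufps(const std::vector<int>& mask) {
    const int n = static_cast<int>(mask.size());
    if (n == 0 || n % 4 != 0) return std::nullopt;

    // Per-position 2-bit selector shared by all 128-bit lanes; -1 = unset.
    std::array<int, 4> sel = {-1, -1, -1, -1};
    for (int i = 0; i < n; ++i) {
        const int m = mask[i];
        if (m < 0) continue;                      // undef matches anything
        if (m >= 2 * n) return std::nullopt;      // malformed mask
        const int lane = i / 4, pos = i % 4;
        const bool fromSecond = m >= n;
        const int idx = fromSecond ? m - n : m;
        // Positions 0-1 must read the first source, 2-3 the second.
        if (fromSecond != (pos >= 2)) return std::nullopt;
        // shufps never crosses 128-bit lanes.
        if (idx / 4 != lane) return std::nullopt;
        // Every lane must agree on the in-lane element chosen.
        const int off = idx % 4;
        if (sel[pos] >= 0 && sel[pos] != off) return std::nullopt;
        sel[pos] = off;
    }

    uint8_t imm = 0;
    for (int pos = 0; pos < 4; ++pos) {
        const int v = sel[pos] < 0 ? 0 : sel[pos]; // undef: pick element 0
        imm = static_cast<uint8_t>(imm | (v << (2 * pos)));
    }
    return imm;
}
```

With this sketch, the v4i32 mask {0, 1, 4, 5} and the v8i32 mask {0, 1, 8, 9, 4, 5, 12, 13} both yield the immediate 0x44, while a v8i32 mask whose two lanes disagree is rejected. The only thing the wider types add is the "same selection in every lane" check.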

I'd love to see https://llvm.org/bugs/show_bug.cgi?id=31151 addressed as well, either with something along the lines of the patch there or differently. Then I'd be happy with all the shuffles we need :-).

