[PATCH] D27692: [x86] use a single shufps when it can save instructions
Roland Scheidegger via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 13 18:48:32 PST 2016
sroland added a comment.
In https://reviews.llvm.org/D27692#621550, @RKSimon wrote:
> I'd like to propose the following:
> 1 - we get this patch and https://reviews.llvm.org/D27684 approved and committed, providing v4i32 lowering to shufps and avoiding some of the more unnecessary domain switches.
> 2 - get shufps lowering added to target shuffle combining; I added shufpd recently, and it's just been the domain issues that I wanted to tidy up before adding shufps as well
> 3 - add support for v8i32 (and v16i32?) lowering to shufps
> 4 - other missing domain switch patterns (scalar stores and vpermilps/vpshufd come to mind)
> 5 - add support for domain switching to target shuffle combine when the shuffle depth is 3 or more - this will allow pshufd use on pre-AVX targets and seems to introduce some good uses of insertps as well.
> That seems within scope for 4.0 and doesn't involve anything too exotic. After 4.0 we should be in a better position to begin work on moving some of this work to MC combines to better make use of specific scheduler models
Sounds like a good plan to me. As for 3), it is pretty trivial (as seen in my patch), although I only did it for v8i32, not v16i32. I think the latter can always use another native permute shuffle instead, though that might be more expensive: it will certainly need a memory operand for the shuffle mask, but beyond that I have no idea of the cost on either KNL or SKL-E. For that matter, I have absolutely no idea whether KNL has domain transition penalties at all.
I'd love to see https://llvm.org/bugs/show_bug.cgi?id=31151 addressed as well, either along the lines of the patch there or differently; then I'd be happy with all the shuffles we need :-).
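
For readers following along, here is a rough sketch of the condition under which a 4-element shuffle mask fits a single shufps. It relies only on the instruction's documented behaviour: result elements 0 and 1 are picked from the first operand, elements 2 and 3 from the second, with two immediate bits per element. The names matchSingleShufps and ShufpsPlan are hypothetical and this is not the code from the patch; it also ignores whether a domain switch or an extra register copy makes the single instruction worthwhile. It assumes C++17 and the usual mask convention (0-3 selects from the first input, 4-7 from the second, negative means undef).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical result type for illustration; not an LLVM data structure.
struct ShufpsPlan {
  int LoSrc;   // which input (0 or 1) feeds result elements 0 and 1
  int HiSrc;   // which input (0 or 1) feeds result elements 2 and 3
  uint8_t Imm; // shufps immediate: two bits per result element
};

// Returns a plan if Mask can be done with one shufps, std::nullopt otherwise.
// Mask values: 0..3 = first input, 4..7 = second input, negative = undef.
std::optional<ShufpsPlan> matchSingleShufps(const std::array<int, 4> &Mask) {
  // Find the single input a half reads from: -1 = all undef, 2 = mixed.
  auto halfSource = [&Mask](int Begin) {
    int Src = -1;
    for (int I = Begin; I < Begin + 2; ++I) {
      int M = Mask[I];
      if (M < 0)
        continue; // undef element places no constraint
      assert(M < 8 && "mask index out of range");
      int S = M / 4;
      if (Src >= 0 && Src != S)
        return 2; // half reads from both inputs: no single shufps
      Src = S;
    }
    return Src;
  };

  int Lo = halfSource(0), Hi = halfSource(2);
  if (Lo == 2 || Hi == 2)
    return std::nullopt;

  // A fully undef half can reuse whichever input the other half needs.
  if (Lo < 0)
    Lo = (Hi < 0) ? 0 : Hi;
  if (Hi < 0)
    Hi = Lo;

  // Build the immediate; undef elements default to index 0.
  unsigned Imm = 0;
  for (int I = 0; I < 4; ++I)
    if (Mask[I] >= 0)
      Imm |= static_cast<unsigned>(Mask[I] % 4) << (2 * I);

  return ShufpsPlan{Lo, Hi, static_cast<uint8_t>(Imm)};
}
```

With this sketch, the mask {0, 1, 4, 5} matches with immediate 0x44, while an interleave such as {0, 4, 1, 5} does not, since each half would need elements from both inputs.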