[PATCH] D47401: [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits where possible.
Craig Topper via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri May 25 17:25:57 PDT 2018
craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel, GBuella.
We only need to use 512-bit vectors all the way through for the v8i64 reductions, since those max/min instructions are new in avx512f and are only available at 512 bits until SKX.
For v16i32 and floating point, we have legacy 128/256-bit instructions we can use.
I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.
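To illustrate the narrowing idea described above, here is a minimal scalar sketch in C. It is not the code from the patch: it models a 16 x i32 vector as a plain array, and the names `max_halves_i32` and `reduce_max_i32x16_model` are hypothetical. Each step compares the upper half of the live lanes against the lower half, so the working width drops from 512 to 256 to 128 bits, and every lane pairing is spelled out rather than left undefined.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of one narrowing step (hypothetical helper, not from the
   patch): treat v[0..n) as the live lanes, compare each lane in the upper
   half against the matching lane in the lower half, and keep the larger
   value in the lower half. After the call only v[0..n/2) is live. */
static void max_halves_i32(int32_t *v, int n) {
    int half = n / 2;
    for (int i = 0; i < half; ++i) {
        int32_t hi = v[i + half];
        if (hi > v[i]) {
            v[i] = hi;
        }
    }
}

/* Hypothetical model of a v16i32 signed-max reduction. The comments map
   each step to the vector width it would correspond to. */
int32_t reduce_max_i32x16_model(const int32_t in[16]) {
    int32_t v[16];
    memcpy(v, in, sizeof v);   /* work on a copy; the input stays untouched */

    max_halves_i32(v, 16);     /* 512-bit input -> 256-bit result (8 lanes) */
    max_halves_i32(v, 8);      /* 256-bit -> 128-bit result (4 lanes)       */
    max_halves_i32(v, 4);      /* within 128 bits: lanes {2,3} vs {0,1}     */
    max_halves_i32(v, 2);      /* lane 1 vs lane 0                          */

    return v[0];               /* lane 0 now holds the overall maximum      */
}
```

In a real implementation each call would presumably become a half extraction or shuffle followed by a packed max at the narrower width. The model only shows the order of the steps and which lanes are paired at each one.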