[PATCH] D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute

Mon Dec 11 23:25:35 PST 2017

zvi added a comment.

In https://reviews.llvm.org/D40865#949585, @RKSimon wrote:

> In https://reviews.llvm.org/D40865#948072, @zvi wrote:
>
> > @RKSimon, I'm not too familiar with the MachineCombiner. Are there already any shuffle cases that are handled or was that wishful thinking? :)
>
>
> Yes, it keeps being proposed, but it's a big job. Part of the idea behind https://reviews.llvm.org/D40602 was to show how it'd work in principle for a much simpler case (double shifts) than shuffles. The idea would be to perform more aggressive combining to variable shuffles (PSHUFB/VPERMPS etc.) in the MC. We'd still keep to the '3 shuffles limit' for variable mask folding in DAG lowering, as that works better for AMD Jaguar/Bulldozer/Zen and older Intel cores, and then the MC, driven by the scheduler models, tries again later on. But there are still concerns that there will be plenty of regressions due to register pressure, load latency etc., and about whether the code really is port5 bound....
>
> A second (temporary?) option mentioned in https://reviews.llvm.org/D38318 was to add a feature flag for more recent Intel cores that reduces the 'AllowVariableMask depth limit' in combineX86ShuffleChain to 2.

Thanks for the explanation, Simon. I will update the patch with the feature flag you proposed as a temporary solution until the MachineCombiner is ready to handle this case.
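
A minimal sketch of how such a feature flag could lower the depth limit, for illustration only. It is a simplified standalone model, not the code in combineX86ShuffleChain: the struct, the flag name FastVariableShuffle, and both helper functions are hypothetical. The only facts taken from the discussion above are the default limit of 3 shuffles and the reduced limit of 2 for more recent Intel cores.

```cpp
// Hypothetical stand-in for the subtarget; not LLVM's X86Subtarget.
struct SubtargetSketch {
  // Assumed feature flag, set only for more recent Intel cores.
  bool FastVariableShuffle = false;
};

// Number of shuffles a chain must contain before it may be folded
// into a single variable-mask shuffle (PSHUFB/VPERMPS etc.).
inline int variableMaskDepthLimit(const SubtargetSketch &ST) {
  return ST.FastVariableShuffle ? 2 : 3;
}

// True if a chain of 'Depth' shuffles may be combined into one
// variable-mask shuffle on this subtarget.
inline bool allowVariableMask(const SubtargetSketch &ST, int Depth) {
  return Depth >= variableMaskDepthLimit(ST);
}
```

With the flag set, a two-shuffle chain such as a repeated-mask shuffle followed by a lane permute becomes eligible for folding into a single variable shuffle, while targets without the flag keep the existing 3-shuffle limit.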

https://reviews.llvm.org/D40865