[PATCH] D33938: [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load

Sat Jun 10 09:39:44 PDT 2017

RKSimon accepted this revision.
RKSimon added a comment.
This revision is now accepted and ready to land.

LGTM.

Looking at lowerV2X128VectorShuffle, shuffle combining will have a much easier time if we keep to 256-bit vectors (blends / X86ISD::VPERM2X128) as much as possible. Subvector extract/insert chains make combining really tricky, and dealing with the memory cases looks like a good first step.
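
For anyone following along, here is a rough toy model of why the two forms are interchangeable for the "low half of each source" shuffle. It is only a sketch of the instruction semantics, not LLVM code: the function names and the (low, high) tuple representation are made up for illustration, and the immediates follow the documented VPERM2F128 / VINSERTF128 encodings.

```python
# Toy model: a 256-bit vector is a (low, high) pair of 128-bit lanes.
# Function names are illustrative only; they are not LLVM or intrinsic APIs.

def vperm2f128(src1, src2, imm8):
    """Model VPERM2F128: each result lane is picked from the four input lanes."""
    lanes = (src1[0], src1[1], src2[0], src2[1])

    def pick(ctrl):
        # Bit 3 of each 4-bit control field zeroes the lane; bits 1:0 select.
        return 0 if ctrl & 0x8 else lanes[ctrl & 0x3]

    return (pick(imm8 & 0xF), pick((imm8 >> 4) & 0xF))


def vinsertf128(src1, lane128, imm8):
    """Model VINSERTF128: overwrite one 128-bit lane of src1."""
    return (src1[0], lane128) if imm8 & 1 else (lane128, src1[1])


a = ("a_lo", "a_hi")
b = ("b_lo", "b_hi")

via_insert = vinsertf128(a, b[0], 1)  # needs b's low half extracted first
via_perm = vperm2f128(a, b, 0x20)     # takes both 256-bit inputs whole
```

Both calls produce the same (a_lo, b_lo) result, but the permute form takes each source as a whole 256-bit value, so there is no extract/insert chain for the combiner to see through.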

https://reviews.llvm.org/D33938