[PATCH] D33938: [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 6 07:44:56 PDT 2017


spatel added inline comments.


================
Comment at: test/CodeGen/X86/avx-vperm2x128.ll:55
+; ALL:       ## BB#0: ## %entry
+; ALL-NEXT:    vperm2f128 {{.*#+}} ymm0 = mem[0,1,0,1]
+; ALL-NEXT:    retq
----------------
spatel wrote:
> RKSimon wrote:
> > I wonder what's preventing this from using VBROADCASTF128?
> I think it's just that we don't have the code to do the load shrinking + address offset. I.e., this is a 32-byte load even though we're only using half of it.
On second thought, it's more likely that we don't recognize this as a splat because we missed the "canWidenShuffleElements()" opportunity. I.e., these are 32-bit elements, so the mask isn't a simple splat.
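
To illustrate the point about widening, here is a minimal standalone sketch. It is not LLVM's canWidenShuffleElements(); the helper names widenMask and isSplat are hypothetical, undef/zero mask sentinels are not handled, and the 8 x 32-bit mask <0,1,2,3,0,1,2,3> is an assumed example of a "repeat the low 128 bits" shuffle. The idea is that the mask is not a splat at 32-bit granularity, but becomes one once groups of adjacent elements are merged into 128-bit lanes.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not LLVM's API): try to rewrite a shuffle mask over
// narrow elements as a mask over elements that are Scale times wider.
// This only succeeds if every group of Scale indices is consecutive and
// starts on a multiple of Scale. Negative (undef) indices are rejected here
// for simplicity.
std::optional<std::vector<int>> widenMask(const std::vector<int> &Mask,
                                          std::size_t Scale) {
  if (Scale == 0 || Mask.size() % Scale != 0)
    return std::nullopt;
  const int S = static_cast<int>(Scale);
  std::vector<int> Wide;
  for (std::size_t I = 0; I < Mask.size(); I += Scale) {
    const int First = Mask[I];
    if (First < 0 || First % S != 0)
      return std::nullopt;
    for (std::size_t J = 1; J < Scale; ++J)
      if (Mask[I + J] != First + static_cast<int>(J))
        return std::nullopt;
    Wide.push_back(First / S);
  }
  return Wide;
}

// Hypothetical helper: true if every index in a non-empty mask is the same.
bool isSplat(const std::vector<int> &Mask) {
  for (int M : Mask)
    if (M != Mask.front())
      return false;
  return !Mask.empty();
}
```

With these helpers, isSplat({0,1,2,3,0,1,2,3}) is false, while widenMask({0,1,2,3,0,1,2,3}, 4) yields {0,0}, which is a splat of the low 128-bit lane. That is the shape a 128-bit broadcast could match, assuming the rest of the lowering cooperates.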


https://reviews.llvm.org/D33938




