[PATCH] optimize merging of scalar loads for 32-byte vectors [X86, AVX] (PR21710)
Sanjay Patel
spatel at rotateright.com
Thu Dec 4 13:04:27 PST 2014
Hi qcolombet, andreadb, RKSimon,
This patch fixes the poor codegen seen in PR21710 ( http://llvm.org/bugs/show_bug.cgi?id=21710 ). Before we crack 32-byte build vectors into smaller chunks (and then glue them back together), we should look for the easy case where all of the elements can be loaded with a single op.
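To make the "easy case" concrete, here is a minimal standalone sketch of the kind of check involved. It is not the code from the patch or from LLVM; the function name `isOneWideLoad` and its parameters are hypothetical, and it models each scalar load only by its byte offset from a shared base pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only (not LLVM's implementation).
// ByteOffsets[i] is the byte offset, relative to a common base pointer,
// that element i of the build vector is loaded from.
// Returns true when the elements sit back to back in memory and together
// cover exactly VecBytes bytes, so one wide load could replace them.
bool isOneWideLoad(const std::vector<std::int64_t> &ByteOffsets,
                   std::int64_t EltSize, std::int64_t VecBytes) {
  if (ByteOffsets.empty() || EltSize <= 0)
    return false;
  // The scalar elements must add up to the full vector width.
  if (static_cast<std::int64_t>(ByteOffsets.size()) * EltSize != VecBytes)
    return false;
  const std::int64_t Base = ByteOffsets[0];
  for (std::size_t I = 1; I < ByteOffsets.size(); ++I) {
    // Element I must live exactly I * EltSize bytes past element 0.
    if (ByteOffsets[I] != Base + static_cast<std::int64_t>(I) * EltSize)
      return false;
  }
  return true;
}
```

With the offsets from the v8f32 listing in this mail (0, 4, ..., 28 with 4-byte elements) the check succeeds for a 32-byte vector; any gap or reordering makes it fail, and the existing chunked lowering would still be needed.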
The codegen change for the last two test cases (derived from the bug report examples) is:
vmovss 16(%rdi), %xmm1
vmovups (%rdi), %xmm0
vinsertps $16, 20(%rdi), %xmm1, %xmm1
vinsertps $32, 24(%rdi), %xmm1, %xmm1
vinsertps $48, 28(%rdi), %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
retq
To:
vmovups (%rdi), %ymm0
retq
And:
vmovsd 16(%rdi), %xmm1
vmovupd (%rdi), %xmm0
vmovhpd 24(%rdi), %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
retq
To:
vmovups (%rdi), %ymm0
retq
I think it's benign that we generate 'vmovups' rather than 'vmovupd' in the second case, because the loaded value is not used in this function. I confirmed that we do generate the double-precision instruction when the load result is actually used in the function.
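The merge is valid because reading four consecutive doubles one at a time and reading the same 32 bytes in one go yield identical data, whichever instruction variant performs the load. The sketch below shows that equivalence in plain C++. The helper names `loadScalars` and `loadWide` are hypothetical, and the example makes no claim about what any compiler emits for it.

```cpp
#include <array>
#include <cstring>

// Hypothetical helper: build a 4 x double value element by element,
// mirroring a build vector fed by four scalar loads.
std::array<double, 4> loadScalars(const double *P) {
  return {P[0], P[1], P[2], P[3]};
}

// Hypothetical helper: fetch the same 32 bytes with one block copy,
// mirroring a single unaligned 32-byte load.
std::array<double, 4> loadWide(const double *P) {
  std::array<double, 4> V;
  std::memcpy(V.data(), P, 4 * sizeof(double));
  return V;
}
```

For any pointer to at least four readable doubles, the two helpers return the same value.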
I've also updated the existing load merge test to use FileCheck and added a v4f32 test for completeness.
http://reviews.llvm.org/D6536
Files:
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/vec_loadsingles.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D6536.16945.patch
Type: text/x-patch
Size: 4874 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141204/b5b9e161/attachment.bin>