[LLVMdev] unaligned AVX store gets split into two instructions
Zach Devito
zdevito at gmail.com
Tue Jul 9 21:01:48 PDT 2013
I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads
on AVX.
3.3 splits an unaligned 256-bit vector load into two 128-bit halves, whereas
3.2 emitted it as a single instruction (details below).
In a matrix-matrix inner kernel, I see a ~25% decrease in performance with 3.3,
which seems to be due to this split.
Any ideas why this changed? Thanks!
Zach
LLVM Code:
define <4 x double> @vstore(<4 x double>*) {
entry:
  %1 = load <4 x double>* %0, align 8
  ret <4 x double> %1
}
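
For anyone who wants to reproduce this from C rather than hand-written IR, here is a rough sketch of an equivalent load. It is only an approximation of the IR above: the v4d typedef relies on the GCC/Clang vector_size extension, and load4_unaligned is a hypothetical helper name, neither of which comes from my actual kernel. The memcpy from a plain double pointer gives the compiler no 32-byte alignment guarantee, which is the same situation as "align 8" on the load.

```c
#include <string.h>

/* Assumption: GCC/Clang vector extension, 4 x double = 32 bytes. */
typedef double v4d __attribute__((vector_size(32)));

/* Hypothetical helper: load four doubles from an address that is only
 * known to be 8-byte aligned. memcpy carries no 32-byte alignment
 * promise, so the compiler has to treat this as an unaligned load. */
static void load4_unaligned(const double *src, v4d *dst) {
    memcpy(dst, src, sizeof *dst);
}
```

Compiling a caller of this with something like "clang -O2 -mavx -S" should let you inspect which load instructions each compiler version picks.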
------------------------------------------------------------
Running llvm-32/bin/llc vstore.ll creates:
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _vstore
        .align  4, 0x90
_vstore:                                ## @vstore
        .cfi_startproc
## BB#0:                                ## %entry
        pushq   %rbp
Ltmp2:
        .cfi_def_cfa_offset 16
Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
Ltmp4:
        .cfi_def_cfa_register %rbp
        vmovups (%rdi), %ymm0
        popq    %rbp
        ret
        .cfi_endproc
----------------------------------------------------------------
Running llvm-33/bin/llc vstore.ll creates:
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _main
        .align  4, 0x90
_main:                                  ## @main
        .cfi_startproc
## BB#0:                                ## %entry
        vmovups (%rdi), %xmm0
        vinsertf128     $1, 16(%rdi), %ymm0, %ymm0
        ret
        .cfi_endproc
        .subsections_via_symbols