[LLVMdev] unaligned AVX store gets split into two instructions
Ondřej Bílka
neleai at seznam.cz
Tue Jul 9 22:32:08 PDT 2013
On Tue, Jul 09, 2013 at 09:01:48PM -0700, Zach Devito wrote:
> I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector
> loads on AVX.
> 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as
> a single instruction (details below).
> In a matrix-matrix inner-kernel, I see a ~25% decrease in performance,
> which seems to be due to this.
> Any ideas why this changed? Thanks!
What are the code and the architecture? In most loops, splitting makes code
faster when run on Ivy Bridge. You could dig through the Intel optimization
manual for that recommendation. Perhaps this code is a special case.
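To compare the two forms in isolation, here is a minimal sketch in C with AVX intrinsics. It assumes an x86-64 machine with AVX and a GCC- or Clang-compatible compiler. The function names load_single and load_split are made up for illustration, and the compiler is free to pick different instructions than the ones named in the comments.

```c
#include <immintrin.h>

/* One 256-bit unaligned load, the form 3.2 emitted (vmovups to a ymm register). */
__attribute__((target("avx")))
void load_single(const double *src, double *dst)
{
    __m256d v = _mm256_loadu_pd(src);
    _mm256_storeu_pd(dst, v);
}

/* Two 128-bit halves, the form 3.3 emits: an unaligned load of the low half,
   then the high half inserted into the upper lane (vinsertf128). */
__attribute__((target("avx")))
void load_split(const double *src, double *dst)
{
    __m128d lo = _mm_loadu_pd(src);      /* elements 0..1 */
    __m128d hi = _mm_loadu_pd(src + 2);  /* elements 2..3 */
    __m256d v = _mm256_insertf128_pd(_mm256_castpd128_pd256(lo), hi, 1);
    _mm256_storeu_pd(dst, v);
}
```

Both functions produce the same four doubles, so timing each one inside the inner kernel on the target machine should show whether the split is what costs the 25%.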
> Zach
> LLVM Code:
> define <4 x double> @vstore(<4 x double>*) {
> entry:
> %1 = load <4 x double>* %0, align 8
> ret <4 x double> %1
> }
> ------------------------------------------------------------
> Running llvm-32/bin/llc vstore.ll creates:
> .section __TEXT,__text,regular,pure_instructions
> .globl _vstore
> .align 4, 0x90
> _vstore: ## @vstore
> .cfi_startproc
> ## BB#0: ## %entry
> pushq %rbp
> Ltmp2:
> .cfi_def_cfa_offset 16
> Ltmp3:
> .cfi_offset %rbp, -16
> movq %rsp, %rbp
> Ltmp4:
> .cfi_def_cfa_register %rbp
> vmovups (%rdi), %ymm0
> popq %rbp
> ret
> .cfi_endproc
> ----------------------------------------------------------------
> Running llvm-33/bin/llc vstore.ll creates:
> .section __TEXT,__text,regular,pure_instructions
> .globl _main
> .align 4, 0x90
> _main: ## @main
> .cfi_startproc
> ## BB#0: ## %entry
> vmovups (%rdi), %xmm0
> vinsertf128 $1, 16(%rdi), %ymm0, %ymm0
> ret
> .cfi_endproc
> .subsections_via_symbols
--
fat electrons in the lines