[LLVMdev] unaligned AVX store gets split into two instructions
Tom Stellard
tom at stellard.net
Tue Jul 9 21:57:13 PDT 2013
On Tue, Jul 09, 2013 at 09:01:48PM -0700, Zach Devito wrote:
> I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads
> on AVX.
> 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as
> a single instruction (details below).
> In a matrix-matrix inner-kernel, I see a ~25% decrease in performance,
> which seems to be due to this.
>
> Any ideas why this changed? Thanks!
>
Hi Zach,
I ran into a similar problem with the R600 backend, and I was able to fix it
by implementing TargetLowering::allowsUnalignedMemoryAccesses() for the target.
Take a look at r184822.
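
To illustrate the idea only, here is a toy, self-contained C++ model of how
such a hook can decide whether an under-aligned vector load stays whole or is
split. Everything in it (ToyVT, ToyTargetLowering, ToyAVXLowering,
numLoadsEmitted) is a hypothetical stand-in invented for this sketch. It is
not the real LLVM class hierarchy, not the real hook signature, and not the
contents of r184822, and it does not claim to describe why the X86 backend
changed between 3.2 and 3.3.

```cpp
// Toy model only: hypothetical types, NOT the real LLVM API.
struct ToyVT {
  unsigned SizeInBits;  // total width of the value type, e.g. 256 for <4 x double>
};

struct ToyTargetLowering {
  virtual ~ToyTargetLowering() {}

  // Default answer: the target makes no promise about unaligned accesses.
  virtual bool allowsUnalignedMemoryAccesses(ToyVT, bool *Fast = nullptr) const {
    if (Fast)
      *Fast = false;
    return false;
  }
};

struct ToyAVXLowering : ToyTargetLowering {
  // Assumption for this sketch: unaligned accesses up to 256 bits are both
  // legal and fast on this imaginary target.
  bool allowsUnalignedMemoryAccesses(ToyVT VT, bool *Fast = nullptr) const override {
    bool Ok = VT.SizeInBits <= 256;
    if (Fast)
      *Fast = Ok;
    return Ok;
  }
};

// Number of load instructions a toy legalizer would emit for one load of VT
// whose address is only guaranteed to be AlignBytes-aligned.
unsigned numLoadsEmitted(const ToyTargetLowering &TLI, ToyVT VT,
                         unsigned AlignBytes) {
  bool NaturallyAligned = AlignBytes * 8 >= VT.SizeInBits;
  if (NaturallyAligned)
    return 1;  // aligned access: never split

  bool Fast = false;
  if (TLI.allowsUnalignedMemoryAccesses(VT, &Fast) && Fast)
    return 1;  // target says a single unaligned access is fine

  return 2;  // otherwise split into two half-width accesses
}
```

In this model, a 256-bit load with align 8 yields two loads under the default
ToyTargetLowering and one load under ToyAVXLowering, which mirrors the
difference between the two llc outputs quoted below.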
-Tom
> Zach
>
> LLVM Code:
> define <4 x double> @vstore(<4 x double>*) {
> entry:
> %1 = load <4 x double>* %0, align 8
> ret <4 x double> %1
> }
> ------------------------------------------------------------
> Running llvm-32/bin/llc vstore.ll creates:
> .section __TEXT,__text,regular,pure_instructions
> .globl _vstore
> .align 4, 0x90
> _vstore: ## @vstore
> .cfi_startproc
> ## BB#0: ## %entry
> pushq %rbp
> Ltmp2:
> .cfi_def_cfa_offset 16
> Ltmp3:
> .cfi_offset %rbp, -16
> movq %rsp, %rbp
> Ltmp4:
> .cfi_def_cfa_register %rbp
> vmovups (%rdi), %ymm0
> popq %rbp
> ret
> .cfi_endproc
> ----------------------------------------------------------------
> Running llvm-33/bin/llc vstore.ll creates:
> .section __TEXT,__text,regular,pure_instructions
> .globl _main
> .align 4, 0x90
> _main: ## @main
> .cfi_startproc
> ## BB#0: ## %entry
> vmovups (%rdi), %xmm0
> vinsertf128 $1, 16(%rdi), %ymm0, %ymm0
> ret
> .cfi_endproc
>
>
> .subsections_via_symbols