[LLVMdev] unaligned AVX store gets split into two instructions

Ondřej Bílka neleai at seznam.cz
Tue Jul 9 22:32:08 PDT 2013


On Tue, Jul 09, 2013 at 09:01:48PM -0700, Zach Devito wrote:
>    I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector
>    loads on AVX.
>    3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as
>    a single instruction (details below).
>    In a matrix-matrix inner-kernel, I see a ~25% decrease in performance,
>    which seems to be due to this.
>    Any ideas why this changed? Thanks!

What are the code and the architecture? In most loops, splitting makes the
code faster when run on Ivy Bridge; you could dig through the Intel
optimization manual for that recommendation. Perhaps this code is a special case.
>    Zach
>    LLVM Code:
>    define <4 x double> @vstore(<4 x double>*) {
>    entry:
>      %1 = load <4 x double>* %0, align 8
>      ret <4 x double> %1
>    }
>    ------------------------------------------------------------
>    Running llvm-32/bin/llc vstore.ll creates:
>            .section        __TEXT,__text,regular,pure_instructions
>            .globl  _vstore
>            .align  4, 0x90
>    _vstore:                                ## @vstore
>            .cfi_startproc
>    ## BB#0:                                ## %entry
>            pushq   %rbp
>    Ltmp2:
>            .cfi_def_cfa_offset 16
>    Ltmp3:
>            .cfi_offset %rbp, -16
>            movq    %rsp, %rbp
>    Ltmp4:
>            .cfi_def_cfa_register %rbp
>            vmovups         (%rdi), %ymm0
>            popq    %rbp
>            ret
>            .cfi_endproc
>    ----------------------------------------------------------------
>    Running llvm-33/bin/llc vstore.ll creates:
>            .section        __TEXT,__text,regular,pure_instructions
>            .globl  _main
>            .align  4, 0x90
>    _main:                                  ## @main
>            .cfi_startproc
>    ## BB#0:                                ## %entry
>            vmovups (%rdi), %xmm0
>            vinsertf128     $1, 16(%rdi), %ymm0, %ymm0
>            ret
>            .cfi_endproc
>    .subsections_via_symbols
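
For anyone comparing the two listings above: the 3.2 output reads all 32
bytes with one vmovups into %ymm0, while the 3.3 output reads the low 16
bytes into %xmm0 and then merges the high 16 bytes from offset 16 with
vinsertf128. Below is a minimal sketch in portable C of that data movement
only. The names vec256_model, load_whole and load_split are hypothetical and
made up for illustration; this is not LLVM's code and it says nothing about
the relative speed of the two sequences.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 256-bit register: 32 raw bytes.
   This is an illustration type, not a real AVX vector type. */
typedef struct { uint8_t bytes[32]; } vec256_model;

/* Models the single unaligned load from the 3.2 listing:
       vmovups (%rdi), %ymm0
   All 32 bytes are read in one step. */
static vec256_model load_whole(const void *p) {
    vec256_model v;
    memcpy(v.bytes, p, 32);
    return v;
}

/* Models the split sequence from the 3.3 listing:
       vmovups     (%rdi), %xmm0              ; low 128 bits
       vinsertf128 $1, 16(%rdi), %ymm0, %ymm0 ; high 128 bits
   Two 16-byte reads fill the low and high halves. */
static vec256_model load_split(const void *p) {
    vec256_model v;
    memcpy(v.bytes, p, 16);                             /* low half  */
    memcpy(v.bytes + 16, (const uint8_t *)p + 16, 16);  /* high half */
    return v;
}
```

Both helpers return the same 32 bytes for any readable address, so the change
between 3.2 and 3.3 affects only how the load is issued, not its result.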



-- 

fat electrons in the lines


