[LLVMdev] unaligned AVX store gets split into two instructions

Tue Jul 9 21:01:48 PDT 2013

I'm seeing a difference in how LLVM 3.2 and 3.3 emit unaligned vector loads
on AVX.
LLVM 3.3 splits an unaligned 256-bit vector load into two instructions
(vmovups + vinsertf128), whereas 3.2 emitted it as a single vmovups
(details below).
In a matrix-matrix inner kernel, I see a ~25% decrease in performance,
which seems to be due to this.

Any ideas why this changed? Thanks!

Zach

LLVM Code:
define <4 x double> @vstore(<4 x double>*) {
entry:
  %1 = load <4 x double>* %0, align 8
  ret <4 x double> %1
}
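
For anyone who would rather reproduce this from C than from IR, a rough
equivalent might look like the sketch below. It assumes the GCC/Clang vector
extensions; the names v4d and load_unaligned_v4d are made up for illustration,
and I have not checked that a given compiler lowers it to exactly the IR above.

```c
#include <string.h>

/* GCC/Clang vector extension: four doubles, 32 bytes wide (assumed available). */
typedef double v4d __attribute__((vector_size(32)));

/* Hypothetical helper: copy four doubles from a pointer that is only
   guaranteed to be 8-byte aligned into a vector value. memcpy carries no
   32-byte alignment promise, so the compiler has to treat the read as
   unaligned, comparable to "align 8" on the IR load. */
void load_unaligned_v4d(const double *p, v4d *out) {
    memcpy(out, p, sizeof *out);
}
```

Compiling this with AVX enabled and reading the generated assembly should show
whether the load is kept as one 256-bit access or split into two 128-bit halves.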
------------------------------------------------------------
Running llvm-32/bin/llc vstore.ll creates:
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _vstore
        .align  4, 0x90
_vstore:                                ## @vstore
        .cfi_startproc
## BB#0:                                ## %entry
        pushq   %rbp
Ltmp2:
        .cfi_def_cfa_offset 16
Ltmp3:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
Ltmp4:
        .cfi_def_cfa_register %rbp
        vmovups (%rdi), %ymm0
        popq    %rbp
        ret
        .cfi_endproc
----------------------------------------------------------------
Running llvm-33/bin/llc vstore.ll creates:
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _main
        .align  4, 0x90
_main:                                  ## @main
        .cfi_startproc
## BB#0:                                ## %entry
        vmovups (%rdi), %xmm0
        vinsertf128     $1, 16(%rdi), %ymm0, %ymm0
        ret
        .cfi_endproc

.subsections_via_symbols