<div>I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned 256-bit vector loads on AVX.</div><div>3.3 splits an unaligned vector load into two 128-bit halves (a vmovups into %xmm0 followed by a vinsertf128), whereas 3.2 emitted it as a single vmovups into %ymm0 (details below).</div><div>
In a matrix-matrix inner kernel, I see a ~25% drop in performance, which appears to be caused by this change.</div><div><br></div><div>Any ideas why this changed? Thanks!</div><div><br></div><div>Zach</div><div><br></div><div>LLVM IR (vstore.ll):</div>
<div><div>define <4 x double> @vstore(<4 x double>*) {</div><div>entry:</div><div> %1 = load <4 x double>* %0, align 8</div><div> ret <4 x double> %1</div><div>}</div></div><div>------------------------------------------------------------</div>
<div>Running llvm-32/bin/llc vstore.ll creates:</div><div><div><span class="Apple-tab-span" style="white-space:pre"> </span>.section<span class="Apple-tab-span" style="white-space:pre"> </span>__TEXT,__text,regular,pure_instructions</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>.globl<span class="Apple-tab-span" style="white-space:pre"> </span>_vstore</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>.align<span class="Apple-tab-span" style="white-space:pre"> </span>4, 0x90</div>
<div>_vstore: ## @vstore</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_startproc</div><div>## BB#0: ## %entry</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>pushq<span class="Apple-tab-span" style="white-space:pre"> </span>%rbp</div>
<div>Ltmp2:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_offset 16</div><div>Ltmp3:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_offset %rbp, -16</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>movq<span class="Apple-tab-span" style="white-space:pre"> </span>%rsp, %rbp</div><div>Ltmp4:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_register %rbp</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>vmovups<span class="Apple-tab-span" style="white-space:pre"> </span>(%rdi), %ymm0</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>popq<span class="Apple-tab-span" style="white-space:pre"> </span>%rbp</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>ret</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_endproc</div></div><div>----------------------------------------------------------------</div>
<div>Running llvm-33/bin/llc vstore.ll creates:</div><div><div> .section __TEXT,__text,regular,pure_instructions</div><div> .globl _main</div><div> .align 4, 0x90</div><div>_main: ## @main</div>
<div> .cfi_startproc</div><div>## BB#0: ## %entry</div><div> vmovups (%rdi), %xmm0</div><div> vinsertf128 $1, 16(%rdi), %ymm0, %ymm0</div><div> ret</div><div>
.cfi_endproc</div><div><br></div><div><br></div><div>.subsections_via_symbols</div></div>
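<div>For reference, here is a rough C sketch of the two access patterns at the intrinsics level. It assumes x86-64 with GCC or Clang (for __attribute__((target("avx"))) and __builtin_cpu_supports), and load_whole, load_split and loads_agree are hypothetical names. A compiler may lower either function differently from what llc 3.2/3.3 emitted above, so this only illustrates the data movement (one 32-byte load versus a 16-byte load plus an insert of the upper 16 bytes), not the codegen in question.</div>

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: one 256-bit unaligned load of four doubles,
   the same shape as the single "vmovups (%rdi), %ymm0" from llc 3.2. */
__attribute__((target("avx")))
static void load_whole(const double *src, double *dst)
{
    __m256d v = _mm256_loadu_pd(src);
    _mm256_storeu_pd(dst, v);
}

/* Hypothetical helper: the same four doubles loaded as two 128-bit halves,
   the same shape as "vmovups (%rdi), %xmm0" followed by
   "vinsertf128 $1, 16(%rdi), %ymm0, %ymm0" from llc 3.3. */
__attribute__((target("avx")))
static void load_split(const double *src, double *dst)
{
    __m128d lo = _mm_loadu_pd(src);      /* bytes 0..15 of the source */
    __m128d hi = _mm_loadu_pd(src + 2);  /* bytes 16..31, i.e. 16(%rdi) */
    /* Widen the low half to 256 bits, then insert the high half into lane 1. */
    __m256d v = _mm256_insertf128_pd(_mm256_castpd128_pd256(lo), hi, 1);
    _mm256_storeu_pd(dst, v);
}

/* Returns 1 if both load shapes read the same four doubles from src.
   Assumes the caller has already checked that the CPU supports AVX. */
static int loads_agree(const double *src)
{
    double a[4], b[4];
    load_whole(src, a);
    load_split(src, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

<div>Passing buf + 1 for a plain double array gives a pointer that is 8-byte aligned but not necessarily 32-byte aligned, which matches the align 8 on the load in vstore.ll. Both helpers should only be called after __builtin_cpu_supports("avx") returns nonzero.</div>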