[LLVMdev] unaligned AVX store gets split into two instructions

Tue Jul 9 22:01:33 PDT 2013

On Tue, Jul 9, 2013 at 9:01 PM, Zach Devito <zdevito at gmail.com> wrote:
> I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads
> on AVX.
> 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a
> single instruction (details below).
> In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which
> seems to be due to this.
>
> Any ideas why this changed? Thanks!

This change was intentional; apparently splitting an unaligned 256-bit
access into two 128-bit instructions is supposed to be faster.  See
r172868/r172894.
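
For anyone who wants to see the two forms side by side, here is a rough
sketch in C with AVX intrinsics. It is only an illustration, not what the
backend does internally: the function names are made up for this example,
the target attribute and __builtin_cpu_supports are GCC/Clang extensions,
and the compiler is free to merge or split these accesses regardless of
how the source is written, so check the generated assembly before drawing
conclusions from it.

```c
#include <assert.h>
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: one unaligned 256-bit load of 8 floats. */
__attribute__((target("avx")))
static void load_single(const float *src, float *dst)
{
    __m256 v = _mm256_loadu_ps(src);
    _mm256_storeu_ps(dst, v);
}

/* Hypothetical helper: the same 8 floats loaded as two unaligned
 * 128-bit halves, with the upper half inserted into lane 1. */
__attribute__((target("avx")))
static void load_split(const float *src, float *dst)
{
    __m128 lo = _mm_loadu_ps(src);
    __m128 hi = _mm_loadu_ps(src + 4);
    __m256 v = _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
    _mm256_storeu_ps(dst, v);
}

/* Returns 1 if both forms read the same values, 0 if they differ,
 * and -1 if the CPU does not report AVX support. */
int split_matches_single(void)
{
    float buf[16];
    float a[8], b[8];
    int i;

    if (!__builtin_cpu_supports("avx"))
        return -1;

    for (i = 0; i < 16; i++)
        buf[i] = (float)i;

    /* Offset by one float so the address is generally not 32-byte aligned. */
    load_single(buf + 1, a);
    load_split(buf + 1, b);
    return memcmp(a, b, sizeof a) == 0;
}

/* Usage: call from main(); aborts if the two forms disagree. */
void check_split_load(void)
{
    assert(split_matches_single() != 0);
}
```

Both helpers read the same data; the only difference is whether the load
is written as one 256-bit access or as two 128-bit accesses. Which one is
faster depends on the microarchitecture and on how often the data really
is misaligned, so timing both in the actual kernel is the only way to know.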

Adding Nadav in case he has anything more to say.

-Eli