[PATCH] combine consecutive subvector 16-byte loads into one 32-byte load (PR21709)

Wed Dec 3 09:01:06 PST 2014

Hi Elena - thanks for your feedback! Updated patch attached.

I've added test cases that use shufflevector rather than the vinsertf128 intrinsic. These are derived from Andrea's alternate code example in the bug report. Do they cover the insertelement scenario you had in mind? Unless I'm mistaken, insertelement only accepts a scalar operand, so wouldn't shufflevector be the expected pattern from IR that inserts or extracts a 128-bit subvector?
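For reference, here is a minimal sketch of the kind of shufflevector input meant above. It is an illustration, not a copy of the test file: the function name and value names are made up, and it is written in the current opaque-pointer IR syntax, whereas the tree at the time of this patch used typed pointers (`<4 x float>*`). Two adjacent unaligned 16-byte loads are concatenated into one 256-bit vector, which is the shape the patch is intended to turn into a single 32-byte load.

```llvm
; Hypothetical example: names are illustrative, not taken from the patch.
define <8 x float> @combine_16_byte_loads_shuffle(ptr %ptr) {
  ; Address of the second 16-byte half, directly after the first.
  %hi.ptr = getelementptr inbounds <4 x float>, ptr %ptr, i64 1
  ; Two unaligned 16-byte loads from consecutive addresses.
  %lo = load <4 x float>, ptr %ptr, align 1
  %hi = load <4 x float>, ptr %hi.ptr, align 1
  ; Concatenate: mask indices 0-3 select %lo, 4-7 select %hi.
  %v = shufflevector <4 x float> %lo, <4 x float> %hi, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  ret <8 x float> %v
}
```

Note that insertelement could not express this directly, since its inserted operand must be a single element of the vector type rather than a <4 x float> subvector.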

I've also added test cases that swap the order of the operands to the intrinsic and to the shuffles. Does this address your concern about changing the order of the pattern parameters?
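One possible reading of "swapped operands" for the shuffle case is sketched below. This is an assumption about the shape of such a test, not the actual contents of the patch; the function and value names are hypothetical, and the syntax is the current opaque-pointer form. The two loaded halves are passed to shufflevector in the opposite order, and the mask is adjusted so the resulting vector is unchanged.

```llvm
; Hypothetical example: names are illustrative, not taken from the patch.
define <8 x float> @combine_16_byte_loads_shuffle_swap(ptr %ptr) {
  %hi.ptr = getelementptr inbounds <4 x float>, ptr %ptr, i64 1
  %lo = load <4 x float>, ptr %ptr, align 1
  %hi = load <4 x float>, ptr %hi.ptr, align 1
  ; %hi is now the first operand, so mask indices 0-3 select %hi and
  ; 4-7 select %lo. The mask <4..7, 0..3> therefore still places %lo in
  ; the low half and %hi in the high half of the result.
  %v = shufflevector <4 x float> %hi, <4 x float> %lo, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
  ret <8 x float> %v
}
```

Because the result is the same 32 contiguous bytes in memory order, a combine that handles the unswapped form would ideally handle this one as well.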

http://reviews.llvm.org/D6492

Files:
  lib/Target/X86/X86InstrInfo.td
  lib/Target/X86/X86InstrSSE.td
  test/CodeGen/X86/unaligned-32-byte-memops.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D6492.16867.patch
Type: text/x-patch
Size: 6191 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141203/28b3ded0/attachment.bin>