[PATCH] combine consecutive subvector 16-byte loads into one 32-byte load (PR21709)

Wed Dec 3 04:17:02 PST 2014

In general, this optimization looks right.
Will it also work if I write the code without intrinsics, like

%a = load <4 x float>* %ptr
%ptr1 = GEP ..
%b = load <4 x float>* %ptr1
insertelement ..
insertelement ..
?
If it does not work for all possible combinations, this optimization should be done in DAG combine.
If it works, please add a test.

I also think that all the possible types should be handled in one patch, just to be sure that the feature is complete. But it is up to you.

================
Comment at: lib/Target/X86/X86InstrSSE.td:8461
@@ +8460,3 @@
+  // TODO: Add patterns for other data types, aligned ops, and stores.
+  def : Pat<(insert_subvector
+              (v8f32 (insert_subvector undef, (loadv4f32 addr:$src), (iPTR 0))),
----------------
If you swap the order of the two insert_subvector nodes, this pattern will not match.
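
To illustrate why the swapped order matters, here is a minimal scalar model in C++ (not LLVM code; load4, load8, insertSubvector, lowThenHigh and highThenLow are hypothetical names used only for this sketch). It assumes insert_subvector simply overwrites four lanes of an 8-lane vector. Under that assumption, both insertion orders build the same value as a single 32-byte load, so the combine is equally valid for either form:

```cpp
#include <array>
#include <cstddef>

using V4 = std::array<float, 4>;
using V8 = std::array<float, 8>;

// Model of a 16-byte load: four consecutive floats starting at p.
V4 load4(const float *p) { return {p[0], p[1], p[2], p[3]}; }

// Model of a 32-byte load: eight consecutive floats starting at p.
V8 load8(const float *p) {
  V8 v{};
  for (std::size_t i = 0; i < 8; ++i)
    v[i] = p[i];
  return v;
}

// Model of insert_subvector: overwrite lanes [idx, idx + 4) of base with sub.
V8 insertSubvector(V8 base, const V4 &sub, std::size_t idx) {
  for (std::size_t i = 0; i < 4; ++i)
    base[idx + i] = sub[i];
  return base;
}

// Low half inserted first, as in the pattern under review.
V8 lowThenHigh(const float *p) {
  V8 undef{}; // stand-in for undef; every lane is overwritten below
  return insertSubvector(insertSubvector(undef, load4(p), 0), load4(p + 4), 4);
}

// High half inserted first: the swapped form.
V8 highThenLow(const float *p) {
  V8 undef{}; // stand-in for undef; every lane is overwritten below
  return insertSubvector(insertSubvector(undef, load4(p + 4), 4), load4(p), 0);
}
```

Since the two forms are equivalent, either a second pattern or a DAG combine would be needed to catch the swapped one.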

http://reviews.llvm.org/D6492