[PATCH] combine consecutive subvector 16-byte loads into one 32-byte load (PR21709)
Elena Demikhovsky
elena.demikhovsky at intel.com
Wed Dec 3 04:17:02 PST 2014
In general, this optimization is correct.
Will it work if I write code without intrinsics, like
%a = load <4 x float>* %ptr
%ptr1 = GEP ..
%b = load <4 x float>* %ptr1
insertelement ..
insertelement ..
?
If it does not work for all possible combinations, this optimization should be done in DAG combine.
If it works, please add a test.
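To make the question concrete, here is a minimal sketch in C of the same shape written without intrinsics, assuming the GCC/Clang vector extensions. The names v4f32, v8f32, load_two_halves and load_whole are made up for illustration and are not part of the patch. It only shows that the two forms produce the same value; whether a given compiler actually folds the first form into a single 32-byte load is exactly what a test would need to check.

```c
#include <string.h>

/* GCC/Clang vector extensions: 4 x float (16 bytes) and 8 x float (32 bytes).
   These typedef names are illustrative, chosen to mirror the LLVM value types. */
typedef float v4f32 __attribute__((vector_size(16)));
typedef float v8f32 __attribute__((vector_size(32)));

/* Two consecutive unaligned 16-byte loads, concatenated into one 32-byte value. */
static void load_two_halves(const float *ptr, v8f32 *out) {
    v4f32 lo, hi;
    v8f32 r = {0};
    memcpy(&lo, ptr, sizeof lo);      /* first 16 bytes  */
    memcpy(&hi, ptr + 4, sizeof hi);  /* next 16 bytes   */
    for (int i = 0; i < 4; ++i) {
        r[i] = lo[i];                 /* low half: elements 0..3  */
        r[i + 4] = hi[i];             /* high half: elements 4..7 */
    }
    *out = r;
}

/* The single unaligned 32-byte load that the optimization would like to emit. */
static void load_whole(const float *ptr, v8f32 *out) {
    memcpy(out, ptr, sizeof *out);
}
```

Both functions write results through a pointer rather than returning a vector by value, to stay clear of ABI differences when AVX is not enabled.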
I also think that all possible types should be handled in one patch, just to be sure that the feature is complete. But it is up to you.
================
Comment at: lib/Target/X86/X86InstrSSE.td:8461
@@ +8460,3 @@
+ // TODO: Add patterns for other data types, aligned ops, and stores.
+ def : Pat<(insert_subvector
+ (v8f32 (insert_subvector undef, (loadv4f32 addr:$src), (iPTR 0))),
----------------
If you swap the order of the two insert_subvector nodes, this pattern will not match.
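As a rough illustration of why the order should not matter semantically, here is a small C model, again assuming the GCC/Clang vector extensions. insert_half is a hypothetical stand-in for insert_subvector, and the undef starting vector is modeled as zeros. Filling the low half first or the high half first yields the same 8-element vector, so both orderings are candidates for the combined load even though they appear as different trees.

```c
#include <string.h>

/* Illustrative typedefs mirroring the LLVM value types (not from the patch). */
typedef float v4f32 __attribute__((vector_size(16)));
typedef float v8f32 __attribute__((vector_size(32)));

/* Hypothetical model of insert_subvector: load 4 floats from src and
   overwrite the 128-bit half of *vec starting at element idx (0 or 4). */
static void insert_half(v8f32 *vec, const float *src, int idx) {
    v4f32 sub;
    memcpy(&sub, src, sizeof sub);
    for (int i = 0; i < 4; ++i)
        (*vec)[idx + i] = sub[i];
}

/* Order used by the pattern under review: low half first, then high half. */
static void build_low_then_high(const float *ptr, v8f32 *out) {
    v8f32 r = {0};                /* stands in for undef */
    insert_half(&r, ptr, 0);
    insert_half(&r, ptr + 4, 4);
    *out = r;
}

/* Swapped order: high half first, then low half. Same resulting value. */
static void build_high_then_low(const float *ptr, v8f32 *out) {
    v8f32 r = {0};                /* stands in for undef */
    insert_half(&r, ptr + 4, 4);
    insert_half(&r, ptr, 0);
    *out = r;
}
```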
http://reviews.llvm.org/D6492