[PATCH] Add new indexed load/store intrinsics.
Hao.Liu at arm.com
Wed Apr 22 19:38:26 PDT 2015
Hi Renato and Ahmed,
I agree with your comments.
But I'd like to change the plan, because I think there may be no need for intrinsics at all.
For an interleaved load of <4 x double>, instead of the intrinsic
<4 x double> @llvm.indexed.load.v4f64 (double* <ptr>, <4 x i32> <index>, i32 <alignment>)
I think we can use two ordinary IR instructions:
<value> = load <4 x double>, <4 x double>* <ptr>
shufflevector <4 x double> <value>, <4 x double> undef, <4 x i32> <0, 2, 1, 3>
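To illustrate what the two instructions compute, here is a small scalar model in Python (a sketch, not LLVM code). It assumes the loaded memory holds two stride-2 interleaved members, e.g. re/im pairs {re0, im0, re1, im1}. The mask <0, 2, 1, 3> gathers the even elements into lanes 0-1 and the odd elements into lanes 2-3. The helper `shufflevector` is a hypothetical stand-in that mimics the IR instruction's selection rule, and `None` plays the role of an undef lane.

```python
def shufflevector(v1, v2, mask):
    """Scalar model of LLVM's shufflevector: a mask index i < len(v1)
    selects v1[i], otherwise v2[i - len(v1)]. None stands in for undef."""
    n = len(v1)
    return [v1[i] if i < n else v2[i - n] for i in mask]

# Memory laid out with stride-2 interleaving: re0, im0, re1, im1.
mem = [1.0, 10.0, 2.0, 20.0]

# Models the single wide load of <4 x double> ...
value = mem[0:4]
undef = [None] * 4
# ... followed by the shuffle with mask <0, 2, 1, 3>.
out = shufflevector(value, undef, [0, 2, 1, 3])

print(out)  # [1.0, 2.0, 10.0, 20.0]
```

Each half of the result now holds one member (all re values, then all im values). A backend that recognizes the load + shuffle pair could lower it to a de-interleaving load where the target has one.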
Even though it is more complex for a backend to match a pattern spanning two instructions, it is achievable. I think the disadvantage of intrinsics is that they are hard for other passes to optimize.
I want to implement loop vectorization of interleaved memory accesses using vector load/store plus shufflevector.
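The store side can be sketched the same way. The following Python model assumes a hypothetical scalar loop `a[2*i] = x[i]; a[2*i+1] = y[i]` and a vectorization factor VF that divides the trip count. Each vector iteration takes VF elements from x and VF from y, interleaves them with one shufflevector (for VF = 2 the mask is <0, 2, 1, 3>), and writes the result with a single wide store. The function names are illustrative only and are not the vectorizer's actual code.

```python
def shufflevector(v1, v2, mask):
    """Scalar model of LLVM's shufflevector: a mask index i < len(v1)
    selects v1[i], otherwise v2[i - len(v1)]."""
    n = len(v1)
    return [v1[i] if i < n else v2[i - n] for i in mask]

def interleave_scalar(x, y):
    # Original scalar loop: a[2*i] = x[i]; a[2*i+1] = y[i]
    a = [0.0] * (2 * len(x))
    for i in range(len(x)):
        a[2 * i] = x[i]
        a[2 * i + 1] = y[i]
    return a

def interleave_vectorized(x, y, vf=2):
    # Hypothetical vectorized form; assumes len(x) % vf == 0.
    # Interleaving mask <0, vf, 1, vf+1, ...>; for vf == 2 this is <0, 2, 1, 3>.
    mask = [j // 2 + (j % 2) * vf for j in range(2 * vf)]
    a = [0.0] * (2 * len(x))
    for i in range(0, len(x), vf):
        xv = x[i:i + vf]  # models a <vf x double> load of x
        yv = y[i:i + vf]  # models a <vf x double> load of y
        # One shuffle, then one wide <2*vf x double> store.
        a[2 * i:2 * i + 2 * vf] = shufflevector(xv, yv, mask)
    return a

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(interleave_vectorized(x, y))  # [1.0, 10.0, 2.0, 20.0, 3.0, 30.0, 4.0, 40.0]
```

Because the shuffle and the wide store are ordinary IR instructions, the rest of the pipeline can still see and optimize them, and the backend is free to match the pair to an interleaving store instruction if one exists.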
What do you think?