[PATCH] Add new indexed load/store intrinsics.

Hao Liu Hao.Liu at arm.com
Wed Apr 22 19:38:26 PDT 2015

Hi Renato and Ahmed,

I agree with your comments.

However, I would like to change the plan, because I think intrinsics may not be needed at all.
Take the interleaved load of <4 x double>:

  <4 x double> @llvm.indexed.load.v4f64 (double* <ptr>, <4 x i32> <index>, i32 <alignment>)

I think we can use two ordinary IR instructions instead:

  <value> = load <4 x double>, <4 x double>* <ptr>
  <result> = shufflevector <4 x double> <value>, <4 x double> undef, <4 x i32> <0, 2, 1, 3>

Matching two instructions is more complex for a backend than matching a single intrinsic, but it is achievable. I think the disadvantage of intrinsics is that they are not easy to optimize.
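
To make the equivalence concrete, here is a rough Python model (not LLVM code) of the two forms. It assumes the <index> operand of the proposed intrinsic selects elements relative to <ptr>, which is my reading of the proposal and not a settled definition. The helper names indexed_load, vector_load and shufflevector are only for illustration.

```python
def indexed_load(memory, base, index):
    # Assumed semantics of the proposed intrinsic: lane k reads memory[base + index[k]].
    return [memory[base + i] for i in index]


def vector_load(memory, base, width):
    # Plain contiguous vector load of `width` elements starting at `base`.
    return memory[base:base + width]


def shufflevector(v1, v2, mask):
    # Mask lanes index the concatenation v1 ++ v2; v2 = None models an undef operand.
    second = list(v2) if v2 is not None else [None] * len(v1)
    combined = list(v1) + second
    return [combined[m] for m in mask]


memory = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
mask = [0, 2, 1, 3]

via_intrinsic = indexed_load(memory, 0, mask)
via_shuffle = shufflevector(vector_load(memory, 0, 4), None, mask)
```

Under that assumption both forms produce the same lanes (even elements first, then odd elements), so a backend that recognizes the load + shufflevector pair has the same information the intrinsic would have carried.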

I want to implement loop vectorization of interleaved memory accesses using vector load/store plus shufflevector.

What do you think?




