<p dir="ltr"><br>

On 23 Apr 2015 3:38 am, "Hao Liu" <<a href="mailto:Hao.Liu@arm.com">Hao.Liu@arm.com</a>> wrote:<br>

><br>

> Hi Renato and Ahmed,<br>

><br>

> I agree with your comments.<br>

><br>

> But I want to change the plan, because I think there may be no need to use intrinsics.<br>

> For the interleaved load of <4 x double>:<br>

><br>

>   <4 x double> @llvm.indexed.load.v4f64 (double* <ptr>, <4 x i32> <index>, i32 <alignment>)<br>

><br>

> I think we can use two common IR instructions:<br>

><br>

>   <value> = load <4 x double>, <4 x double>* <ptr><br>

>   shufflevector <4 x double> <value>, <4 x double> undef, <4 x i32> <0, 2, 1, 3><br>

><br>

> Even though it is more complex for a backend to match two IR instructions, it is achievable. I think the disadvantage of intrinsics is that they are not easy to optimize.<br>

><br>

> I want to implement loop vectorization of interleaved memory accesses with vector load/store plus shufflevector.<br>

><br>

> What do you think?</p>
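<p dir="ltr">To make the proposed load + shufflevector pair concrete, here is a minimal Python sketch that only models the element selection of the shuffle. It is not LLVM code: the helper names (shufflevector, deinterleave_mask) and the labels A0, B0, A1, B1 are assumptions made up for illustration, and the memory layout is assumed to interleave two streams A and B with stride 2.</p>

```python
def shufflevector(v1, v2, mask):
    """Model of the IR shufflevector: each mask index selects from v1 followed by v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


def deinterleave_mask(num_elts, factor):
    """Hypothetical helper: mask that groups every factor-th element together."""
    per_stream = num_elts // factor
    return [start + factor * k for start in range(factor) for k in range(per_stream)]


# Assumed interleaved memory layout: two streams A and B, stride 2.
memory = ["A0", "B0", "A1", "B1"]

# Models the wide load of all four elements in one go.
wide = list(memory)

# The second shuffle operand is undef in the quoted IR; None stands in for it here.
undef = [None] * 4

mask = deinterleave_mask(4, 2)
result = shufflevector(wide, undef, mask)

print(mask)
print(result)
```

<p dir="ltr">Under these assumptions the computed mask matches the 0, 2, 1, 3 mask in the quoted IR, and the shuffle places the A elements in the low half of the result and the B elements in the high half. A backend that matches the pair would have to recognise exactly this kind of mask.</p>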

<p dir="ltr">I agree. If it's possible to represent it in plain IR, I see no reason not to do it.</p>

<p dir="ltr">I'll be particularly interested in how other passes scramble the accesses, making the pattern irrecoverable. But I guess we will find that out as you progress with the examples and tests.</p>

<p dir="ltr">Cheers, <br>

Renato </p>