<p dir="ltr"><br>

On 22 Apr 2015 8:19 pm, "Hao Liu" <<a href="mailto:Hao.Liu@arm.com">Hao.Liu@arm.com</a>> wrote:<br>

><br>

> In <a href="http://reviews.llvm.org/D9195#160109">http://reviews.llvm.org/D9195#160109</a>, @qcolombet wrote:<br>

><br>

> > Hi Hao,<br>

> ><br>

> > I share Ahmed’s concern and believe the scalarization should be done as part of the SDAG legalization.<br>

> ><br>

> > Cheers,<br>

> > -Quentin<br>

><br>

><br>

> Yeah, that makes sense, but it seems difficult to do the legalization for a backend that doesn't support it. So I think the problem is the intrinsic itself.<br>

><br>

> Also, I now think such new intrinsics may not be necessary for interleaved accesses.<br>

> For an interleaved load of <4 x double>, instead of<br>

><br>

>   <4 x double> @llvm.indexed.load.v4f64 (double* <ptr>, <4 x i32> <index>, i32 <alignment>)<br>

><br>

> I think we can use<br>

><br>

>   <value> = load <4 x double>, <4 x double>* <ptr><br>

>   shufflevector <4 x double> <value>, <4 x double> undef, <4 x i32> <0, 2, 1, 3><br>

><br>

> Even though it is more complex for a backend to match two IR instructions into one machine instruction, it is achievable. I think the disadvantage of intrinsics is that they are not easy to optimize. Also, I worry that allowing an index vector with arbitrary elements is error prone.<br>

><br>

> I'll try to implement the vectorization of interleaved memory accesses with vector load/vector store + shufflevector. If it is achievable, I think it is better than new intrinsics.<br>

><br>

> What do you think?</p>

<p dir="ltr">If you can, I think that will be a better option. </p>

<p dir="ltr">Cheers, <br>

Renato </p>
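
<p dir="ltr">For readers following the thread, the load + shufflevector idea quoted above can be sketched with a small Python model. This is only an illustration of the semantics, not LLVM code: the helper names <code>shufflevector</code> and <code>interleaved_load</code>, the list-based "memory", and the use of <code>None</code> to stand in for undef are all assumptions made for this sketch.</p>

```python
def shufflevector(v1, v2, mask):
    """Model of a shuffle: each mask index selects from v1 followed by v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


def interleaved_load(memory, offset, width=4, mask=(0, 2, 1, 3)):
    """Hypothetical helper: one wide contiguous load, then a fixed shuffle."""
    value = memory[offset:offset + width]  # models the 4-element vector load
    undef = [None] * width                 # stands in for the undef operand
    return shufflevector(value, undef, mask)


# Memory holds two interleaved streams laid out as a0, b0, a1, b1.
memory = [10.0, 20.0, 11.0, 21.0]

# The mask (0, 2, 1, 3) groups the even-indexed elements first,
# then the odd-indexed ones: a0, a1, b0, b1.
result = interleaved_load(memory, 0)
print(result)
```

<p dir="ltr">Because the mask is a compile-time constant rather than an arbitrary index vector, a pattern like this is easier to check and to optimize, which appears to be the point made in the quoted message.</p>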