[RFC][PATCH][LoopVectorize] Teach Loop Vectorizer about interleaved data accesses

Hao Liu Hao.Liu at arm.com
Mon Mar 23 22:43:45 PDT 2015


Hi Renato,

>> 1) It is not safe due to possible memory access after eof buffer
>
>Though handling EOF with your intrinsic proposal means predicating every
>mask if the range is unknown, right? Can't this be handled in the same way we
>handle EOF with the current vectorization factor, but doubled/tripled/etc.?
>
[Hao Liu] 
Does EOF mean the tail iterations that cannot be vectorized? The current Loop Vectorizer generates a scalar loop for the remaining iterations.
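
For illustration, here is a minimal C sketch of the shape described above: a main loop that advances in blocks of the vectorization factor, followed by a scalar loop for the remaining iterations. The names `add_arrays` and `VF` are hypothetical, and the inner lane loop only stands in for one vector operation. This is not the code the Loop Vectorizer actually emits.

```c
#include <stddef.h>

enum { VF = 4 }; /* assumed vectorization factor, for illustration only */

/* Adds b[] into a[]: a main loop in blocks of VF, then a scalar remainder. */
void add_arrays(int *a, const int *b, size_t n)
{
    size_t i = 0;
    size_t vec_end = n - (n % VF); /* largest multiple of VF not exceeding n */

    /* Main loop: each trip stands in for one vector iteration. */
    for (; i < vec_end; i += VF)
        for (size_t lane = 0; lane < VF; ++lane)
            a[i + lane] += b[i + lane];

    /* Scalar remainder loop: handles the last n % VF iterations. */
    for (; i < n; ++i)
        a[i] += b[i];
}
```

Because the main loop stops at `vec_end`, it never touches elements at or beyond index `n`. With an interleaved access of stride 2 the same split would apply, with each main-loop trip reading 2 * VF consecutive elements.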

>
>> 2) I don't want to load odd elements if I need only even - nobody says
>> that it should be implemented by sequential loads with shuffle
>
>Exactly. Even if there is no other [x+1] operation, VLDN is still beneficial.
>
[Hao Liu] 
Yes, VLDN is beneficial even if we don't access all of the sequential elements. But VSTN does not seem beneficial. If we fail to match several separate intrinsics into one VSTN, we need to load each scalar element and insert it into the corresponding vector, which is very expensive.
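
As a rough scalar model of the stride-2 case, the C sketch below shows what a de-interleaving load (in the style of VLD2) and an interleaving store (in the style of VST2) compute. The names `load2`, `store2` and `LANES` are hypothetical. The code describes only the data movement, not how any target implements it.

```c
#include <stddef.h>

enum { LANES = 4 }; /* assumed number of lanes per vector */

/* De-interleaving load: reads 2*LANES consecutive elements from src and
   splits them into the even-indexed and odd-indexed elements. */
void load2(const int *src, int even[LANES], int odd[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        even[i] = src[2 * i];
        odd[i]  = src[2 * i + 1];
    }
}

/* Interleaving store: writes 2*LANES consecutive elements to dst,
   alternating between the two input vectors. */
void store2(int *dst, const int even[LANES], const int odd[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        dst[2 * i]     = even[i];
        dst[2 * i + 1] = odd[i];
    }
}
```

`load2` is still useful when the caller ignores `odd`, since the even elements arrive already packed. `store2` needs both inputs at once, which is why the store side depends on recognising the separate strided stores as one operation.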

>
>> 3) What happens if stride is 3 or 4?
>
>I'm assuming ld2, ld3, ld4, etc. I agree this is a fragile design.
>
>
>> To represent the interleaved load that you want to achieve with
>> suggested intrinsic, you need 2 calls
>> %even = <8 x double> @llvm.interleave.load.v8f64(double * %ptr, i32 2, i32 0, i32 align, <8 x i1> %mask, <8 x double> undef)
>> %odd   = <8 x double> @llvm.interleave.load.v8f64(double * %ptr, i32 2, i32 1, i32 align, <8 x i1> %mask, <8 x double> undef)
>
>I like this.
>
>
>> You can translate these 2 calls into one target specific on codegen pass, if the
>> mask is "all true", of course.
>
[Hao Liu] 
See my previous mail to Elena. I think it is risky to combine several intrinsics in the backend. Matching one-to-one and one-to-N is simple, but matching N-to-one is risky.
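
To make the two quoted calls concrete, here is a scalar C model of one possible reading of the proposed intrinsic. It assumes the arguments are (pointer, stride, offset, alignment, mask, pass-through) and that masked-off lanes take the pass-through value without touching memory. Both points are assumptions, since the semantics are not spelled out in this thread. The name `interleave_load` is hypothetical and the alignment argument is left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum { WIDTH = 8 }; /* lanes in the <8 x double> result */

/* Assumed semantics: lane i reads ptr[i * stride + offset] when mask[i]
   is true; otherwise it takes passthru[i] and performs no memory access. */
void interleave_load(const double *ptr, int stride, int offset,
                     const bool mask[WIDTH], const double passthru[WIDTH],
                     double out[WIDTH])
{
    for (size_t i = 0; i < WIDTH; ++i)
        out[i] = mask[i] ? ptr[i * (size_t)stride + (size_t)offset]
                         : passthru[i];
}
```

Under this reading, the calls with offsets 0 and 1 together cover the same 16 consecutive elements that a single stride-2 de-interleaving load would read when the mask is all true. That pairing is the N-to-one match the backend would have to recognise.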

>If the first/last elements of the mask are false you can force a head/tail loop.
>
>I'm also assuming that the vectorizer will only emit such intrinsics if the target
>supports it. Otherwise, it'd be hard for the back-ends that know nothing about
>interleaved access to untangle the mess and still generate acceptable code.
>
>cheers,
>--renato







