[RFC][PATCH][LoopVectorize] Teach Loop Vectorizer about interleaved data accesses

Hao Liu Hao.Liu at arm.com
Mon Mar 23 22:53:53 PDT 2015


Hi Elena,

Sorry, I misunderstood the vector types and thought that <6 x i32> was
illegal in IR.

>The problem is about loading/storing 3 interleaved vectors. We don't have a
>type like <12 x i32>. One way is to use the masked indexed load/store like:
>           %result = shufflevector <4 x i32> %V0, %V1, <0, 1, 2, 3, 4, 5, 6, 7>
>           %result1 = shufflevector <4 x i32> %V2, UNDEF, <0, 1, 2, 3, undef, undef, undef, undef>
>           %result2 = shufflevector <8 x i32> %result, %result1, <0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, undef, undef, undef, undef>
>           %result = call <16 x i32> @llvm.uindex.masked.store(i32 %ptr, <0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11, undef, undef, undef, undef>, <true, ...., false, false, false, false>)
>
>Another simpler way is to define new types like <12 x i32>, <6 x i32>, so
>that we can still use non-masked intrinsics like the 2/4 interleaved
>load/store. I'm not sure whether this is reasonable.
>
[Hao Liu]
My fault. For 3 interleaved vectors, types such as <6 x i32> and <12 x i32>
are legal in the IR; there are just no corresponding MVTs. So there is no
need to use llvm.uindex.masked.store. A plain llvm.uindex.store is enough:
     %result = call <12 x i32> @llvm.uindex.store(i32 %ptr, <0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11>, )

I think llvm.uindex.load and llvm.uindex.store are enough to represent
interleaved accesses.
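
To make the index vector above concrete, here is a minimal Python sketch of
the semantics I have in mind. It assumes that llvm.uindex.store (a proposed
intrinsic, not an existing one) writes element k of the wide vector to
ptr + index[k]. The helper names are hypothetical and only model the
behaviour on plain lists:

```python
def concat(*vectors):
    """Concatenate per-field vectors into one wide vector
    (models the shufflevector chain that builds the <12 x i32> value)."""
    wide = []
    for v in vectors:
        wide.extend(v)
    return wide


def interleave_indices(num_vectors, vector_len):
    """Index vector for an interleaved store: element i of vector j
    is written to offset i * num_vectors + j."""
    return [i * num_vectors + j
            for j in range(num_vectors)
            for i in range(vector_len)]


def uindex_store(memory, base, wide, indices):
    """Assumed semantics: memory[base + indices[k]] = wide[k]."""
    for value, offset in zip(wide, indices):
        memory[base + offset] = value


v0 = [10, 11, 12, 13]
v1 = [20, 21, 22, 23]
v2 = [30, 31, 32, 33]
memory = [0] * 12
indices = interleave_indices(3, 4)
uindex_store(memory, 0, concat(v0, v1, v2), indices)
```

With 3 vectors of 4 elements, interleave_indices yields the same index
vector as in the call above, and memory ends up holding the elements of
v0, v1 and v2 in interleaved order.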

>What do you think?
>
>Thanks,
>-Hao
>
>>-----Original Message-----
>>From: Demikhovsky, Elena [mailto:elena.demikhovsky at intel.com]
>>Sent: March 23, 2015 20:23
>>To: Hao Liu; 'Arnold Schwaighofer'
>>Cc: Hal Finkel; Nadav Rotem; Commit Messages and Patches for LLVM;
>>Jiangning Liu; James Molloy; Adam Nemet
>>Subject: RE: [RFC][PATCH][LoopVectorize] Teach Loop Vectorizer about
>>interleaved data accesses
>>
>>> Actually I think the intrinsics which are currently used in the
>>> AArch64/ARM backends are simpler. Example for 2 interleaved vectors:
>>>        %result = call { <2 x double>, <2 x double> } @llvm.aarch64.ld2.v2f64(<2 x double>* ptr)
>>
>>It is simple, but
>>
>>1) It is not safe, due to possible memory accesses past the end of the buffer.
>>2) I don't want to load the odd elements if I need only the even ones.
>>Nobody says that it should be implemented by sequential loads with shuffles.
>>3) What happens if the stride is 3 or 4?
>>4) What happens if the block is predicated?
>>
>>To represent the interleaved load that you want to achieve with the
>>suggested intrinsic, you need 2 calls:
>>%even = <8 x double> @llvm.interleave.load.v8f64(double * %ptr, i32 2, i32 0, i32 align, <8 x i1> %mask, <8 x double> undef)
>>%odd  = <8 x double> @llvm.interleave.load.v8f64(double * %ptr, i32 2, i32 1, i32 align, <8 x i1> %mask, <8 x double> undef)
>>
>>You can translate these 2 calls into a single target-specific one in a
>>codegen pass, if the mask is "all true", of course.
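
For reference, a minimal Python sketch of how the two calls could behave. It
assumes the proposed llvm.interleave.load reads lane i from
ptr + first_ind + i * stride when mask[i] is true and otherwise returns the
pass-through value without touching memory. This is one reading of the
proposal, not a settled definition, and the function name is hypothetical:

```python
def interleave_load(memory, base, stride, first_ind, mask, passthru):
    """Assumed semantics of the proposed strided, masked load.
    Disabled lanes never access memory, so a tail lane that would
    fall past the end of the buffer is safe."""
    result = []
    for lane, (enabled, fallback) in enumerate(zip(mask, passthru)):
        if enabled:
            result.append(memory[base + first_ind + lane * stride])
        else:
            result.append(fallback)
    return result


# A buffer of 7 elements: the 4th odd element (offset 7) does not exist.
memory = list(range(100, 107))
all_true = [True, True, True, True]
tail_mask = [True, True, True, False]

even = interleave_load(memory, 0, 2, 0, all_true, [-1] * 4)
odd = interleave_load(memory, 0, 2, 1, tail_mask, [-1] * 4)
```

Here even holds the elements at offsets 0, 2, 4, 6, while odd holds those at
offsets 1, 3, 5 plus the pass-through value in the masked-off last lane.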
>>
>>-  Elena
>>
>>
>>-----Original Message-----
>>From: Hao Liu [mailto:Hao.Liu at arm.com]
>>Sent: Monday, March 23, 2015 12:25
>>To: Demikhovsky, Elena; 'Arnold Schwaighofer'
>>Cc: Hal Finkel; Nadav Rotem; Commit Messages and Patches for LLVM;
>>Jiangning Liu; James Molloy; Adam Nemet
>>Subject: RE: [RFC][PATCH][LoopVectorize] Teach Loop Vectorizer about
>>interleaved data accesses
>>
>>Hi Elena,
>>
>>>>-----Original Message-----
>>>>From: Demikhovsky, Elena [mailto:elena.demikhovsky at intel.com]
>>>>Sent: March 23, 2015 15:45
>>>>To: Hao Liu; 'Arnold Schwaighofer'
>>>>Cc: Hal Finkel; Nadav Rotem; Commit Messages and Patches for LLVM;
>>>>Jiangning Liu; James Molloy; Adam Nemet
>>>>Subject: RE: [RFC][PATCH][LoopVectorize] Teach Loop Vectorizer about
>>>>interleaved data accesses
>>>>
>>>>I agree with Hao that a bunch of loads and shuffles will be very difficult
>>>>to handle.
>>>>For interleave factor 4 and a vector of 8 elements, you'll need 4 masked
>>>>loads and 3 shuffles, which will never be gathered together into one or
>>>>two target instructions.
>>>>
>>>>We can also consider an "interleave load" as a special case of
>>>>gather/scatter, but again, getting the stride and converting back to an
>>>>interleave load will be cumbersome.
>>>>
>>>>I think that we should keep a common, target-independent LLVM intrinsic
>>>>form until CodeGen.
>>>>
>>>>I propose to add a control-flow mask as a parameter to the intrinsic, like
>>>>llvm.masked.load/store, in order to allow efficient vectorization of
>>>>predicated basic blocks.
>>>><8 x double> @llvm.interleave.load.v8f64(double * %ptr, i32 %stride, i32 %first_ind, i32 align, <8 x i1> %mask, <8 x double> %PassThru)
>>>>
>>[Hao Liu]
>>I'm curious about how to use this intrinsic to represent an interleaved load.
>>Do you mean the interleaved elements are in the result vector like
>>       <8 x double>: A[0], A[2], A[4], A[6], A[1], A[3], A[5], A[7]
>>If this is true, then to get two vectors with the odd and even elements we
>>need two SHUFFLE_VECTORs like:
>>       %result = <8 x double> @llvm.interleave.load.v8f64(double * %ptr, ...)      // A[0], A[2], A[4], A[6], A[1], A[3], A[5], A[7]
>>       %even_elements = shufflevector <8 x double> %result, UNDEF, <4 x i32> <0, 1, 2, 3>
>>       %odd_elements = shufflevector <8 x double> %result, UNDEF, <4 x i32> <4, 5, 6, 7>
>>        // Operations on %even_elements and %odd_elements.
>>Then how about the interleaved store? It seems we also need
>>shufflevectors to combine them into a big vector and then call
>>interleave.store.
>>
>>Actually I think the intrinsics which are currently used in the AArch64/ARM
>>backends are simpler. Example for 2 interleaved vectors:
>>        %result = call { <2 x double>, <2 x double> } @llvm.aarch64.ld2.v2f64(<2 x double>* ptr)
>>        %even_elements = extractvalue { <2 x double>, <2 x double> } %result, 0
>>        %odd_elements = extractvalue { <2 x double>, <2 x double> } %result, 1
>>I think extractvalue is simpler than shufflevector.
>>Also, the interleaved store is just one intrinsic, like:
>>         call void @llvm.aarch64.neon.st2.v2f64(<2 x double>* ptr, <2 x double> %V0, <2 x double> %V1)
>>So I think maybe we can implement similar intrinsics.
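
As a rough illustration of the ld2/st2 style, the Python sketch below models
a structured load that returns the de-interleaved vectors as a tuple and a
structured store that takes them as separate operands. The function names
and list-based memory are hypothetical stand-ins for the intrinsics:

```python
def ld2(memory, base, vector_len):
    """Model of a 2-way structured load: returns (even, odd) vectors,
    corresponding to extractvalue 0 and extractvalue 1 in the IR."""
    even = [memory[base + 2 * i] for i in range(vector_len)]
    odd = [memory[base + 2 * i + 1] for i in range(vector_len)]
    return even, odd


def st2(memory, base, v0, v1):
    """Model of a 2-way structured store: writes v0 and v1 interleaved."""
    for i, (a, b) in enumerate(zip(v0, v1)):
        memory[base + 2 * i] = a
        memory[base + 2 * i + 1] = b


src = [1.0, 2.0, 3.0, 4.0]
even_elements, odd_elements = ld2(src, 0, 2)

# Operate on the even elements only, then store both vectors back interleaved.
dst = [0.0] * 4
st2(dst, 0, [x * 10 for x in even_elements], odd_elements)
```

No shuffles are needed on either side: the load hands back one vector per
field, and the store consumes one vector per field.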
>>
>>>>-  Elena
>>







