[PATCH] D21363: Strided Memory Access Vectorization
Ashutosh Nema via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 14 23:13:13 PDT 2016
ashutosh.nema created this revision.
ashutosh.nema added a subscriber: llvm-commits.
ashutosh.nema set the repository for this revision to rL LLVM.
Herald added subscribers: mzolotukhin, sanjoy.
<< This patch demonstrates Strided Memory Access Vectorization >>
LLVM currently does not fully support vectorization of strided accesses; partial support is available through interleaved access vectorization.
Strided vectorization has two main overheads:
• Consolidating data into an operable vector.
• Distributing the data elements after the operations.
Because of these overheads, LLVM finds strided memory vectorization unprofitable and generates a serial code sequence instead of vectorized code.
GATHER and SCATTER instructions, which perform the consolidation and distribution, are available on only a few targets.
In this approach we are trying to handle cases like this:
for (int i = 0; i < len; i++)
a[i*3] = b[i*2] + c[i*3];
We model strided memory loads and stores using shuffle and load/masked-store operations.
• A load is modeled as loads followed by a shuffle.
• A store is modeled as a shuffle followed by a masked store.
• To minimize the number of load and store operations, we introduce ‘SkipFactor’.
‘SkipFactor’:
• Multiple load operations are required to consolidate data into an operable vector.
• Between successive loads we skip some offsets to consolidate effectively.
• SkipFactor is the number of additional elements to skip, counted from the end address of the previous vector load, to reach the start of the next vector load.
• SkipFactor makes all memory loads identical from the valid-element perspective (the valid elements sit in the same lanes of every load).
• SkipFactor = (Stride - (VF % Stride)) % Stride
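The formula can be checked with a small C sketch. The helper names skip_factor and load_offset are hypothetical and are not taken from the patch; the sketch only restates the definition above and assumes offsets are counted in elements.

```c
#include <assert.h>

/* SkipFactor as defined above: the number of extra elements to skip after
   a VF-wide load so that the next load starts on a multiple of the stride. */
unsigned skip_factor(unsigned stride, unsigned vf) {
    assert(stride > 0);
    return (stride - (vf % stride)) % stride;
}

/* Hypothetical helper: element offset of the k-th wide load relative to
   the first one. Each load is VF elements wide, then SkipFactor elements
   are skipped. */
unsigned load_offset(unsigned k, unsigned stride, unsigned vf) {
    return k * (vf + skip_factor(stride, vf));
}
```

For Stride = 3 and VF = 4 this gives SkipFactor = 2, so the second wide load starts 6 elements after the first. When VF is a multiple of Stride the SkipFactor is 0 and the loads are back to back.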
How LOAD is modeled:
Load with stride 3 (i.e. load of b[3 * i])
%5 = getelementptr inbounds i32, i32* %b, i64 %.lhs
%6 = bitcast i32* %5 to <4 x i32>*
%stride.load27 = load <4 x i32>, <4 x i32>* %6, align 1, !tbaa !1
%7 = getelementptr i32, i32* %5, i64 6
%8 = bitcast i32* %7 to <4 x i32>*
%stride.load28 = load <4 x i32>, <4 x i32>* %8, align 1, !tbaa !1
%strided.vec29 = shufflevector <4 x i32> %stride.load27, <4 x i32> %stride.load28, <4 x i32> <i32 0, i32 3, i32 4, i32 7>
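The IR above can be mimicked in scalar C to see where the shuffle mask comes from. This is only an illustrative sketch, not code from the patch: strided_load is a hypothetical name, VF is fixed at 4 to match the <4 x i32> types, and the caller is assumed to guarantee that every full wide load stays inside the source array.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* vectorization factor, matching <4 x i32> in the IR */

/* Emulate a strided load of VF elements using VF-wide contiguous loads
   followed by lane selection. mask[] receives each selected lane's index
   in the concatenation of all wide loads; with two loads this is the
   shufflevector mask. Returns the number of wide loads used.
   Assumption: src is readable for every full wide load. */
unsigned strided_load(const int *src, unsigned stride,
                      int out[VF], unsigned mask[VF]) {
    assert(stride > 0);
    unsigned skip = (stride - (VF % stride)) % stride;  /* SkipFactor */
    unsigned step = VF + skip;   /* distance between wide-load starts */
    unsigned n = 0, loads = 0;
    for (unsigned k = 0; n < VF; ++k, ++loads) {
        int wide[VF];
        for (unsigned j = 0; j < VF; ++j)       /* one contiguous wide load */
            wide[j] = src[(size_t)k * step + j];
        /* Valid elements sit at lanes 0, stride, 2*stride, ... */
        for (unsigned lane = 0; lane < VF && n < VF; lane += stride) {
            mask[n] = k * VF + lane;
            out[n++] = wide[lane];
        }
    }
    return loads;
}
```

With stride 3 the sketch performs two wide loads at element offsets 0 and 6 and selects lanes 0, 3, 4 and 7 of the concatenated vectors, which matches the mask in %strided.vec29.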
How STORE is modeled:
Store with stride 3 (i.e. store to c[3 * i])
%10 = getelementptr inbounds i32, i32* %c, i64 %.lhs
%11 = bitcast i32* %10 to <4 x i32>*
%interleaved.vec = shufflevector <4 x i32> %StoreResult, <4 x i32> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 1>
call void @llvm.masked.store.v4i32(<4 x i32> %interleaved.vec, <4 x i32>* %11, i32 4, <4 x i1> <i1 true, i1 false, i1 false, i1 true>)
%12 = getelementptr i32, i32* %10, i64 6
%13 = bitcast i32* %12 to <4 x i32>*
%interleaved.vec30 = shufflevector <4 x i32> %StoreResult, <4 x i32> undef, <4 x i32> <i32 2, i32 undef, i32 undef, i32 3>
call void @llvm.masked.store.v4i32(<4 x i32> %interleaved.vec30, <4 x i32>* %13, i32 4, <4 x i1> <i1 true, i1 false, i1 false, i1 true>)
NOTE: For stride 3 (with VF = 4), SkipFactor is 2; that is why the second GEP is offset by 6 (4 + 2).
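The store sequence can be emulated in scalar C in the same spirit. Again this is a sketch under assumptions rather than the patch's code: strided_store is a hypothetical name, VF is fixed at 4, and a masked store is modeled as a loop that writes only the enabled lanes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VF 4  /* vectorization factor, matching <4 x i32> in the IR */

/* Emulate a strided store of VF values using shuffles followed by masked
   VF-wide stores. Returns the number of masked stores issued. */
unsigned strided_store(int *dst, unsigned stride, const int vals[VF]) {
    assert(stride > 0);
    unsigned skip = (stride - (VF % stride)) % stride;  /* SkipFactor */
    unsigned step = VF + skip;   /* distance between wide-store starts */
    unsigned n = 0, stores = 0;
    for (unsigned k = 0; n < VF; ++k, ++stores) {
        int wide[VF] = {0};        /* shuffled vector; unset lanes are "undef" */
        bool mask[VF] = {false};
        /* Shuffle: place the next values at lanes 0, stride, 2*stride, ... */
        for (unsigned lane = 0; lane < VF && n < VF; lane += stride) {
            wide[lane] = vals[n++];
            mask[lane] = true;
        }
        /* Masked store: only enabled lanes touch memory. */
        for (unsigned j = 0; j < VF; ++j)
            if (mask[j])
                dst[(size_t)k * step + j] = wide[j];
    }
    return stores;
}
```

For stride 3 each masked store uses the lane mask <true, false, false, true>, and the two stores start at element offsets 0 and 6, as in the IR above. Elements between the strided positions are left untouched.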
The following LLVM changes are required to enable this feature:
1) Identify strided memory accesses (this already exists; the interleaved access vectorizer does it).
2) Costing changes:
a. Determine the number of loads/stores and shuffles required to model a load/store operation, taking SkipFactor into account.
b. Return the total cost as the sum of the load/store and shuffle costs.
3) Transform:
a. Generate shuffles followed by masked stores to model a store operation.
b. Generate loads followed by shuffles to model a load operation.
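As a rough illustration of costing step 2, the C sketch below counts the wide memory operations and shuffles and sums their costs. Everything in it is an assumption made for illustration: the struct op_costs, the function names, and the shuffle counts (one two-input shuffle per merge for loads, one shuffle per masked store for stores) are not taken from the patch, and real per-operation costs would come from the target's cost model.

```c
#include <assert.h>

/* Hypothetical per-operation costs supplied by the caller. */
struct op_costs {
    unsigned mem_op;   /* one wide load or one masked store */
    unsigned shuffle;  /* one shuffle */
};

/* Wide memory operations needed to cover vf strided elements. Because of
   SkipFactor every wide operation starts on a multiple of the stride, so
   each one holds ceil(vf / stride) valid lanes. */
unsigned num_wide_ops(unsigned stride, unsigned vf) {
    assert(stride > 0 && vf > 0);
    unsigned per_op = (vf + stride - 1) / stride;
    return (vf + per_op - 1) / per_op;
}

/* Assumption: merging n loaded vectors takes n - 1 two-input shuffles. */
unsigned strided_load_cost(unsigned stride, unsigned vf, struct op_costs c) {
    unsigned n = num_wide_ops(stride, vf);
    return n * c.mem_op + (n - 1) * c.shuffle;
}

/* Assumption: each masked store is preceded by one shuffle, except for
   the unit-stride case, which needs none. */
unsigned strided_store_cost(unsigned stride, unsigned vf, struct op_costs c) {
    unsigned n = num_wide_ops(stride, vf);
    unsigned shuffles = (stride == 1) ? 0 : n;
    return n * c.mem_op + shuffles * c.shuffle;
}
```

With unit costs, stride 3 and VF 4 this yields 2 loads plus 1 shuffle for the load and 2 shuffles plus 2 masked stores for the store, which matches the instruction counts in the IR examples above.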
Use the options below to enable this feature:
"-mllvm -enable-interleaved-mem-accesses -mllvm -enable-strided-vectorization"
Gains observed with the prototype:
TSVC kernel S111 1.15x
TSVC kernel S1111 1.42x
Thanks,
Ashutosh
Repository:
rL LLVM
http://reviews.llvm.org/D21363
Files:
include/llvm/Analysis/TargetTransformInfo.h
include/llvm/Analysis/TargetTransformInfoImpl.h
include/llvm/CodeGen/BasicTTIImpl.h
lib/Analysis/TargetTransformInfo.cpp
lib/Target/X86/X86ISelLowering.h
lib/Target/X86/X86TargetTransformInfo.cpp
lib/Target/X86/X86TargetTransformInfo.h
lib/Transforms/Vectorize/LoopVectorize.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D21363.60792.patch
Type: text/x-patch
Size: 45614 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160615/98724315/attachment.bin>