[PATCH] D24681: Optimize patterns of vectorized interleaved memory accesses for X86.
Farhana via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 16 13:01:00 PDT 2016
Farhana created this revision.
Farhana added reviewers: DavidKreitzer, mkuper, delena.
Farhana added a subscriber: llvm-commits.
Herald added subscribers: mgorny, beanz, aemerson.
[X86InterleavedAccess] Optimize patterns of vectorized interleaved memory accesses for X86.
Prior to this, there was no X86 implementation of InterleavedAccessPass, which detects a set of interleaved accesses and generates target-specific intrinsics.
Here is an example of interleaved loads:
// %wide.vec = load <8 x i32>, <8 x i32>* %ptr
// %v0 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <0, 2, 4, 6>
// %v1 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <1, 3, 5, 7>
The ARM and AArch64 implementations generate vldN/vstN and ldN/stN intrinsics, respectively.
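For context, a scalar loop of roughly the following shape is the kind of source that can give rise to a factor-2 interleaved load once it is vectorized. This is an illustrative sketch only: the function name `sumEvenOdd` and the data layout are assumptions and do not come from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical example: `data` holds interleaved pairs {a0, b0, a1, b1, ...}.
// Each iteration reads two adjacent elements at stride 2, which a vectorizer
// may turn into one wide load followed by two strided shuffles.
std::pair<int32_t, int32_t> sumEvenOdd(const int32_t *data, std::size_t pairs) {
  int32_t even = 0, odd = 0;
  for (std::size_t i = 0; i < pairs; ++i) {
    even += data[2 * i];     // lanes 0, 2, 4, 6 of the wide vector
    odd += data[2 * i + 1];  // lanes 1, 3, 5, 7 of the wide vector
  }
  return {even, odd};
}
```

With eight input elements the two accumulators correspond to %v0 and %v1 in the IR above, each reduced to a single sum.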
This change-set puts in place the basic framework for supporting InterleavedAccessPass on X86. It also tries to detect one interleaved pattern (4 interleaved accesses, stride 4, 64-bit elements, on AVX/AVX2) and generates an optimized sequence for it.
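To make "4 interleaved accesses, stride 4" concrete, the check on each shuffle mask can be sketched as below. `isStridedMask` is a hypothetical helper written for illustration; it is not the function used by the patch or by InterleavedAccessPass, and it ignores any special cases real masks may contain.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the patch): returns true if `mask` selects
// every `factor`-th lane starting at lane `index`, i.e.
// <index, index + factor, index + 2 * factor, ...>.
bool isStridedMask(const std::vector<int> &mask, int factor, int index) {
  if (mask.empty() || factor < 2 || index < 0 || index >= factor)
    return false;
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i] != index + static_cast<int>(i) * factor)
      return false;
  return true;
}
```

Under this sketch, a group qualifies for the stride-4 case when the four shuffles of the same wide load satisfy `isStridedMask(mask, 4, k)` for k = 0, 1, 2, 3.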
This is just the first step of a long effort. The short-term plan is to continue supporting a few patterns this way while we work out a more general solution.
In order to allow code sharing between multiple transpose functions, the next change-set will introduce a class that will encapsulate all the necessary information.
With this change-set, the following pattern is transformed:
/// Currently supported interleaved loads (here, T = {i/f}):
/// %wide.vec = load <16 x T64>, <16 x T64>* %ptr
/// %v0 = shuffle %wide.vec, undef, <0, 4, 8, 12> ;
/// %v1 = shuffle %wide.vec, undef, <1, 5, 9, 13> ;
/// %v2 = shuffle %wide.vec, undef, <2, 6, 10, 14> ;
/// %v3 = shuffle %wide.vec, undef, <3, 7, 11, 15> ;
///
/// Into:
/// %load0 = load <4 x T64>, <4 x T64>* %ptr
/// %load1 = load <4 x T64>, <4 x T64>* %ptr+32
/// %load2 = load <4 x T64>, <4 x T64>* %ptr+64
/// %load3 = load <4 x T64>, <4 x T64>* %ptr+96
///
/// %intrshuffvec1 = shuffle %load0, %load2, <0, 1, 4, 5>;
/// %intrshuffvec2 = shuffle %load1, %load3, <0, 1, 4, 5>;
/// %v0 = shuffle %intrshuffvec1, %intrshuffvec2, <0, 4, 2, 6>;
/// %v1 = shuffle %intrshuffvec1, %intrshuffvec2, <1, 5, 3, 7>;
///
/// %intrshuffvec3 = shuffle %load0, %load2, <2, 3, 6, 7>;
/// %intrshuffvec4 = shuffle %load1, %load3, <2, 3, 6, 7>;
/// %v2 = shuffle %intrshuffvec3, %intrshuffvec4, <0, 4, 2, 6>;
/// %v3 = shuffle %intrshuffvec3, %intrshuffvec4, <1, 5, 3, 7>;
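The shuffle sequence above can be checked with a small scalar model. The sketch below is not code from the patch: `Vec4`, `shuffle`, and `deinterleave4` are names assumed for illustration, and the model only mimics the lane-selection semantics of a two-input shuffle on four 64-bit lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<int64_t, 4>;

// Models a two-input, four-lane shuffle: mask values 0..3 pick lanes from `a`,
// mask values 4..7 pick lanes from `b`.
Vec4 shuffle(const Vec4 &a, const Vec4 &b, const std::array<int, 4> &mask) {
  Vec4 out{};
  for (std::size_t i = 0; i < 4; ++i) {
    std::size_t m = static_cast<std::size_t>(mask[i]);
    out[i] = m < 4 ? a[m] : b[m - 4];
  }
  return out;
}

// Applies the four narrow loads and eight shuffles to 16 consecutive 64-bit
// elements and returns {v0, v1, v2, v3}.
std::array<Vec4, 4> deinterleave4(const std::array<int64_t, 16> &mem) {
  Vec4 load[4];
  for (std::size_t i = 0; i < 4; ++i)    // byte offsets 0, 32, 64, 96
    for (std::size_t j = 0; j < 4; ++j)
      load[i][j] = mem[4 * i + j];

  Vec4 s1 = shuffle(load[0], load[2], {0, 1, 4, 5});  // %intrshuffvec1
  Vec4 s2 = shuffle(load[1], load[3], {0, 1, 4, 5});  // %intrshuffvec2
  Vec4 v0 = shuffle(s1, s2, {0, 4, 2, 6});
  Vec4 v1 = shuffle(s1, s2, {1, 5, 3, 7});

  Vec4 s3 = shuffle(load[0], load[2], {2, 3, 6, 7});  // %intrshuffvec3
  Vec4 s4 = shuffle(load[1], load[3], {2, 3, 6, 7});  // %intrshuffvec4
  Vec4 v2 = shuffle(s3, s4, {0, 4, 2, 6});
  Vec4 v3 = shuffle(s3, s4, {1, 5, 3, 7});

  return {v0, v1, v2, v3};
}
```

Feeding in elements numbered 0 through 15 yields the same lanes that the original strided masks <0, 4, 8, 12> through <3, 7, 11, 15> select, which is the 4x4 transpose the sequence is meant to compute.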
https://reviews.llvm.org/D24681
Files:
lib/CodeGen/InterleavedAccessPass.cpp
lib/Target/X86/CMakeLists.txt
lib/Target/X86/X86ISelLowering.h
lib/Target/X86/X86InterleavedAccess.cpp
lib/Target/X86/X86TargetMachine.cpp
test/CodeGen/X86/x86-interleaved-access.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D24681.71690.patch
Type: text/x-patch
Size: 11654 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160916/75e723c8/attachment.bin>