[PATCH] D70015: [PowerPC] Improve vectorization of loops that operate on values that are extended in the body
Nemanja Ivanovic via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 8 10:17:22 PST 2019
nemanjai created this revision.
nemanjai added reviewers: hfinkel, PowerPC.
Herald added subscribers: shchenz, jsji, kbarton, hiraditya.
Herald added a project: LLVM.
When vectorizing loops whose values start out narrow and are extended in the loop body, we don't maximize vector throughput and, overall, do a poor job of vectorizing.
Example:
  double test(float *__restrict thing1, float *__restrict thing2) {
    int i = 0;
    double aggr_prod = 0.0;
    for (i = 0; i < 300; i++) {
      aggr_prod += (thing1[i] * thing2[i]);
    }
    return aggr_prod;
  }
We currently vectorize this by a factor of only 2, extend early, and perform FMAs for the computation. However, it is much faster to:
- Vectorize by a factor of 4
- Perform the multiplication in single precision
- Extend the result of the multiplication and do the addition
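The two strategies above can be modeled in scalar C. This is only an illustrative sketch, not the code the patch generates: the function names are hypothetical, the inner lane loops stand in for vector operations, and the lane counts assume a 128-bit vector register that holds either 2 doubles or 4 floats.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the current behavior: each float is widened to double first,
 * so only 2 elements fit in one (assumed 128-bit) vector operation. */
double dot_extend_early(const float *a, const float *b, size_t n) {
    assert(n % 2 == 0); /* sketch only: no remainder loop */
    double acc[2] = {0.0, 0.0};
    for (size_t i = 0; i < n; i += 2) {
        for (size_t lane = 0; lane < 2; lane++) {
            /* extend both operands, then multiply and add in double */
            acc[lane] += (double)a[i + lane] * (double)b[i + lane];
        }
    }
    return acc[0] + acc[1];
}

/* Model of the proposed behavior: multiply 4 floats at a time in single
 * precision, then widen only the products for the double accumulation. */
double dot_multiply_then_extend(const float *a, const float *b, size_t n) {
    assert(n % 4 == 0); /* sketch only: no remainder loop */
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; i += 4) {
        for (size_t lane = 0; lane < 4; lane++) {
            float prod = a[i + lane] * b[i + lane]; /* single-precision multiply */
            acc[lane] += (double)prod;              /* extend, then add */
        }
    }
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

The two functions can round differently when a product is not exactly representable in single precision. They agree exactly on inputs such as small integers, where every product and partial sum is exact.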
This patch improves performance of an important kernel by 50%, which in turn provides a very significant improvement on the benchmark that contains the kernel. It also has no detrimental effect on the performance of other benchmarks, as measured by SPEC results.
Repository:
rL LLVM
https://reviews.llvm.org/D70015
Files:
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
llvm/test/CodeGen/PowerPC/vec_fmuladd.ll
llvm/test/Transforms/LoopVectorize/PowerPC/max-vec-bandwidth.ll
llvm/test/Transforms/LoopVectorize/PowerPC/pr30990.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D70015.228482.patch
Type: text/x-patch
Size: 26555 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20191108/bd28ea18/attachment.bin>