[PATCH] D70015: [PowerPC] Improve vectorization of loops that operate on values that are extended in the body

Nemanja Ivanovic via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 8 10:17:22 PST 2019


nemanjai created this revision.
nemanjai added reviewers: hfinkel, PowerPC.
Herald added subscribers: shchenz, jsji, kbarton, hiraditya.
Herald added a project: LLVM.

When vectorizing loops that operate on values that start out narrow and are extended in the loop body, we don't maximize vector throughput and, overall, do a poor job of vectorizing.
Example:

  double test(float *__restrict thing1, float *__restrict thing2) {
    int i = 0;
    double aggr_prod = 0.0;
  
    for (i = 0; i < 300; i++) {
      aggr_prod += (thing1[i] * thing2[i]);
    }
  
    return aggr_prod;
  }

We currently vectorize this by a factor of only 2, extend early, and perform FMAs for the computation. However, it is much faster to:

- Vectorize by a factor of 4
- Perform the multiplication in single precision
- Extend the result of the multiplication and do the addition
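
The three steps above can be modeled in plain C. This is only a hand-written sketch of the shape of the computation, not the code the vectorizer emits: the function names, the `n` parameter, and the use of scalar arrays to stand in for 4-element vector registers are all assumptions made for illustration.

```c
#include <assert.h>

/* Scalar reference: the loop from the example, with the trip count
   passed in as n (an assumption for this sketch; the example uses 300). */
double dot_reference(const float *__restrict a, const float *__restrict b,
                     int n) {
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i]; /* float multiply, product extended to double */
  return sum;
}

/* Model of the proposed shape: four lanes per iteration. */
double dot_by_four(const float *__restrict a, const float *__restrict b,
                   int n) {
  assert(n % 4 == 0); /* sketch only: no remainder loop */
  double acc[4] = {0.0, 0.0, 0.0, 0.0};
  for (int i = 0; i < n; i += 4) {
    float prod[4];
    /* Steps 1 and 2: four single-precision multiplies per iteration. */
    for (int lane = 0; lane < 4; lane++)
      prod[lane] = a[i + lane] * b[i + lane];
    /* Step 3: extend each product to double, then accumulate. */
    for (int lane = 0; lane < 4; lane++)
      acc[lane] += (double)prod[lane];
  }
  /* Final horizontal reduction of the four partial sums. */
  return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

Because the four partial sums reassociate the additions, the result may differ from the scalar loop in the low-order bits for general inputs; a vectorizer can only make this transformation when such reassociation of the reduction is permitted.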

This patch improves the performance of an important kernel by 50%, which in turn provides a very significant improvement on the benchmark that contains the kernel. It does not have a detrimental effect on the performance of other benchmarks, as measured by SPEC results.


Repository:
  rL LLVM

https://reviews.llvm.org/D70015

Files:
  llvm/lib/Target/PowerPC/PPCISelLowering.cpp
  llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
  llvm/test/CodeGen/PowerPC/vec_fmuladd.ll
  llvm/test/Transforms/LoopVectorize/PowerPC/max-vec-bandwidth.ll
  llvm/test/Transforms/LoopVectorize/PowerPC/pr30990.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D70015.228482.patch
Type: text/x-patch
Size: 26555 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20191108/bd28ea18/attachment.bin>