[PATCH] Optimize unrolled reductions in LoopStrengthReduce

Olivier Sallenave ohsallen at us.ibm.com
Thu Jan 22 09:34:11 PST 2015


Hi hfinkel,

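This patch breaks the dependencies between unrolled iterations of reductions in loops by giving each unrolled iteration its own accumulator, so the additions no longer form a single serial chain. This should be particularly effective on superscalar targets. For a kernel similar to the one below, we get a 2.5x speedup on POWER8 when the unroll factor is 3 (the example below uses a factor of 2 for brevity).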

```
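// Original reduction: every add depends on the previous value of r.
for (int i = 0; i < n; ++i)
    r += arr[i];

// Unrolled reduction (factor 2, remainder iterations omitted):
// both adds still chain through r.
for (int i = 0; i < n; i += 2) {
    r += arr[i];
    r += arr[i+1];
}

// Optimized reduction: the second add gets its own accumulator r_0,
// which is folded back into r after the loop.
float r_0 = 0;
for (int i = 0; i < n; i += 2) {
    r += arr[i];
    r_0 += arr[i+1];
}
r += r_0;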
```
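For reference, here is a minimal self-contained sketch of the same idea at unroll factor 3, written by hand in C. It is not the output of the patch. The function names `sum_serial` and `sum_split3` are hypothetical, and the remainder loop is an assumption added so that the sketch works for any `n`. Because floating-point addition is not associative, the split version can round differently from the serial one on general inputs.

```c
#include <stddef.h>

/* Baseline: a single accumulator, so each add waits on the previous one. */
float sum_serial(const float *arr, size_t n) {
    float r = 0.0f;
    for (size_t i = 0; i < n; ++i)
        r += arr[i];
    return r;
}

/* Unrolled by 3 with one accumulator per unrolled slot (hypothetical
 * hand-written equivalent of the transformation described above). */
float sum_split3(const float *arr, size_t n) {
    float r = 0.0f, r_0 = 0.0f, r_1 = 0.0f;
    size_t i = 0;

    /* Main loop: the three adds are independent of each other. */
    for (; i + 3 <= n; i += 3) {
        r   += arr[i];
        r_0 += arr[i + 1];
        r_1 += arr[i + 2];
    }

    /* Remainder loop (an assumption, not shown in the example above):
     * handles the last n % 3 elements. */
    for (; i < n; ++i)
        r += arr[i];

    /* Fold the partial sums back together after the loop. */
    return r + r_0 + r_1;
}
```

Calling `sum_split3(arr, n)` in place of `sum_serial(arr, n)` gives the same result whenever every partial sum is exactly representable, for example with small integer-valued inputs.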

http://reviews.llvm.org/D7128

Files:
  lib/Transforms/Scalar/LoopStrengthReduce.cpp
  test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll
  test/Transforms/LoopStrengthReduce/unrolled-reduction.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D7128.18617.patch
Type: text/x-patch
Size: 14610 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150122/af6bfd0e/attachment.bin>

