[PATCH] Optimize unrolled reductions in LoopStrengthReduce
Olivier Sallenave
ohsallen at us.ibm.com
Thu Jan 22 09:34:11 PST 2015
Hi hfinkel,
This patch breaks the dependencies between unrolled iterations of reductions in loops. This should be particularly effective on superscalar targets. For a kernel similar to the one below, we get a 2.5x speedup on POWER8 when the unroll factor is 3.
```
// Original reduction.
for (int i = 0; i < n; ++i)
  r += arr[i];

// Unrolled reduction (unroll factor 2; assumes n is even).
for (int i = 0; i < n; i += 2) {
  r += arr[i];
  r += arr[i+1];
}

// Optimized reduction: a second accumulator removes the
// dependency between the two additions in each iteration.
float r_0 = 0;
for (int i = 0; i < n; i += 2) {
  r += arr[i];
  r_0 += arr[i+1];
}
r += r_0;
```
http://reviews.llvm.org/D7128
Files:
lib/Transforms/Scalar/LoopStrengthReduce.cpp
test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll
test/Transforms/LoopStrengthReduce/unrolled-reduction.ll
Attached patch: D7128.18617.patch (text/x-patch, 14610 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150122/af6bfd0e/attachment.bin>
More information about the llvm-commits mailing list