[llvm-bugs] [Bug 50942] New: [LoopVectorizer] Vectorization of running reduction/predication
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Jun 30 07:41:59 PDT 2021
https://bugs.llvm.org/show_bug.cgi?id=50942
Bug ID: 50942
Summary: [LoopVectorizer] Vectorization of running reduction/predication
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Loop Optimizer
Assignee: unassignedbugs at nondot.org
Reporter: lebedev.ri at gmail.com
CC: llvm-bugs at lists.llvm.org
Consider https://godbolt.org/z/Wo86sfav1
void test(int pred, int* data, int width) {
  for(int i = 0; i != width; ++i) {
    int& out = data[i];
    pred += out;
    out = pred;
  }
}
So the first element is incremented by `pred`,
and each subsequent element is incremented by `pred` plus the sum of all
preceding (original) elements, i.e. this is a running sum seeded with `pred`.
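To make the pattern concrete, here is a minimal sketch showing that the loop is an inclusive prefix sum with an initial value. The names `test_scalar` and `test_scan` are illustrative only and are not from the report; the sketch assumes C++17 and `width >= 0`.

```cpp
#include <cassert>
#include <functional>
#include <numeric>

// The loop from the report, written without the reference alias.
void test_scalar(int pred, int* data, int width) {
  for (int i = 0; i != width; ++i) {
    pred += data[i];
    data[i] = pred;
  }
}

// The same computation as an in-place inclusive scan seeded with `pred`.
// Assumption: width is non-negative, so [data, data + width) is a valid range.
void test_scan(int pred, int* data, int width) {
  assert(width >= 0);
  std::inclusive_scan(data, data + width, data, std::plus<>{}, pred);
}
```

With `pred = 10` and `data = {1, 2, 3, 4, 5}`, both functions should leave `data` as `{11, 13, 16, 20, 25}`.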
This is a common pattern in image processing.
Currently we don't recognize the PHI, and don't vectorize,
even though it's somewhat simple:
https://godbolt.org/z/sEob7fE6h
As-is, the vectorized snippet doesn't seem to be a clear improvement:
https://godbolt.org/z/aMaqYz7f1
(better RThroughput, same cycle count, but more uOps/IPC)
However, if we unroll x8: https://godbolt.org/z/Mb7vcGrzs
(and I do believe both unrolled loops still process the same number of elements)
I think we see a win: https://godbolt.org/z/EW77Max86
About a third fewer cycles.
This makes sense because most of the computations there don't touch `pred`,
so they can be executed out-of-order, even though we won't be able to
finish and store until the previous group has finished processing.
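The reasoning above can be sketched in plain C++. This is only one plausible shape of the transformation, not necessarily what the linked snippets do: `test_blocked` and `kBlock` are hypothetical names, the inner loops stand in for vector shift-and-add steps, and the sketch assumes the reassociated additions do not overflow.

```cpp
#include <cassert>

// Hypothetical block size standing in for the vector width (a power of two).
constexpr int kBlock = 8;

void test_blocked(int pred, int* data, int width) {
  assert(width >= 0);
  int i = 0;
  for (; i + kBlock <= width; i += kBlock) {
    int local[kBlock];
    // Load one block. Nothing up to the end of the scan depends on `pred`.
    for (int j = 0; j != kBlock; ++j)
      local[j] = data[i + j];
    // Log-step inclusive scan inside the block. Walking j downwards means
    // local[j - shift] still holds the value from the previous step,
    // which mimics a whole-vector "shift lanes, then add".
    for (int shift = 1; shift < kBlock; shift *= 2)
      for (int j = kBlock - 1; j >= shift; --j)
        local[j] += local[j - shift];
    // Only now is the carried value needed: broadcast-add `pred` and store.
    for (int j = 0; j != kBlock; ++j)
      data[i + j] = local[j] + pred;
    // The carry for the next block is this block's total plus the old carry.
    pred += local[kBlock - 1];
  }
  // Scalar tail for the remaining elements.
  for (; i != width; ++i) {
    pred += data[i];
    data[i] = pred;
  }
}
```

The only loop-carried dependency is the final update of `pred`, so the load and in-block scan of the next block can overlap with the store of the current one.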
The story will be somewhat different with a two-element predictor,
different data types, etc.
Is this recipe something that could fit into LV?