[LLVMdev] What opt pass attempts to implement this optimization?
Villmow, Micah
Micah.Villmow at amd.com
Tue Oct 6 16:28:13 PDT 2009
I have a very simple kernel that is being compiled into very bad code.
The basic kernel pseudo-code is as follows:
forloop(1 to n) {
    forloop(0 to j) {
        A
    }
    B
}
C
The generated code is very ugly and inefficient for a vector system; it
is similar to the following pseudo-code:
if (n > 1) {
    if (j) {
        forloop(1 to n) {
            forloop(0 to j) {
                A
            }
            B
        }
        C
    } else {
        forloop(1 to n) {
            B
        }
        C
    }
} else {
    C
}
I can understand how this would be good in a scalar system like x86, but
this is just bad on a vector system.
This is bad because if a single work-item in a hardware thread takes a
branch (there are 64 work-items per hardware thread), then every
work-item in that hardware thread must execute that branch. In this
specific example, instead of every thread executing A, B and C once, in
the worst case (which also happens 100% of the time), every thread will
execute C three times, B twice and A once. This does not even take into
account the cost of managing flow control on the hardware, which is
relatively expensive. It gets worse the more flow control I add.
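
To make the counting above concrete, here is a minimal Python sketch of lockstep execution. It is only a toy model, not how any real hardware or LLVM works. The assumptions are mine: a hardware thread is modelled as a list of (n, j) pairs, one per work-item; a block is issued once whenever at least one work-item needs it; "forloop(1 to n)" is read as i = 1; i < n, and "forloop(0 to j)" as k = 0; k < j, which matches the n > 1 and j guards in the generated code. The function names are hypothetical.

```python
from collections import Counter

def run_loops(lanes, counts, with_inner=True):
    """Issue the outer loop (and optionally the inner loop) in lockstep.

    A block is counted once per issue, no matter how many lanes need it.
    """
    i = 1
    while True:
        # Lanes still inside the outer loop (assumed bound: i < n).
        active = [(n, j) for (n, j) in lanes if i < n]
        if not active:
            break
        if with_inner:
            k = 0
            # Inner loop keeps issuing A while any active lane has k < j.
            while any(k < j for (_, j) in active):
                counts["A"] += 1
                k += 1
        counts["B"] += 1
        i += 1

def original(lanes):
    """Block issue counts for the original loop nest."""
    counts = Counter()
    run_loops(lanes, counts)
    counts["C"] += 1
    return dict(counts)

def transformed(lanes):
    """Block issue counts for the guarded, duplicated form."""
    counts = Counter()
    taken = [lane for lane in lanes if lane[0] > 1]       # if (n > 1)
    not_taken = [lane for lane in lanes if lane[0] <= 1]  # else
    if taken:
        with_j = [lane for lane in taken if lane[1]]         # if (j)
        without_j = [lane for lane in taken if not lane[1]]  # else
        if with_j:
            run_loops(with_j, counts, with_inner=True)
            counts["C"] += 1
        if without_j:
            run_loops(without_j, counts, with_inner=False)
            counts["C"] += 1
    if not_taken:
        counts["C"] += 1
    return dict(counts)

# Three work-items that diverge on both guards.
lanes = [(2, 1), (2, 0), (1, 0)]
print(original(lanes))     # {'A': 1, 'B': 1, 'C': 1}
print(transformed(lanes))  # {'A': 1, 'B': 2, 'C': 3}
```

With work-items that disagree on both guards, the model issues each block once for the original form, but C three times and B twice for the generated form.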