[LLVMdev] What opt pass attempts implements this optimization?

Tue Oct 6 16:28:13 PDT 2009

I have a very simple kernel that is generating very very bad code.

The basic kernel pseudo-code is as follows:

forloop(1 to n) {

  forloop(0 to j) {

  A

}

B

}

C

It is generating very ugly and inefficient code for a vector system
similar to the following pseudo-code:

if (n > 1) {

  if (j) {

                forloop(1 to n) {

                                forloop(0 to j) {

                                A

                                }

                                B

                }

               C

   } else {

                forloop(1 to n) {

                                B

                }

                C

   } 

} else {

C

}

I can understand how this would be good in a scalar system like x86, but
this is just bad on a vector system.

The reason this is bad because if a single branch is taken by a
work-item in a hardware thread(there are 64 work-items per hw thread),
then every single work-item in a hardware thread must execute that
branch. In this specific example, instead of every thread executing A, B
and C once, in the worst case(which is also happens 100% of the time),
every thread will execute C three times, B twice and A once. This also
does not take into account the cost of managing flow control on the
hardware, which is relatively expensive. This gets worse with the more
flow control I add in.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20091006/4b790bcf/attachment.html>