<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - [PPC] slower vsx instructions generated for vmac"
   href="https://llvm.org/bugs/show_bug.cgi?id=31492">31492</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[PPC] slower vsx instructions generated for vmac
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: PowerPC
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>carrot@google.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=17789" name="attach_17789" title="testcase">attachment 17789</a> <a href="attachment.cgi?id=17789&action=edit" title="testcase">[details]</a></span>
testcase

The attached test case is simplified from vmac. Compile it with the options

-m64 -O2 -mvsx -mcpu=power8

LLVM generates the following code for the while loop:


.LBB0_2:                                # %while.body
                                        # =>This Inner Loop Header: Depth=1
        lxvd2x 0, 0, 7             // *
        lxvd2x 1, 0, 6             // *
        xxswapd  34, 0             // *
        xxswapd  35, 1             // *
        vaddudm 2, 3, 2            // *
        xxswapd  10, 34            // *
        mfvsrd 9, 34               // *
        mfvsrd 10, 10              // *
        #APP
        mulhdu 11, 10, 9
        #NO_APP
        lxvd2x 11, 7, 8
        lxvd2x 12, 0, 5
        mulld 9, 9, 10
        addi 7, 7, 64
        xxswapd  50, 11
        xxswapd  51, 12
        vaddudm 2, 19, 18
        xxswapd  13, 34
        mfvsrd 0, 34
        mfvsrd 12, 13
        mulld 10, 0, 12
        #APP
        mulhdu 12, 12, 0
        #NO_APP
        #APP
        addc 3, 9, 10
        adde 4, 11, 12
        #NO_APP
        bdnz .LBB0_2

There are two problems:

1. (kp)[i] is loop invariant, so its load can be hoisted out of the loop.

2. LLVM generates the VSX code marked * for the expression

    get64PE((mp) + i) + (kp)[i]

   If simple integer load and add instructions were used instead, the code
would be shorter and faster. For large inputs, it can be 35% faster.
   This looks like a cost model problem in vectorization?</pre>
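        <pre>For comparison, the scalar form suggested above can be sketched in C. This is
only an illustration, not the attached test case: the function name
nh_blocks_scalar, the u128_parts struct, the single word pair per block, and
the plain load standing in for get64PE are all assumptions. The 8-word
(64-byte) stride mirrors the "addi 7, 7, 64" in the loop above. The sketch
uses unsigned __int128 (a GCC/Clang extension) where the generated code shows
inline mulhdu/addc/adde.

```c
#include "stdint.h"

/* Hypothetical result type: high and low halves of a 128-bit sum. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_parts;

/*
 * Scalar sketch of the loop: for each 8-word block of mp, add the two key
 * words to the first two message words and accumulate their 64x64->128
 * product. The key words do not change between iterations, so they are
 * loaded once before the loop (problem 1), and the additions are plain
 * 64-bit integer adds rather than vector adds (problem 2).
 */
static u128_parts nh_blocks_scalar(const uint64_t *mp, const uint64_t *kp,
                                   long blocks)
{
    const uint64_t k0 = kp[0];   /* hoisted loop-invariant key loads */
    const uint64_t k1 = kp[1];
    unsigned __int128 acc = 0;

    while (blocks-- > 0) {
        uint64_t a = mp[0] + k0; /* stands in for get64PE((mp) + i) + (kp)[i] */
        uint64_t b = mp[1] + k1;
        acc += (unsigned __int128)a * b;
        mp += 8;                 /* assumed stride: 8 words = 64 bytes */
    }

    u128_parts r = { (uint64_t)(acc >> 64), (uint64_t)acc };
    return r;
}
```

Each word pair here needs only integer loads, two adds, and one widening
multiply. Whether a given compiler version keeps this form scalar is not
guaranteed; that decision is what this report questions.</pre>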
        </div>
      </p>
    </body>
</html>