<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><span class="vcard"><a class="email" href="mailto:spatel+llvm@rotateright.com" title="Sanjay Patel <spatel+llvm@rotateright.com>"> <span class="fn">Sanjay Patel</span></a>

</span> changed

          <a class="bz_bug_link 

          bz_status_RESOLVED  bz_closed"

   title="RESOLVED FIXED - [x86, SSE] only use phaddw / phaddd when optimizing for minsize?"

   href="https://bugs.llvm.org/show_bug.cgi?id=26859">bug 26859</a>

          <br>

             <table border="1" cellspacing="0" cellpadding="8">

          <tr>

            <th>What</th>

            <th>Removed</th>

            <th>Added</th>

          </tr>

         <tr>

           <td style="text-align:right;">Resolution</td>

           <td>---

           </td>

           <td>FIXED

           </td>

         </tr>

         <tr>

           <td style="text-align:right;">Status</td>

           <td>NEW

           </td>

           <td>RESOLVED

           </td>

         </tr></table>

      <p>

        <div>

            <b><a class="bz_bug_link 

          bz_status_RESOLVED  bz_closed"

   title="RESOLVED FIXED - [x86, SSE] only use phaddw / phaddd when optimizing for minsize?"

   href="https://bugs.llvm.org/show_bug.cgi?id=26859#c14">Comment # 14</a>

              on <a class="bz_bug_link 

          bz_status_RESOLVED  bz_closed"

   title="RESOLVED FIXED - [x86, SSE] only use phaddw / phaddd when optimizing for minsize?"

   href="https://bugs.llvm.org/show_bug.cgi?id=26859">bug 26859</a>

              from <span class="vcard"><a class="email" href="mailto:spatel+llvm@rotateright.com" title="Sanjay Patel <spatel+llvm@rotateright.com>"> <span class="fn">Sanjay Patel</span></a>

</span></b>

        <pre>(In reply to Simon Pilgrim from <a href="show_bug.cgi?id=26859#c13">comment #13</a>)

<span class="quote">> <a href="https://reviews.llvm.org/D53095">https://reviews.llvm.org/D53095</a></span>

Committed here:

<a href="https://reviews.llvm.org/rL344361">https://reviews.llvm.org/rL344361</a>

There's a stunning amount of vector duplication + unrolling in these tests

currently:

$ clang -O2 accum.c -S -o - -mavx | grep padd | wc -l

     155

...but that's not this bug.

This is the current behavior:

$ clang -Os accum.c -S -o - -mavx | grep phadd | grep mm

        vphaddd %xmm0, %xmm0, %xmm0

        vphaddw %xmm0, %xmm0, %xmm0

$ clang -O2 accum.c -S -o - -mavx | grep phadd | grep mm

$ clang -O2 accum.c -S -o - -mavx -march=btver2 | grep phadd | grep mm

        vphaddd %xmm0, %xmm0, %xmm0

        vphaddw %xmm0, %xmm0, %xmm0

I.e., if we are optimizing for size or targeting Jaguar (btver2), we use horizontal ops;

otherwise, we use regular ops and shuffles.
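
For reference, here is a rough scalar C sketch of the two forms for a 4 x i32
sum. It only models the lane arithmetic of a phaddd-style reduction versus a
shuffle + add reduction. The function names (hadd4, reduce_hadd, reduce_shuffle)
are made up for illustration, and this is not taken from accum.c or the patch.

```c
/* Scalar model of the 4 x i32 lane arithmetic; names are illustrative only. */

/* Models phaddd: dst = { a0+a1, a2+a3, b0+b1, b2+b3 }. */
void hadd4(unsigned dst[4], const unsigned a[4], const unsigned b[4]) {
    unsigned t[4];
    int i;
    t[0] = a[0] + a[1];
    t[1] = a[2] + a[3];
    t[2] = b[0] + b[1];
    t[3] = b[2] + b[3];
    /* Write through a temporary so dst may alias a or b. */
    for (i = 0; i != 4; ++i)
        dst[i] = t[i];
}

/* Horizontal-op form: two hadds leave the full sum in every lane. */
unsigned reduce_hadd(const unsigned v[4]) {
    unsigned t[4];
    hadd4(t, v, v);
    hadd4(t, t, t);
    return t[0];
}

/* Shuffle form: swap halves and add, then swap neighbors and add. */
unsigned reduce_shuffle(const unsigned v[4]) {
    unsigned s[4], t[4];
    int i;
    s[0] = v[2]; s[1] = v[3]; s[2] = v[0]; s[3] = v[1];
    for (i = 0; i != 4; ++i)
        t[i] = v[i] + s[i];
    s[0] = t[1]; s[1] = t[0]; s[2] = t[3]; s[3] = t[2];
    for (i = 0; i != 4; ++i)
        t[i] = t[i] + s[i];
    return t[0];
}
```

Both functions return the same total for a given input. The horizontal form
needs fewer instructions, which is why it is attractive for size; whether it is
also faster depends on the uarch.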

It's possible that our combiner predicate will need adjustments to optimize

that decision depending on code pattern and uarch, but we now have that

ability. Ideally, we would refine the choice using CPU instruction

latency/throughput models rather than with the DAG heuristic in the patch, but

that's also another bug.</pre>

        </div>

      </p>


    </body>

</html>