<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - [SSE2] poor performance _mm_extract_epi8"
   href="https://llvm.org/bugs/show_bug.cgi?id=27737">27737</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[SSE2] poor performance _mm_extract_epi8
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>3.8
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Headers
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>rozhuk.im@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>I have some code that makes heavy use of _mm_extract_epi8().
When I build it with clang 3.8 and -msse2 (without -msse4.1), the program runs
very slowly.

To build with GCC and clang 3.4, 3.6, and 3.7, I use this macro:

#ifndef _mm_extract_epi8 /* SSE4.1 required. */
#define _mm_extract_epi8(__xmm, __n)                    \
    ((_mm_extract_epi16(__xmm, ((__n) >> 1)) >> (8 * ((__n) & 1))) & 0xff)
#endif
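
For reference, the arithmetic behind the fallback macro can be modelled in
plain C without any intrinsics. This is only a sketch: the name
extract_byte_model and the array-of-words representation are hypothetical
stand-ins for the XMM register and _mm_extract_epi16, and it assumes the x86
little-endian layout, where byte 2k is the low half of 16-bit lane k.

```c
/* Hypothetical model: words[] stands in for the eight 16-bit lanes of an
 * XMM register, and indexing it plays the role of _mm_extract_epi16. */
static unsigned extract_byte_model(const unsigned short words[8], unsigned n)
{
    unsigned word = words[(n >> 1) & 7];   /* lane that holds byte n */
    unsigned shift = 8 * (n & 1);          /* 0 for even n, 8 for odd n */
    return (word >> shift) & 0xff;         /* keep only the selected byte */
}
```

With lanes starting {0x2211, 0x4433, ...}, index 0 gives 0x11, index 1 gives
0x22, and index 3 gives 0x44.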


Test results:

AMD Athlon(tm) 5350 APU with Radeon(tm) R3      (2050.04-MHz K8-class CPU)
GCC:        20391006000 (SSE4.1) /  20116413000 (SSE2)
clang 3.8:    22329895000 (SSE4.1) / 117304135000 (SSE2) !!!
clang 3.7:    22367008000 (SSE4.1) /  25542571000 (SSE2)
clang 3.6:    22306648000 (SSE4.1) /  25914115000 (SSE2)
clang 3.4:    23684031000 (SSE4.1) /  25914115000 (SSE2)

Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (2999.72-MHz K8-class CPU)
GCC:        12031595000 (SSE4.1) / 12011303000 (SSE2)
clang 3.8:    12431116000 (SSE4.1) / 73035466000 (SSE2) !!!
clang 3.7:    12458839000 (SSE4.1) / 13317058000 (SSE2)
clang 3.6:    12462181000 (SSE4.1) / 14119683000 (SSE2)
clang 3.4:    13555167000 (SSE4.1) / 13178893000 (SSE2)</pre>
        </div>
      </p>
    </body>
</html>