<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - unnecessary bit-and in pshufb vector ctlz"
   href="https://bugs.llvm.org/show_bug.cgi?id=39703">39703</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>unnecessary bit-and in pshufb vector ctlz
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>new-bugs
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>new bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>danielwatson311@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>For SSSE3+ targets, LLVM lowers vector ctlz to a generic algorithm that uses
pshufb to compute the leading-zero count of each nibble of the vector.

pand instructions select the high or low nibble of each byte. For the low
nibble this is unnecessary, because the algorithm later computes something like
`nibble_lzs = if high_nibble != 0 then high_lz else high_lz + low_lz`. The value
of `low_lz` is only used when the high nibble is zero, in which case the
unmasked byte already equals its low nibble, so the bit-and is unnecessary.
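
The reasoning above can be checked exhaustively with a scalar model of one byte
lane. This is only a sketch, not LLVM's code: the names NIBBLE_LZ, pshufb_lane,
ctlz8 and low_mask_is_redundant are made up for illustration, and pshufb is
modelled per lane as "zero if bit 7 of the index is set, otherwise a lookup on
the low four index bits".

```c
/* Leading-zero count of every 4-bit value, used as the pshufb lookup table. */
static const unsigned char NIBBLE_LZ[16] = {4, 3, 2, 2, 1, 1, 1, 1,
                                            0, 0, 0, 0, 0, 0, 0, 0};

/* Scalar model of one pshufb lane (hypothetical helper): 0 if bit 7 of the
   index is set, otherwise the table entry picked by the low four index bits. */
static unsigned pshufb_lane(unsigned index) {
    return (index & 0x80) ? 0 : NIBBLE_LZ[index & 0x0f];
}

/* ctlz of one byte; keep_mask selects whether the low-nibble pand is kept. */
static unsigned ctlz8(unsigned x, int keep_mask) {
    unsigned lo_lz = pshufb_lane(keep_mask ? (x & 0x0f) : x);
    unsigned hi_nib = (x >> 4) & 0x0f;              /* psrlw + pand */
    unsigned hi_is_zero = (hi_nib == 0) ? 0xff : 0; /* pcmpeqb */
    unsigned hi_lz = pshufb_lane(hi_nib);
    return hi_lz + (lo_lz & hi_is_zero);            /* pand + paddb */
}

/* Returns 1 if the masked and unmasked variants agree on all 256 bytes. */
static int low_mask_is_redundant(void) {
    unsigned x;
    for (x = 0; x != 256; x++) {
        if (ctlz8(x, 1) != ctlz8(x, 0)) return 0;
    }
    return 1;
}
```

Whenever the unmasked index differs from the masked one, the high nibble is
nonzero, so the differing `lo_lz` value is ANDed with zero before the add.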

https://godbolt.org/z/4lkksq

For v16i8:

    pand    xmm3, xmm2 # lo_nib & 0x0f, unnecessary
    pshufb  xmm4, xmm3 # lo_lz
    psrlw   xmm0, 4
    pand    xmm0, xmm2 # hi_nib
    pxor    xmm2, xmm2 # zero
    pcmpeqb xmm2, xmm0 # hi_nib == 0
    pand    xmm2, xmm4 # if hi_nib != 0, set lo_lz = 0
    pshufb  xmm1, xmm0 # hi_lz
    paddb   xmm1, xmm2 # hi_lz + lo_lz</pre>
        </div>
      </p>


    </body>
</html>