<html>
    <head>
      <base href="http://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - missed optimization in trunc(shufflevector(insertvector))"
   href="http://llvm.org/bugs/show_bug.cgi?id=16397">16397</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>missed optimization in trunc(shufflevector(insertvector))
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Loop Optimizer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>nlewycky@google.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu, nrotem@apple.com
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
<pre>Examine example 3 from <a href="http://blog.regehr.org/archives/320">http://blog.regehr.org/archives/320</a>. Now that we
vectorize this code, we should be able to produce code at least as good as what
icc does.

LLVM fails to generate awesome code for this due to extra casts. Consider the
sequence:

  %tmp = trunc i64 %index to i32
  %broadcast.splatinsert6 = insertelement <4 x i32> undef, i32 %tmp, i32 0
  %broadcast.splat7 = shufflevector <4 x i32> %broadcast.splatinsert6, <4 x i32> undef, <4 x i32> zeroinitializer
  %induction8 = add <4 x i32> %broadcast.splat7, <i32 0, i32 1, i32 2, i32 3>
  %induction9 = add <4 x i32> %broadcast.splat7, <i32 4, i32 5, i32 6, i32 7>
  %tmp1 = trunc <4 x i32> %induction8 to <4 x i8>
  %tmp2 = trunc <4 x i32> %induction9 to <4 x i8>

this can be expressed more straightforwardly as:

  %tmp = trunc i64 %index to i8
  %broadcast.splatinsert6 = insertelement <4 x i8> undef, i8 %tmp, i32 0
  %broadcast.splat7 = shufflevector <4 x i8> %broadcast.splatinsert6, <4 x i8> undef, <4 x i32> zeroinitializer
  %tmp1 = add <4 x i8> %broadcast.splat7, <i8 0, i8 1, i8 2, i8 3>
  %tmp2 = add <4 x i8> %broadcast.splat7, <i8 4, i8 5, i8 6, i8 7>

which has a dramatic effect on the x86 assembly produced by llc.
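
As a sanity check on the rewrite, here is a minimal Python sketch (not LLVM
code; the helper names are made up for illustration) that models both
sequences with wrap-around integer arithmetic and checks that they yield the
same i8 lanes. It relies only on truncation commuting with add.

```python
# Model of the two IR sequences; lanes are Python ints masked to the lane width.
# All function names here are hypothetical, chosen only for this sketch.

def trunc(value, bits):
    """Keep the low `bits` bits, like the IR trunc instruction."""
    return value & ((1 << bits) - 1)

def wide_then_trunc(index, offsets):
    """trunc i64 to i32, splat, add at i32, then trunc each lane to i8."""
    splat = trunc(index, 32)
    return [trunc(trunc(splat + off, 32), 8) for off in offsets]

def narrow_add(index, offsets):
    """trunc i64 to i8, splat, add at i8."""
    splat = trunc(index, 8)
    return [trunc(splat + trunc(off, 8), 8) for off in offsets]

OFFSETS = [0, 1, 2, 3, 4, 5, 6, 7]
SAMPLES = [0, 1, 250, 255, 256, 2**31 - 1, 2**32 - 4, 2**63, 2**64 - 1]
for index in SAMPLES:
    assert wide_then_trunc(index, OFFSETS) == narrow_add(index, OFFSETS)
```

This only shows that the two forms compute the same values; it says nothing
about whether the narrower type is profitable for a given target.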

Adding this optimization to instcombine seems straightforward, except that we
need to decide, at IR time, whether it's actually safe to shrink to the smaller
type. (For the non-vector case we assume it always is and let the backend
enlarge it as needed.) Nadav?</pre>
        </div>
      </p>
    </body>
</html>