<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [InstCombine] Fails to combine three shufflevectors produced by LoopVectorizer"
   href="https://bugs.llvm.org/show_bug.cgi?id=38792">38792</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[InstCombine] Fails to combine three shufflevectors produced by LoopVectorizer
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Scalar Optimizations
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>paulsson@linux.vnet.ibm.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=20815" name="attach_20815" title="reduced testcase">attachment 20815</a> <a href="attachment.cgi?id=20815&action=edit" title="reduced testcase">[details]</a></span>
reduced testcase

The loop contains interleaved loads and stores, which the LoopVectorizer
handles as interleave groups. In effect, the loaded elements should be
reversed pairwise:

[0 1 2 3] -> [1 0 3 2]

The vectorizer does not recognize this as a single permutation. Instead, it
handles the two interleave groups separately: it first de-interleaves the
result of the load group, and then emits another shuffle to re-interleave the
values for the store group:

  %tmp6 = load <4 x i64>, <4 x i64>* %tmp5, align 8
  %tmp7 = shufflevector <4 x i64> %tmp6, <4 x i64> undef, <2 x i32> <i32 0, i32 2>
  %tmp8 = shufflevector <4 x i64> %tmp6, <4 x i64> undef, <2 x i32> <i32 1, i32 3>
  %tmp9 = shufflevector <2 x i64> %tmp8, <2 x i64> %tmp7, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %tmp10 = shufflevector <4 x i64> %tmp9, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
  store <4 x i64> %tmp10, <4 x i64>* undef, align 8
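
For reference, these masks follow the usual pattern for an interleave group of
factor 2 at VF 2: the load group is split with the stride masks [0 2] and
[1 3], and the store group is re-interleaved with [0 2 1 3]. As far as I can
tell, the vectorizer builds such masks with the VectorUtils helpers
createStrideMask and createInterleaveMask. The Python below is only an
independent sketch of the same index arithmetic; stride_mask and
interleave_mask are hypothetical names, not LLVM API.

```python
# Sketch only (not LLVM code): index arithmetic behind the interleave-group
# shuffles in the IR above, for interleave factor 2 and VF 2.

def stride_mask(start, stride, vf):
    """Lanes start, start+stride, ... taken from the wide load (de-interleave)."""
    return [start + i * stride for i in range(vf)]


def interleave_mask(vf, num_vecs):
    """Mask that interleaves num_vecs concatenated vectors of vf lanes each."""
    return [j * vf + i for i in range(vf) for j in range(num_vecs)]


FACTOR, VF = 2, 2
even = stride_mask(0, FACTOR, VF)      # mask of %tmp7
odd = stride_mask(1, FACTOR, VF)       # mask of %tmp8
concat = list(range(FACTOR * VF))      # mask of %tmp9: plain concatenation
store = interleave_mask(VF, FACTOR)    # mask of %tmp10

print(even, odd, concat, store)  # [0, 2] [1, 3] [0, 1, 2, 3] [0, 2, 1, 3]
```

Each group's masks are computed on their own, so nothing at this stage looks
at what the load and store shuffles do when composed.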

Together, the shufflevectors in the IR above produce [1 0 3 2]. I would have
expected instcombine to fold them into a single shufflevector, but this does
not happen.
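
The fold itself is easy to check mechanically by pushing lane indices through
the chain. The Python sketch below models shufflevector as picking lanes from
the concatenation of its two operands (undef mask lanes are not modeled, since
this IR has none) and replays the IR. The helper names are hypothetical, and
this is not how InstCombine implements shuffle folding. Because every lane of
the final vector traces back to %tmp6, the resulting index list is the mask of
one equivalent shufflevector on %tmp6.

```python
# Sketch: replay the shuffle chain from the testcase on lane indices and
# recover the mask of a single equivalent shufflevector on %tmp6.

def shufflevector(v1, v2, mask):
    """Pick lanes from the concatenation v1 ++ v2, as shufflevector does."""
    lanes = list(v1) + list(v2)
    return [lanes[m] for m in mask]


def chain(tmp6):
    undef4 = [None] * 4  # stands in for the undef second operand
    tmp7 = shufflevector(tmp6, undef4, [0, 2])
    tmp8 = shufflevector(tmp6, undef4, [1, 3])
    tmp9 = shufflevector(tmp8, tmp7, [0, 1, 2, 3])
    tmp10 = shufflevector(tmp9, undef4, [0, 2, 1, 3])
    return tmp10


def fold_to_single_mask(shuffles, width):
    """Push indices 0..width-1 through the shuffles; if no output lane comes
    from an undef operand, the result is a single-source shuffle mask."""
    mask = shuffles(list(range(width)))
    if None in mask:
        raise ValueError("some lane comes from an undef operand")
    return mask


mask = fold_to_single_mask(chain, 4)
print(mask)  # [1, 0, 3, 2]
assert mask == [i ^ 1 for i in range(4)]  # pairwise reversal
```

Whether the folded mask is actually cheaper on a given target is a separate
question that this sketch does not try to answer.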

Comments in InstCombine state that shuffle combining is deliberately kept very
conservative. However, the resulting sequence of shuffles clearly does not
give good code on SystemZ.

Does anyone have an idea whether InstCombiner should handle this case, and if
not, where it should be done instead? In a custom DAGCombine for the target?

bin/opt ./tc_instcombine.ll -mtriple=systemz-unknown -mcpu=z13 -S -o out.opt.ll -instcombine</pre>
        </div>
      </p>


    </body>
</html>