<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [LV/LAA] Vectorizer creates a dead vectorized loop body when specializing for stride=1"
   href="https://bugs.llvm.org/show_bug.cgi?id=34681">34681</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[LV/LAA] Vectorizer creates a dead vectorized loop body when specializing for stride=1
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Loop Optimizer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>dorit.nuzman@intel.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=19177" name="attach_19177" title="patch for a fix">attachment 19177</a> <a href="attachment.cgi?id=19177&action=edit" title="patch for a fix">[details]</a></span>
patch for a fix

Consider the following testcase:

void matmul(unsigned int N, int *C, int *A, int *B) {
  unsigned int i, j, k;
  for (i = 0; i &lt; N; i++) {
    for (j = 0; j &lt; N; j++) {
      C[i*N+j] = 0;
      for (k = 0; k &lt; N; k++) {
        C[i*N+j] += (int)A[i*N+k] * (int)B[k*N+j];
      }
    }
  }
}

Compiling the testcase above with -O2 -m32 specializes the loop for the case
where the unknown stride N == 1.
However, the vectorized loop is only executed if the iteration count N >= VF,
which implies N > 1.
The two conditions cannot both hold, so the vectorized loop body is dead code.
(This dead code is eventually identified and removed.)
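
The contradiction can be modelled with a small C sketch. It only illustrates
the two runtime checks and is not the code the vectorizer emits:
takes_vector_path is a hypothetical name, and VF = 4 is an assumed
vectorization factor (the report does not state one).

```c
/* Assumed vectorization factor, chosen only for illustration. */
enum { VF = 4 };

/* Models which version of the loop would run for a given N.
   Returns 1 for the vector body, 0 for the scalar fallback. */
int takes_vector_path(unsigned n) {
  if (n != 1)   /* stride predicate N == 1 fails */
    return 0;
  if (VF > n)   /* minimum-iteration-count guard N >= VF fails */
    return 0;
  return 1;     /* would need N == 1 and N >= VF at the same time */
}
```

With any VF greater than 1 the function returns 0 for every N, which is why
the vector body is dead.
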
It would have been better to avoid specialization for the stride if we know
that the stride==1 predicate is going to contradict the
loop-minimum-iteration-count guard.
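
A rough sketch of that decision, under stated assumptions (this is neither
the attached patch nor an LLVM API): should_specialize_stride and its
parameters are hypothetical names, and the caller is assumed to know already
whether the stride symbol is also the loop's trip count, as N is in the
testcase.

```c
/* Hypothetical helper, for illustration only.
   stride_is_trip_count: nonzero if the symbolic stride is the trip count.
   vf: the vectorization factor under consideration.
   Returns 1 if specializing for stride == 1 can still reach the vector body. */
int should_specialize_stride(int stride_is_trip_count, unsigned vf) {
  if (stride_is_trip_count) {
    /* stride == 1 then means a single iteration, which can never
       satisfy trip_count >= vf once vf exceeds 1. */
    if (vf > 1)
      return 0;
  }
  return 1;
}
```

For the testcase, should_specialize_stride(1, 4) returns 0, so the stride
check would not be emitted.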

The attached patch does that, and works for the simple case at hand.
I'll upload it for review after further testing. Hopefully it's of interest,
although I haven't seen any performance gains with it so far.</pre>
        </div>
      </p>


    </body>
</html>