<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Memory access versioning adds bad(?) runtime predicate to vectorized loop"
   href="https://bugs.llvm.org/show_bug.cgi?id=49347">49347</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Memory access versioning adds bad(?) runtime predicate to vectorized loop
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Loop Optimizer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>mattias.v.eriksson@ericsson.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=24571" name="attach_24571" title="LV input">attachment 24571</a> <a href="attachment.cgi?id=24571&action=edit" title="LV input">[details]</a></span>
LV input

With the attached file, loop vectorization adds a runtime check so that the
vectorized loop only runs when numOutputs == 1:

opt -S -o - lv-mav.ll -loop-vectorize -force-vector-width=4
[...]
  %ident.check = icmp ne i32 %numOutputs, 1
  %10 = or i1 %9, %ident.check
[...]
  %17 = or i1 %10, %16
  br i1 %17, label %scalar.ph, label %vector.ph

Running the vectorizer without memory access versioning, I get a partially
vectorized loop without the check on numOutputs:

opt -S -o - lv-mav.ll -loop-vectorize -force-vector-width=4 -enable-mem-access-versioning=0

In a performance issue I am looking at in my out-of-tree target, the partially
vectorized loop is faster than the scalar loop, but the check on numOutputs
makes the code always run the scalar loop. The vector code looks better when
numOutputs == 1, but it is worse in practice since the predicate is rarely
fulfilled.
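
To make the shape of the problem concrete, here is a hand-written C sketch of
what the versioned loop amounts to. It is not derived from the attached
lv-mav.ll: it assumes numOutputs is used as a store stride, and the function
names, the other parameters and the loop body are made up for illustration.

```c
/* Hypothetical original loop: the store stride is only known at run time. */
void scale_generic(float *out, const float *in, unsigned n, unsigned numOutputs)
{
    unsigned i;
    for (i = 0; i != n; ++i)
        out[i * numOutputs] = in[i] * 2.0f;
}

/* Hypothetical picture of the loop after versioning on numOutputs == 1. */
void scale_versioned(float *out, const float *in, unsigned n, unsigned numOutputs)
{
    unsigned i;
    if (numOutputs != 1) {
        /* Plays the role of %ident.check being true: take the scalar loop. */
        for (i = 0; i != n; ++i)
            out[i * numOutputs] = in[i] * 2.0f;
        return;
    }
    /* Stride is known to be 1 here, so the stores are contiguous and this
       is the only path that gets vectorized. */
    for (i = 0; i != n; ++i)
        out[i] = in[i] * 2.0f;
}
```

Both functions compute the same result. The only difference is which path
runs, and in my case numOutputs is rarely 1, so the second loop is almost
never reached.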

Does what LV does here make sense in general? Is it a good idea to add
predicates like this when they leave the more general case running only the
scalar version of the loop?</pre>
        </div>
      </p>


    </body>
</html>