<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - interleaved memory access blocking loop vectorization"
   href="https://bugs.llvm.org/show_bug.cgi?id=44638">44638</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>interleaved memory access blocking loop vectorization
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>7.0
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>Other
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: AArch64
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>wxz@marvell.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>arnaud.degrandmaison@arm.com, llvm-bugs@lists.llvm.org, peter.smith@linaro.org, Ties.Stuij@arm.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=23047" name="attach_23047" title="The attached file shows the loop cannot be vectorized because of the high cost in interleaved memory access">attachment 23047</a> <a href="attachment.cgi?id=23047&action=edit" title="The attached file shows the loop cannot be vectorized because of the high cost in interleaved memory access">[details]</a></span>
The attached file shows a loop that cannot be vectorized because of the high
cost of interleaved memory access.

In the loop vectorizer (LV), the decision to vectorize a loop is based on the
cost of the vectorized loop. The cost is calculated for each instruction in the
loop. For an interleaved access group, there is a single cost for the whole
group. That cost assumes the interleaved memory access is implemented with
vector extract/insert operations. The group cost is usually very high,
especially on AArch64, which has VectorInsertExtractBaseCost set to 3 in
AArch64Subtarget.h. This high cost usually prevents the loop from being
vectorized.
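
As a rough illustration of why the extract/insert model gets expensive, here is
a toy cost function. It is not LLVM's actual cost code: the function name, the
mem_op_cost parameter and the "one extract plus one insert per lane" formula
are all assumptions made for this sketch. Only the base cost of 3 comes from
the text above.

```c
/* Toy model of an extract/insert-based interleave group cost.
   NOT LLVM's implementation; the formula and names are assumptions. */
enum { INSERT_EXTRACT_BASE_COST = 3 }; /* value cited from AArch64Subtarget.h */

/* Assume every lane of the wide vector is extracted once and inserted once,
   on top of the cost of the wide memory operation itself. */
static unsigned interleave_group_cost(unsigned vf, unsigned factor,
                                      unsigned mem_op_cost) {
    unsigned lanes = vf * factor; /* elements touched by the whole group */
    return mem_op_cost + lanes * 2u * INSERT_EXTRACT_BASE_COST;
}
```

With a vector factor of 4 and an interleave factor of 3, this toy model charges
72 units for lane movement alone, before any arithmetic in the loop is counted.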

On AArch64, there are multiple ways to implement interleaved memory access
without using the vector extract/insert model. For example, the TBL instruction
can rearrange data from a vector to form a new vector, and it costs a single
instruction. One can also use zip1/2, umull1/2, smull1/2 to do the same work
instead of the extract/insert model.
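
For readers without the attachment, a minimal loop with a stride-3 interleaved
load group could look like the sketch below. This is a hypothetical example,
not the contents of tbl3.c; the function name and signature are made up, and I
have not verified that this exact loop hits the cost problem.

```c
/* Hypothetical loop with a stride-3 (interleaved) load group; the real
   tbl3.c attachment is not reproduced here. */
void sum_rgb(const float *rgb, float *out, int n) {
    for (int i = 0; i != n; ++i) {
        /* The three loads below form one interleaved access group: a
           vectorized version needs elements 0,3,6,... then 1,4,7,... then
           2,5,8,... of a wide load gathered into separate vectors. */
        out[i] = rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2];
    }
}
```

De-interleaving the wide load is the step that the cost model prices with
extract/insert, and that a permute such as TBL could do instead.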

To reproduce this problem, compile the attached file with this command:

clang -mcpu=thunderx2t99 -march=armv8.1-a+lse -mllvm tbl3.c -ffast-math
-ffp-contract=fast -funroll-loops -finline-functions -fslp-vectorize
-fvectorize -O3 -o tbl3.out

The problem exists in all LLVM versions.</pre>
        </div>
      </p>


    </body>
</html>