<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Improve cost model for lengthens followed by truncates"

   href="http://llvm.org/bugs/show_bug.cgi?id=21370">21370</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Improve cost model for lengthens followed by truncates

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Loop Optimizer

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>james.molloy@arm.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

<pre>The loop vectorizer's cost model does not account for sequences of
lengthen->arithmetic->truncate. These sequences are often generated by
implicit int promotions when dealing with sub-int types, and the
lengthen/truncate can be elided when the arithmetic is done at the narrow
width.
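
To illustrate why the lengthen/truncate can be elided, here is a minimal
sketch (not part of the original report; add1_wide, add8 and count_mismatches
are hypothetical names made up for illustration). It compares the promoted
int add followed by truncation against an 8-bit adder that never looks above
bit 7, standing in for one lane of an 8-bit vector add. It assumes unsigned
char so that the truncation is well defined in C.

```c
/* Wide path: what the C source asks for after integer promotion. */
static unsigned char add1_wide(unsigned char x) {
  int a = (int)x + 1;       /* lengthen, then add at int width */
  return (unsigned char)a;  /* truncate: keeps only the low 8 bits */
}

/* Narrow path: an 8-bit ripple-carry adder. It never reads or writes any
   bit above bit 7, so it models one lane of an 8-bit vector add. */
static unsigned char add8(unsigned char a, unsigned char b) {
  unsigned char sum = 0, carry = 0;
  for (int bit = 0; bit < 8; ++bit) {
    unsigned char x = (a >> bit) & 1, y = (b >> bit) & 1;
    sum |= (unsigned char)((x ^ y ^ carry) << bit);
    carry = (x & y) | (carry & (x ^ y));
  }
  return sum;
}

/* Returns how many of the 256 byte values disagree between the two paths. */
int count_mismatches(void) {
  int mismatches = 0;
  for (int v = 0; v < 256; ++v)
    if (add1_wide((unsigned char)v) != add8((unsigned char)v, 1))
      ++mismatches;
  return mismatches;
}
```

Since the two paths agree for every input byte, the bits produced by the
lengthen never reach the truncated result, which is why a vectorized loop can
work on 8-bit lanes directly.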

One "obvious" response would be to ask why the lengthen/truncate sequence
wasn't removed before the vectorizer ever sees it. This has been discussed
on-list, and keeping it is deliberate: if the loop is not vectorized, the
lengthens and truncates have to be generated anyway, because most
architectures can't do scalar arithmetic on sub-word types. Only vectorized
code can do 8-bit or 16-bit arithmetic directly, so the model for this needs
to be in the vectorizer.

In fact, when the vectorizer is forced to vectorize, the code it generates
does remove the lengthen/truncate. It just doesn't account for this in its
cost model.

char arr[5000];
char arr2[5000];

void f() {
  for (int i = 0; i < 5000; ++i) {
    int a = arr[i] + 1;  /* arr[i] is promoted (lengthened) to int */
    arr2[i] = a;         /* the sum is truncated back to char */
  }
}

./bin/clang -O3 -o - -S test-vec2.c -target arm64 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize
test-vec2.c:5:3: remark: unrolled with interleaving factor 2 (vectorization not beneficial) [-Rpass=loop-vectorize]
  for (int i = 0; i < 5000; ++i) {
  ^</pre>
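<pre>One possible way to look at the forced-vectorization behavior described
above is Clang's loop hint pragma. This is a sketch under assumptions: the
report does not say how vectorization was forced, and the pragma is only a
request that Clang may still decline. The test case is otherwise unchanged.

```c
char arr[5000];
char arr2[5000];

void f(void) {
  /* Assumption, not from the report: ask Clang to vectorize this loop
     even if the cost model considers it unprofitable. Other compilers
     ignore the pragma, possibly with a warning. */
#pragma clang loop vectorize(enable)
  for (int i = 0; i < 5000; ++i) {
    int a = arr[i] + 1;  /* arr[i] is promoted (lengthened) to int */
    arr2[i] = a;         /* the sum is truncated back to char */
  }
}
```

Compiling this variant with the same -Rpass=loop-vectorize flag and reading
the emitted assembly is one way to check whether the loop body uses 8-bit
vector adds without separate extend and narrow instructions.</pre>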

        </div>

      </p>


    </body>

</html>