<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Optimization of loop via loop peeling does not occur for float and double"

   href="https://llvm.org/bugs/show_bug.cgi?id=31930">31930</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Optimization of loop via loop peeling does not occur for float and double

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>drraph@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Consider:

float f(float x[]) {

  float p = 1.0;

  for (int i = 0; i < 960; i++)

    p += 1;

  return p;

}

When compiled with -march=core-avx2 -O3 -ffast-math, the generated assembly

still loops, adding 1 on each iteration until it reaches 961.

However:

int f(int x[]) {

  int p = 1;

  for (int i = 0; i < 960; i++)

    p += 1;

  return p;

}

gives:

f:                                      # @f

        mov     eax, 961

        ret

I don't know how hard it would be to add the same optimization for float and

double.
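
For what it's worth, here is a minimal sketch of the folding I have in mind,
assuming float is IEEE-754 single precision. The function names are made up
for illustration and are not from LLVM. Every intermediate value is a small
integer, and all integers up to 2^24 are exactly representable in a float, so
for this trip count the loop and the closed form should agree exactly.

```c
/* Hypothetical helper names, used only for illustration. */

/* The float loop from this report, with the trip count as a parameter. */
float sum_by_loop(int n) {
  float p = 1.0f;
  for (int i = 0; i < n; i++)
    p += 1.0f;
  return p;
}

/* The closed form the loop could be folded to.
   Assumption: 1 + n does not exceed 2^24, so no rounding occurs. */
float sum_closed_form(int n) {
  return 1.0f + (float)n;
}
```

With n = 960 both functions return 961.0f, which is the float analogue of the
"mov eax, 961" seen in the int case.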

As a side note, the first (float) loop has a number of interesting details.

First, if we reduce the limit from i < 960 to i < 959, the loop is optimized

out. Second, if we change the type to 'double', the corresponding limit drops

to i < 479. My guess is that this corresponds to a peeling cost model that is

built into the compiler.</pre>

        </div>

      </p>


    </body>

</html>