<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - SLP Vectorizer fails to vectorize a horizontal pattern when it's repeated"

   href="https://bugs.llvm.org/show_bug.cgi?id=42070">42070</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>SLP Vectorizer fails to vectorize a horizontal pattern when it's repeated

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Scalar Optimizations

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>flashmozzg@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Seems to be loosely related to <a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Loop unrolling breaks vectorization"

   href="show_bug.cgi?id=35448">https://bugs.llvm.org/show_bug.cgi?id=35448</a>

since the problematic pattern is often a result of loop unrolling.

For some reason, the SLP vectorizer fails to vectorize a horizontal reduction
pattern when it's repeated, i.e. the following code:

float foo(float * __restrict x, float * __restrict y, unsigned len) {
    float acc = 0;
    acc += *x++ * *y++;
    acc += *x++ * *y++;
    x += 4; y += 4;
    acc += *x++ * *y++;
    acc += *x++ * *y++;
    return acc;
}

is compiled into:

define dso_local float @foo(float* noalias nocapture readonly, float* noalias nocapture readonly, i32) local_unnamed_addr #0 {
  %4 = getelementptr inbounds float, float* %0, i64 1
  %5 = load float, float* %0, align 4, !tbaa !2
  %6 = getelementptr inbounds float, float* %1, i64 1
  %7 = load float, float* %1, align 4, !tbaa !2
  %8 = fmul float %5, %7
  %9 = fadd float %8, 0.000000e+00
  %10 = load float, float* %4, align 4, !tbaa !2
  %11 = load float, float* %6, align 4, !tbaa !2
  %12 = fmul float %10, %11
  %13 = fadd float %9, %12
  %14 = getelementptr inbounds float, float* %0, i64 6
  %15 = getelementptr inbounds float, float* %1, i64 6
  %16 = bitcast float* %14 to <2 x float>*
  %17 = load <2 x float>, <2 x float>* %16, align 4, !tbaa !2
  %18 = bitcast float* %15 to <2 x float>*
  %19 = load <2 x float>, <2 x float>* %18, align 4, !tbaa !2
  %20 = fmul <2 x float> %17, %19
  %21 = extractelement <2 x float> %20, i32 0
  %22 = fadd float %13, %21
  %23 = extractelement <2 x float> %20, i32 1
  %24 = fadd float %22, %23
  ret float %24
}

Note that only the second half (after x += 4; y += 4) was vectorized, even
though each half can be vectorized separately just fine. It looks like the SLP
vectorizer initially attempts to reduce all loads and adds, fails because of
the middle increment, and then never tries to vectorize the first half.
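
A minimal sketch of the "each half separately" observation, assuming the two
halves are pulled out into their own functions. The names first_half,
second_half and foo_split are hypothetical and not part of the original test
case. Whether each helper is actually vectorized depends on the compiler
version, target and flags, and inlining both into one caller may simply
recreate the combined pattern.

```c
/* Hypothetical helper (not from the report): the first two products only. */
float first_half(const float * __restrict x, const float * __restrict y) {
    float acc = 0;
    acc += *x++ * *y++;
    acc += *x++ * *y++;
    return acc;
}

/* Hypothetical helper (not from the report): the two products after the skip.
   Elements 6 and 7 are the ones foo reads in its second half. */
float second_half(const float * __restrict x, const float * __restrict y) {
    float acc = 0;
    x += 6; y += 6;
    acc += *x++ * *y++;
    acc += *x++ * *y++;
    return acc;
}

/* Sums the two partial reductions. This changes the order of the
   floating-point additions compared to foo, so the result can differ
   in the last bits for inputs that are not exactly representable. */
float foo_split(const float * __restrict x, const float * __restrict y) {
    return first_half(x, y) + second_half(x, y);
}
```

Compiling first_half and second_half in isolation is one way to check whether
each half is vectorized on its own.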

This can have a significant effect on performance in the presence of loop
unrolling.</pre>

        </div>

      </p>


    </body>

</html>