<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Vectorization improvement opportunity for loops with stride"

   href="https://bugs.llvm.org/show_bug.cgi?id=36448">36448</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Vectorization improvement opportunity for loops with stride

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Windows NT

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>serguei.katkov@azul.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Let's consider the following loop (<a href="https://godbolt.org/g/W8z3dY">https://godbolt.org/g/W8z3dY</a>)

void testStride(int a[], int b[], int N) {
  for (int i = 0; i < N; i+=2)
    a[i] = b[i];
}

If AVX-512 support is enabled (-march=skylake-avx512), LLVM is able to
vectorize it using gather/scatter instructions.

However, without AVX-512 support, LLVM does not vectorize this loop: its cost
model determines that vectorization is unprofitable because the strided memory
accesses would have to be scalarized.
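
For illustration, here is a minimal plain-C emulation of the gather/scatter
strategy. It assumes a vector width of 8 x i32 (VF = 8, an assumption), uses
ordinary arrays to stand in for vector registers, and testStrideGather is a
hypothetical name, not LLVM output. Lane j of the vector iteration starting at
i handles element i + 2*j. Without hardware gather/scatter, each of these lane
accesses becomes a separate scalar load or store, which is what the cost model
penalizes.

```c
/* Hypothetical sketch: plain-C emulation of a gather/scatter-based
 * vectorization of testStride. VF = 8 is an assumed vector width and the
 * local arrays stand in for vector registers. */
enum { VF = 8 };

void testStrideGather(int a[], int b[], int N) {
  int i = 0;
  /* Vector body: runs only while the last lane (i + 2*(VF-1)) is in range,
   * so every lane visits an element the scalar loop would also visit. */
  for (; i + 2 * (VF - 1) < N; i += 2 * VF) {
    int idx[VF], tmp[VF];
    for (int j = 0; j < VF; j++)
      idx[j] = i + 2 * j;      /* index vector {i, i+2, ..., i+14} */
    for (int j = 0; j < VF; j++)
      tmp[j] = b[idx[j]];      /* gather */
    for (int j = 0; j < VF; j++)
      a[idx[j]] = tmp[j];      /* scatter */
  }
  /* Scalar epilogue for the remaining strided elements. */
  for (; i < N; i += 2)
    a[i] = b[i];
}
```

Each vector iteration covers VF useful elements spread over 2*VF array slots.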

At the same time, the LLVM loop vectorizer supports masked loads/stores, but it
uses them only for loops with conditional accesses, not for loops with strided
accesses.

Specifically, if I rewrite the loop as

void testCond(int a[], int b[], int N) {
  for (int i = 0; i < N; i++)
    if ((i % 2) == 0)
      a[i] = b[i];
}

LLVM vectorizes this loop using masked loads/stores. However, it does not
detect that the mask follows a simple stride pattern and instead recomputes the
mask on every iteration.
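
The mask can be hoisted for the following reason. In the vectorized testCond,
lane j of the vector iteration starting at i is active when (i + j) % 2 == 0.
The vector loop starts at 0 and steps by VF, so i is always a multiple of VF.
If VF is even, (i + j) % 2 == j % 2, and the mask is therefore identical on
every iteration. Below is a minimal check in plain C. It assumes VF = 8,
ignores the separate tail mask needed for lanes with i + j >= N, and all
function names are hypothetical.

```c
/* Hypothetical sketch: checks that the per-iteration mask of the vectorized
 * testCond loop is loop-invariant. VF = 8 is an assumed vector width. */
enum { VF = 8 };

/* Mask as computed naively on every vector iteration starting at i. */
void maskPerIteration(int i, int mask[VF]) {
  for (int j = 0; j < VF; j++)
    mask[j] = ((i + j) % 2) == 0;
}

/* The same mask computed once, outside the loop. This is valid because i is
 * a multiple of VF and VF is even, so (i + j) % 2 == j % 2. */
void maskHoisted(int mask[VF]) {
  for (int j = 0; j < VF; j++)
    mask[j] = (j % 2) == 0;
}

/* Returns 1 if both masks agree for every vector iteration below N. */
int maskIsInvariant(int N) {
  int hoisted[VF], current[VF];
  maskHoisted(hoisted);
  for (int i = 0; i < N; i += VF) {
    maskPerIteration(i, current);
    for (int j = 0; j < VF; j++)
      if (current[j] != hoisted[j])
        return 0;
  }
  return 1;
}
```

For example, maskIsInvariant(1000) returns 1. The hoisted mask is
{1,0,1,0,1,0,1,0}.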

So I see two opportunities here:
1) Support masked loads/stores for strided memory accesses.
2) Be smarter about detecting a loop-invariant mask and hoisting it out of the
   loop.</pre>
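<pre>A minimal sketch of how both opportunities could combine for testStride. It
assumes VF = 8 and emulates masked loads/stores with plain C arrays. All names
are hypothetical and are neither LLVM's API nor its actual output. Each vector
iteration accesses VF contiguous elements starting at i. Only the even lanes
are enabled, using the stride mask {1,0,1,0,...}, which is computed once before
the loop. A tail mask keeps lanes at or beyond N inactive, so no out-of-bounds
memory is touched.

```c
/* Hypothetical sketch: vectorizing testStride with contiguous masked
 * loads/stores instead of gather/scatter. Plain C arrays emulate vector
 * registers; VF = 8 is an assumed vector width. */
enum { VF = 8 };

/* Emulated masked load: inactive lanes do not touch memory. */
static void maskedLoad(const int *p, const int mask[VF], int v[VF]) {
  for (int j = 0; j < VF; j++)
    v[j] = mask[j] ? p[j] : 0;
}

/* Emulated masked store: only active lanes are written. */
static void maskedStore(int *p, const int mask[VF], const int v[VF]) {
  for (int j = 0; j < VF; j++)
    if (mask[j])
      p[j] = v[j];
}

void testStrideMasked(int a[], int b[], int N) {
  /* Stride mask {1,0,1,0,...}: loop-invariant, computed once. */
  int strideMask[VF];
  for (int j = 0; j < VF; j++)
    strideMask[j] = (j % 2) == 0;

  for (int i = 0; i < N; i += VF) {
    /* Combine with a tail mask so lanes at or beyond N stay inactive. */
    int mask[VF], v[VF];
    for (int j = 0; j < VF; j++)
      mask[j] = strideMask[j] && (i + j < N);
    maskedLoad(b + i, mask, v);
    maskedStore(a + i, mask, v);
  }
}
```

Compared with the gather/scatter form, each iteration does only VF/2 useful
elements, but it uses contiguous masked accesses. These are the kind of
accesses the vectorizer already emits for testCond. For simplicity, the tail
mask here is recombined on every iteration, although only the final iteration
actually needs it.</pre>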

        </div>

      </p>


    </body>

</html>