<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [AArch64] Generated code for rotate-lefts has an unnecessary extra and"
   href="https://bugs.llvm.org/show_bug.cgi?id=37421">37421</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[AArch64] Generated code for rotate-lefts has an unnecessary extra and
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>6.0
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows NT
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: AArch64
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>fabiang@radgametools.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>void rotateInLoop2(unsigned int *arr, const bool *control, int count, int rot0,
                   int rot1)
{
    for (int i = 0; i < count; ++i)
    {
        int rot = control[i] ? rot1 : rot0;
        arr[i] = (arr[i] << (rot & 31)) | (arr[i] >> (-rot & 31));
    }
}
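
For reference, the loop body is the usual UB-free rotate-left idiom. Masking
both shift counts with & 31 means neither shift is ever by 32. For any rot,
including negative values and values of 32 or more, the result is x rotated
left by (rot mod 32), which is what lets clang match it as a rotate. The
following is a minimal C99 sketch that checks this. rotl_idiom and rotl_slow
are hypothetical helper names. The negation is done in unsigned arithmetic,
which is a small deviation from the code above, so that rot == INT_MIN is not
signed overflow:

```c
/* rotl_idiom: the loop-body expression from rotateInLoop2 (hypothetical name),
   with the negation done in unsigned arithmetic to avoid signed overflow. */
static unsigned int rotl_idiom(unsigned int x, int rot)
{
    return (x << (rot & 31)) | (x >> (-(unsigned int)rot & 31u));
}

/* rotl_slow: hypothetical reference that rotates left one bit at a time,
   (rot mod 32) times, so its behaviour is easy to verify by eye. */
static unsigned int rotl_slow(unsigned int x, int rot)
{
    unsigned int n = (unsigned int)rot & 31u;   /* rot mod 32, also for negative rot */
    while (n--)
        x = (x << 1) | (x >> 31);               /* rotate left by one bit */
    return x;
}
```

Comparing rotl_idiom against rotl_slow over a range of rot values (for example
-70..70) exercises the negative and out-of-range amounts.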

Compiled with clang 6.0 ("clang -O2 -fno-tree-vectorize -target aarch64-linux-android"),
the main loop becomes:

.LBB1_2: // =>This Inner Loop Header: Depth=1
  ldrb w9, [x1], #1
  ldr w10, [x0]
  cmp w9, #0 // =0
  csel w9, w3, w4, eq
  neg w9, w9
  and w9, w9, #0x1f // <-- this is unnecessary (32-bit ror only uses the low 5 bits of w9)
  ror w9, w10, w9
  subs x8, x8, #1 // =1
  str w9, [x0], #4
  b.ne .LBB1_2
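
The and is redundant because the 32-bit register form of ror (RORV) already
takes the rotate amount modulo 32. In other words, it only looks at the low 5
bits of w9. The following small C model of the emitted sequence assumes those
RORV semantics. rorv32_model, emitted_with_and and emitted_without_and are
hypothetical names, not LLVM code. The model shows that dropping the and
gives the same result for every rot:

```c
/* Hypothetical C model of the AArch64 32-bit "ror Wd, Wn, Wm" (RORV):
   the rotate amount is Wm modulo 32, i.e. only the low 5 bits are used. */
static unsigned int rorv32_model(unsigned int wn, unsigned int wm)
{
    unsigned int amt = wm & 31u;                 /* implicit masking by the hardware */
    return amt ? (wn >> amt) | (wn << (32u - amt)) : wn;
}

/* What clang 6.0 emits per element: neg, and #0x1f, ror. */
static unsigned int emitted_with_and(unsigned int x, int rot)
{
    unsigned int w9 = -(unsigned int)rot;        /* neg w9, w9 */
    w9 &= 0x1fu;                                 /* and w9, w9, #0x1f */
    return rorv32_model(x, w9);                  /* ror w9, w10, w9 */
}

/* The same sequence with the redundant and removed: neg + ror only. */
static unsigned int emitted_without_and(unsigned int x, int rot)
{
    return rorv32_model(x, -(unsigned int)rot);
}
```

Rotating right by -rot is the same as rotating left by rot, so both versions
compute the rotate-left from the C source.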

This is probably related to <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - ARM64: Backend should know about implicit masking of variable shift distances"
   href="show_bug.cgi?id=27582">https://bugs.llvm.org/show_bug.cgi?id=27582</a>.

The conditional select for "rot" is there to work around the issue of shift
counts being hoisted out of the loop, described in
<a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - MSVC rotate intrinsics don't (just) generate rotates on x86-64"
   href="show_bug.cgi?id=37387">https://bugs.llvm.org/show_bug.cgi?id=37387</a>.

The -fno-tree-vectorize flag is there because, with vectorization enabled, the
loop gets unrolled 2x (presumably to use ldp/stp).</pre>
        </div>
      </p>


    </body>
</html>