[llvm-bugs] [Bug 51288] New: Convert mov and shr to shrx in loops constrained by retirement rate

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Jul 30 21:37:34 PDT 2021


https://bugs.llvm.org/show_bug.cgi?id=51288

            Bug ID: 51288
           Summary: Convert mov and shr to shrx in loops constrained by
                    retirement rate
           Product: new-bugs
           Version: 12.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: todd at lipcon.org
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org

This input file:

#include <stdint.h>
#include <utility>

struct Foo {
  uint64_t v;
  std::pair<uint32_t, uint32_t> Get() { return {v & 0xffffffff, v >> 32}; }
};

void Process(Foo* f, uint32_t* dst, int n) {
#pragma unroll
  for (int i = 0; i < n; i++) {
    auto [mask, idx] = f[i].Get();
    dst[idx] |= mask;
  }
}

generates assembly in which the core of the loop contains the following sequence:
        movq    24(%rdi,%rax,8), %r9
        movq    %r9, %rcx
        shrq    $32, %rcx
        orl     %r9d, (%rsi,%rcx,4)

When compiling with BMI2 support, it would instead be slightly faster to keep
the constant 32 in a register and use shrx, which takes a separate source and
destination, so that the copy of %r9 into %rcx and the shift become a single
instruction.
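
As a rough source-level model of the two sequences (the function names
IndexViaMovShr and IndexViaShrx are made up for illustration, and an optimizer
is free to compile both to identical code), the difference is only in how the
copy and the shift are expressed:

```cpp
#include <cstdint>

// Models the currently generated pair of instructions: copy the loaded
// value, then shift the copy in place by an immediate.
inline uint64_t IndexViaMovShr(uint64_t loaded) {
  uint64_t copy = loaded;  // movq %r9, %rcx
  copy >>= 32;             // shrq $32, %rcx
  return copy;
}

// Models the proposed single instruction: the shift count lives in a
// register and the result is written to a separate destination, so the
// source register is left untouched (shrxq %count, %r9, %rcx).
// count_reg is a hypothetical parameter standing in for that register.
inline uint64_t IndexViaShrx(uint64_t loaded, uint64_t count_reg) {
  // For 64-bit operands shrx uses only the low 6 bits of the count.
  return loaded >> (count_reg & 63);
}
```

With count_reg set to 32 both functions return the same index; the second form
just needs one instruction in the loop body once the count has been
materialized outside it.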

Generated version:
https://bit.ly/2WzH8Pj

Preferred version (saves roughly half a cycle per iteration of the unrolled-by-4 loop):
https://bit.ly/3jaXBBh
