[llvm-bugs] [Bug 37796] New: Optimize bit-scatter operation

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Jun 13 15:48:28 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=37796

            Bug ID: 37796
           Summary: Optimize bit-scatter operation
           Product: clang
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: C++
          Assignee: unassignedclangbugs at nondot.org
          Reporter: ruiu at google.com
                CC: dgregor at apple.com, llvm-bugs at lists.llvm.org

I found that clang can't optimize the following code:

  // This function scatters Val's bits as instructed by Mask.
  // Here is an example:
  //
  //  Val:    abcd efgh ijkl mnop
  //  Mask:   1110 0001 1111 0001
  //  Result: hij0 000k lmno 000p
  //
  // Some CPUs support this operation as a single instruction.
  // For example, the Intel BMI2 extension provides this operation as PDEP.
  static inline uint32_t scatter(uint32_t Val, uint32_t Mask) {
    uint32_t Res = 0;
    uint32_t Off = 0;

    for (uint32_t I = 0; I < 32; ++I)
      if (Mask & (1 << I))
        Res |= !!(Val & (1 << Off++)) << I;
    return Res;
  }

  uint32_t foo(uint32_t x) {
    return scatter(x, 1);
  }

foo() can be compiled to just `andl $1, %edi` on x86-64, but currently clang
compiles it to a loop that iterates 32 times (https://godbolt.org/g/jX5sNW).
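
To make the expected folding concrete, here is a minimal sketch of why
scatter(x, 1) reduces to a single AND. The names scatterRef, fooExpected and
checkFoo are hypothetical and not part of the report; scatterRef is the same
algorithm as above, written with unsigned literals as an assumption to keep
shifts into bit 31 unsigned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: same algorithm as scatter() in the report, written
// with unsigned literals so that shifts into bit 31 stay unsigned.
static inline uint32_t scatterRef(uint32_t Val, uint32_t Mask) {
  uint32_t Res = 0;
  uint32_t Off = 0;
  for (uint32_t I = 0; I < 32; ++I)
    if (Mask & (1u << I))
      Res |= ((Val >> Off++) & 1u) << I;
  return Res;
}

// Hypothetical name: the closed form that scatter(x, 1) should fold to.
// Only bit 0 of Mask is set, so only bit 0 of x can reach the result.
static inline uint32_t fooExpected(uint32_t x) { return x & 1u; }

// Spot-check that the loop and the closed form agree.
static inline void checkFoo() {
  const uint32_t Samples[] = {0u, 1u, 2u, 3u, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t X : Samples)
    assert(scatterRef(X, 1u) == fooExpected(X));
}
```

With the mask from the comment's example (0xE1F1), scatterRef(0xFFFFFFFF,
0xE1F1) returns the mask itself, since every selected position receives a
set bit.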

If I add "#pragma unroll" to the loop, clang does optimize the code
(https://godbolt.org/g/Apx7Nj).
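
For reference, a sketch of what the "#pragma unroll" variant might look like.
The exact source behind the second godbolt link is not reproduced here, so
placing the pragma directly before the for loop is an assumption, and the
names scatterUnrolled and fooUnrolled are hypothetical. Compilers that do not
recognize the pragma may warn about it and ignore it.

```cpp
#include <cstdint>

// Hypothetical variant of scatter() with a loop-unrolling hint.
// Unsigned literals are used (an assumption, not the report's exact code)
// so that shifts into bit 31 stay unsigned.
static inline uint32_t scatterUnrolled(uint32_t Val, uint32_t Mask) {
  uint32_t Res = 0;
  uint32_t Off = 0;

  // Ask clang to fully unroll the 32 iterations; once the loop is unrolled
  // and Mask is a constant, each iteration can be folded on its own.
#pragma unroll
  for (uint32_t I = 0; I < 32; ++I)
    if (Mask & (1u << I))
      Res |= ((Val >> Off++) & 1u) << I;
  return Res;
}

// Same caller as foo() in the report, using the unrolled variant.
uint32_t fooUnrolled(uint32_t x) { return scatterUnrolled(x, 1); }
```

The pragma changes only how the loop is compiled, not what it computes, so
fooUnrolled(x) still returns x & 1 for every input.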
