[llvm-bugs] [Bug 30990] New: [ppc] Missed vectorization

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Nov 11 15:32:53 PST 2016


https://llvm.org/bugs/show_bug.cgi?id=30990

            Bug ID: 30990
           Summary: [ppc] Missed vectorization
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: PowerPC
          Assignee: unassignedbugs at nondot.org
          Reporter: carrot at google.com
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

The source code is:

int foo(char* ptr, int l) {
  const char* const end = ptr + l;
  int count = 0;
  while (ptr < end) {
    count += ((signed char)(*ptr) < -0x40) ? 1 : 0;
    ptr++;
  }
  return count;
}

When compiled with -m64 -O2, LLVM unrolls the loop 12 times but emits only scalar code:

        ...
.LBB0_4:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        lbzu 0, 12(3)
        ld 5, -160(1)                   # 8-byte Folded Reload
        addi 7, 7, -12 
        lbz 20, 1(3)
        lbz 19, 2(3)
        lbz 18, 3(3)
        lbz 14, 4(3)
        cmpld    5, 7
        extsb 0, 0
        lbz 5, 5(3)
        lbz 6, 7(3)
        cmpwi 1, 0, -64 
        lbz 8, 9(3)
        extsb 2, 20
        extsb 19, 19
        extsb 18, 18
        extsb 20, 14
        cmpwi 6, 2, -64 
        lbz 2, 6(3)
        cmpwi 7, 19, -64 
        extsb 0, 5
        lbz 5, 8(3)
        cmpwi 5, 18, -64 
        isel 16, 10, 9, 24
        extsb 18, 6
        cmpwi 6, 20, -64 
        extsb 19, 2
        extsb 20, 5
        add 12, 16, 12
        isel 14, 10, 9, 20
        cmpwi 5, 19, -64 
        lbz 19, 10(3)
        isel 2, 10, 9, 24
        cmpwi 6, 18, -64 
        lbz 18, 11(3)
        add 29, 14, 29
        isel 15, 10, 9, 28
        cmpwi 7, 0, -64 
        extsb 0, 8
        add 28, 2, 28
        isel 6, 10, 9, 28
        cmpwi 7, 20, -64 
        extsb 20, 19
        add 30, 15, 30
        isel 5, 10, 9, 20
        cmpwi 5, 0, -64 
        extsb 0, 18
        add 27, 6, 27
        isel 8, 10, 9, 24
        cmpwi 6, 20, -64
        add 25, 5, 25
        isel 19, 10, 9, 28
        cmpwi 7, 0, -64
        add 24, 8, 24
        isel 17, 10, 9, 4
        add 22, 19, 22
        isel 18, 10, 9, 20
        add 11, 17, 11
        isel 20, 10, 9, 24
        add 26, 18, 26
        isel 0, 10, 9, 28
        add 23, 20, 23
        add 21, 0, 21
        bne      0, .LBB0_4
        ...
// the remaining iterations

GCC can vectorize the loop:
         ...
.L4:
        sldi 5,7,4
        addi 7,7,1
        lxvd2x 33,8,5
        xxpermdi 33,33,33,2
        vcmpgtsb 1,2,1
        xxsel 33,35,36,33
        vperm 12,1,5,7
        vperm 1,1,5,8
        vperm 6,12,0,9
        vperm 12,12,0,10
        vperm 11,1,0,9
        vadduwm 6,6,13
        vperm 13,1,0,10
        vadduwm 12,12,6
        vadduwm 12,11,12
        vadduwm 13,13,12
        bdnz .L4 
        ...
// the remaining iterations
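
For readers less familiar with the vector sequence above, the following is a rough plain-C sketch of the general strategy it appears to follow: handle 16 bytes per iteration, compare each byte against -0x40, accumulate the 0/1 results in independent partial sums, and reduce them after the loop. This is only an illustration and not GCC's actual transformation; the names count_scalar, count_blocked and lanes are hypothetical, and the 16-entry accumulator array is an assumption made for clarity (the GCC code widens the byte results into word vectors instead).

```c
#include <stddef.h>

/* Reference version: same logic as foo() in the report. */
static int count_scalar(const char *ptr, int l) {
  const char *const end = ptr + l;
  int count = 0;
  while (ptr < end) {
    count += ((signed char)(*ptr) < -0x40) ? 1 : 0;
    ptr++;
  }
  return count;
}

/* Hypothetical blocked version: 16 bytes per iteration, one partial
 * sum per byte position, mimicking the shape of a 16-lane vector loop. */
static int count_blocked(const char *ptr, int l) {
  int lanes[16] = {0};  /* assumed per-lane accumulators */
  int i = 0;

  /* Main loop: full 16-byte blocks only. */
  for (; i + 16 <= l; i += 16) {
    for (int j = 0; j < 16; j++) {
      /* The comparison yields 0 or 1, like the compare + select pair. */
      lanes[j] += ((signed char)ptr[i + j] < -0x40);
    }
  }

  /* Horizontal reduction of the partial sums. */
  int count = 0;
  for (int j = 0; j < 16; j++) {
    count += lanes[j];
  }

  /* Scalar tail for the remaining (fewer than 16) bytes. */
  for (; i < l; i++) {
    count += ((signed char)ptr[i] < -0x40);
  }
  return count;
}
```

Both functions should return the same count for any input; the blocked form only reorders the additions, which is safe here because integer addition is associative.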

In one of our internal test cases, the LLVM version is 2.7x slower than the GCC
version on POWER8.
