[llvm-bugs] [Bug 33813] New: Optimization for removing same variable comparisons in loop: while(it != end1 && it != end2)

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Jul 17 01:17:30 PDT 2017


https://bugs.llvm.org/show_bug.cgi?id=33813

            Bug ID: 33813
           Summary: Optimization for removing same variable comparisons in
                    loop: while(it != end1 && it != end2)
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: antoshkka at gmail.com
                CC: llvm-bugs at lists.llvm.org

Simple iteration over the elements of a std::deque produces suboptimal code. For example:

#include <deque>

unsigned sum(std::deque<unsigned> cont) {
    unsigned sum = 0;
    for (unsigned v : cont)
        sum += v;

    return sum;
}


produces the following loop:

.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        cmp     rsi, rcx
        je      .LBB0_4
        add     eax, dword ptr [rcx]
        add     rcx, 4
        cmp     rcx, rdx
        jne     .LBB0_1
        jmp     .LBB0_3

The loop has two comparisons in it and behaves close to the following C code:

unsigned sum_like_deque_does(unsigned** chunks, unsigned* end) {
    unsigned sum = 0;

    for (unsigned* it = *chunks; it != end; it = *(++chunks)) {
        for (;it != end && it != *chunks + 128; ++it) {
            sum += *it;
        }
    }

    return sum;
}


Note the `it != end && it != *chunks + 128` condition. It could be simplified:
if `end` lies within `[it, *chunks + 128]`, use the condition `it != end`;
otherwise use the condition `it != *chunks + 128`. Such an optimization
removes one cmp from the inner loop and produces a much faster loop:

.LBB2_3:                                #   Parent Loop BB2_2 Depth=1
        add     eax, dword ptr [rcx]
        add     rcx, 4
        cmp     rdx, rcx
        jne     .LBB2_3

Synthetic tests show up to 2 times better performance. Assembly outputs:
https://godbolt.org/g/vGs2qs
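
For illustration, the transformation described above can be written by hand as
follows. This is only a sketch of the idea, not what the compiler emits today
and not a proposed implementation of the pass. The names
`sum_with_hoisted_bound` and `kChunkSize` are made up for this example. It
assumes every chunk holds 128 elements and that `end` points into (or one past
the end of) the final chunk only. std::less is used because comparing pointers
into different arrays with `<` is unspecified.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical chunk size, matching the 128 used in the C code above.
constexpr std::size_t kChunkSize = 128;

unsigned sum_with_hoisted_bound(unsigned** chunks, unsigned* end) {
    const std::less<unsigned*> before;  // total order over pointers
    unsigned sum = 0;

    for (;;) {
        unsigned* it = *chunks;
        unsigned* chunk_end = it + kChunkSize;

        // Decide once per chunk which bound applies:
        // `end` is in [it, chunk_end] only for the last chunk.
        const bool last = !before(end, it) && !before(chunk_end, end);
        unsigned* stop = last ? end : chunk_end;

        // The inner loop now has a single comparison per element.
        for (; it != stop; ++it) {
            sum += *it;
        }

        if (last) {
            return sum;
        }
        ++chunks;
    }
}
```

The range check runs once per chunk rather than once per element, so its cost
is amortized over up to 128 iterations of the inner loop.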
