[llvm-bugs] [Bug 39213] New: Suboptimal assembly generated for -Os.

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Oct 8 01:36:40 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=39213

            Bug ID: 39213
           Summary: Suboptimal assembly generated for -Os.
           Product: new-bugs
           Version: 7.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: mariusz at podlesny.eu
                CC: llvm-bugs at lists.llvm.org

While playing around with templates I stumbled upon the following issue (the
code below is a simplified, non-template version): the assembly generated for
the store2 function when compiled with -Os seems to be "suboptimal".

COMMAND:
clang++ -Os -g0 -DNDEBUG -Werror -Wall -Wextra -pedantic  -std=c++17 -g0 -S main.cpp

CODE:
#include <cstddef>
#include <cstdint>

void store1(uint16_t value, uint8_t* storage)
{
    storage[0] = value;
    storage[1] = value >> 8;
}

void store2(uint16_t value, uint8_t* storage)
{
    for (size_t i = 0; i < sizeof(value); ++i)
        storage[i] = value >> (8* i);
}

RESULT:
_Z6store1tPh:                           # @_Z6store1tPh
        .cfi_startproc
# %bb.0:
        movl    %edi, %eax
        movb    %al, (%rsi)
        movb    %ah, 1(%rsi)
        retq
.Lfunc_end0:
        .size   _Z6store1tPh, .Lfunc_end0-_Z6store1tPh
        .cfi_endproc

_Z6store2tPh:                           # @_Z6store2tPh
        .cfi_startproc
# %bb.0:
        movd    %edi, %xmm1
        movl    $1, %eax
        movq    %rax, %xmm0
        pslldq  $8, %xmm0               # xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
        xorl    %eax, %eax
        pshufd  $68, %xmm1, %xmm1       # xmm1 = xmm1[0,1,0,1]
        movdqa  .LCPI1_0(%rip), %xmm2   # xmm2 = [4294967295,0,4294967295,0]
        pand    %xmm2, %xmm1
        movapd  .LCPI1_1(%rip), %xmm3   # xmm3 = [255,255]
        movdqa  .LCPI1_2(%rip), %xmm4   # xmm4 = [2,2]
        movl    $2, %ecx
.LBB1_1:                                # =>This Inner Loop Header: Depth=1
        movdqa  %xmm0, %xmm5
        psllq   $3, %xmm5
        pand    %xmm2, %xmm5
        movdqa  %xmm1, %xmm6
        psrlq   %xmm5, %xmm6
        pshufd  $78, %xmm5, %xmm5       # xmm5 = xmm5[2,3,0,1]
        movdqa  %xmm1, %xmm7
        psrlq   %xmm5, %xmm7
        movsd   %xmm6, %xmm7            # xmm7 = xmm6[0],xmm7[1]
        andpd   %xmm3, %xmm7
        packuswb        %xmm7, %xmm7
        packuswb        %xmm7, %xmm7
        packuswb        %xmm7, %xmm7
        movd    %xmm7, %edx
        movw    %dx, (%rsi,%rax)
        addq    $2, %rax
        paddq   %xmm4, %xmm0
        cmpq    %rcx, %rax
        jne     .LBB1_1
# %bb.2:
        retq
.Lfunc_end1:
        .size   _Z6store2tPh, .Lfunc_end1-_Z6store2tPh
        .cfi_endproc

Well, the result surprised me quite a bit, as I expected the "store2" function
to be unrolled into something similar to "store1" or, better yet, turned into
the code that gcc 8 generates:

store2(unsigned short, unsigned char*):
        mov     WORD PTR [rsi], di
        ret

This output is produced only at -Os, whereas -O1, -O2, -O3, -Oz and -Og produce
much more "regular" assembly. It also seems to show up in clang 6, 7 and trunk,
but not in clang 5 or earlier (checked on Compiler Explorer).
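
For reference, here is a minimal sketch of why collapsing the loop is a legal
transformation: both functions write the same two bytes for every 16-bit input,
low byte first, which on a little-endian target such as x86-64 is exactly what
a single 16-bit store does. The helpers stores_agree and
stores_agree_for_all_values are hypothetical names introduced only for this
check; they are not part of the original report.

```cpp
#include <cassert>   // only needed by the usage example below
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hand-unrolled version from the report.
void store1(std::uint16_t value, std::uint8_t* storage)
{
    storage[0] = static_cast<std::uint8_t>(value);
    storage[1] = static_cast<std::uint8_t>(value >> 8);
}

// Loop version from the report (the one that gets vectorized at -Os).
void store2(std::uint16_t value, std::uint8_t* storage)
{
    for (std::size_t i = 0; i < sizeof(value); ++i)
        storage[i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Hypothetical helper: true if both functions write identical bytes.
// The buffers start with different contents so that a function which
// failed to write a byte would be detected.
bool stores_agree(std::uint16_t value)
{
    std::uint8_t a[2] = {0x00, 0x00};
    std::uint8_t b[2] = {0xFF, 0xFF};
    store1(value, a);
    store2(value, b);
    return std::memcmp(a, b, sizeof(a)) == 0;
}

// Hypothetical helper: exhaustively checks all 65536 possible inputs.
bool stores_agree_for_all_values()
{
    for (std::uint32_t v = 0; v <= 0xFFFFu; ++v)
        if (!stores_agree(static_cast<std::uint16_t>(v)))
            return false;
    return true;
}
```

Calling assert(stores_agree_for_all_values()); from a test main should pass on
any conforming implementation, since both functions are defined purely in terms
of shifts and byte stores. The equivalence with a single 16-bit store, however,
holds only on little-endian targets.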
