[llvm-bugs] [Bug 39213] New: Suboptimal assembly generated for -Os.
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Oct 8 01:36:40 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=39213
Bug ID: 39213
Summary: Suboptimal assembly generated for -Os.
Product: new-bugs
Version: 7.0
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: mariusz at podlesny.eu
CC: llvm-bugs at lists.llvm.org
While playing around with templates I stumbled upon the following issue (the
code below is non-template - I simplified it): the assembly generated for
store2 function compiled with -Os seems to be "suboptimal".
COMMAND:
clang++ -Os -g0 -DNDEBUG -Werror -Wall -Wextra -pedantic -std=c++17 -g0 -S
main.cpp
CODE:
#include <cstddef>
#include <cstdint>
void store1(uint16_t value, uint8_t* storage)
{
storage[0] = value;
storage[1] = value >> 8;
}
void store2(uint16_t value, uint8_t* storage)
{
for (size_t i = 0; i < sizeof(value); ++i)
storage[i] = value >> (8* i);
}
RESULT:
_Z6store1tPh: # @_Z6store1tPh
.cfi_startproc
# %bb.0:
movl %edi, %eax
movb %al, (%rsi)
movb %ah, 1(%rsi)
retq
.Lfunc_end0:
.size _Z6store1tPh, .Lfunc_end0-_Z6store1tPh
.cfi_endproc
_Z6store2tPh: # @_Z6store2tPh
.cfi_startproc
# %bb.0:
movd %edi, %xmm1
movl $1, %eax
movq %rax, %xmm0
pslldq $8, %xmm0 # xmm0 =
zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
xorl %eax, %eax
pshufd $68, %xmm1, %xmm1 # xmm1 = xmm1[0,1,0,1]
movdqa .LCPI1_0(%rip), %xmm2 # xmm2 = [4294967295,0,4294967295,0]
pand %xmm2, %xmm1
movapd .LCPI1_1(%rip), %xmm3 # xmm3 = [255,255]
movdqa .LCPI1_2(%rip), %xmm4 # xmm4 = [2,2]
movl $2, %ecx
.LBB1_1: # =>This Inner Loop Header: Depth=1
movdqa %xmm0, %xmm5
psllq $3, %xmm5
pand %xmm2, %xmm5
movdqa %xmm1, %xmm6
psrlq %xmm5, %xmm6
pshufd $78, %xmm5, %xmm5 # xmm5 = xmm5[2,3,0,1]
movdqa %xmm1, %xmm7
psrlq %xmm5, %xmm7
movsd %xmm6, %xmm7 # xmm7 = xmm6[0],xmm7[1]
andpd %xmm3, %xmm7
packuswb %xmm7, %xmm7
packuswb %xmm7, %xmm7
packuswb %xmm7, %xmm7
movd %xmm7, %edx
movw %dx, (%rsi,%rax)
addq $2, %rax
paddq %xmm4, %xmm0
cmpq %rcx, %rax
jne .LBB1_1
# %bb.2:
retq
.Lfunc_end1:
.size _Z6store2tPh, .Lfunc_end1-_Z6store2tPh
.cfi_endproc
Well, the result surprised me quite a bit as I expected "store2" function to be
unrolled to something similar as "store1" or better yet, changed into code
which gcc 8 generates:
store2(unsigned short, unsigned char*):
mov WORD PTR [rsi], di
ret
The result occurs only in -Os, whereas O1, O2, O3, Oz and Og produce much more
"regular" assembly. It also seems to show up in clang 6, 7 and trunk but not in
clang 5 and previous ones (checked on Compiler Explorer).
--
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20181008/c6d21d25/attachment-0001.html>
More information about the llvm-bugs
mailing list