[LLVMbugs] [Bug 12312] New: inefficient code for 128bit int comparison with sse41
bugzilla-daemon at llvm.org
Mon Mar 19 13:48:32 PDT 2012
http://llvm.org/bugs/show_bug.cgi?id=12312
Bug #: 12312
Summary: inefficient code for 128bit int comparison with sse41
Product: libraries
Version: trunk
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
AssignedTo: unassignedbugs at nondot.org
ReportedBy: sroland at vmware.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
LLVM always seems to extract the largest possible elements (i.e. quadwords on
x86-64, dwords otherwise) and do the comparison in general-purpose integer
registers, even when that is completely unnecessary.
This snippet
define i32 @veccond(<4 x i32> %input) {
entry:
  %0 = bitcast <4 x i32> %input to i128
  %1 = icmp ne i128 %0, 0
  br i1 %1, label %if-true-block, label %endif-block

if-true-block:                                    ; preds = %entry
  ret i32 0

endif-block:                                      ; preds = %entry, %if-true-block
  ret i32 1
}
compiles to
        pextrq  $1, %xmm0, %rax
        movd    %xmm0, %rcx
        orq     %rax, %rcx
        je      .LBB0_2
# BB#1:                                 # %if-true-block
        xorl    %eax, %eax
        ret
.LBB0_2:                                # %endif-block
        movl    $1, %eax
        ret
(It looks much worse on a 32-bit arch and/or with 256-bit ints, for obvious
reasons, though I haven't actually tested the latter yet.)
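This looks like a near-ideal case for PTEST to me: PTEST sets ZF when the AND
of its two operands is all zeros, so testing the register against itself
checks the whole 128-bit value in one instruction. Something like
        ptest   %xmm0, %xmm0
        je      .LBB0_2
etc. should be much better.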
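For illustration only, here is a source-level C sketch of the two strategies. It is not the backend change requested above, and the function names (veccond_scalar, veccond_ptest, veccond_selfcheck) are hypothetical. It assumes a GCC- or Clang-compatible compiler for the target attribute and falls back to the scalar form on non-x86 targets. _mm_testz_si128 is the SSE4.1 intrinsic that maps to PTEST.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128 */
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 (PTEST) */
#define VECCOND_HAVE_X86_SIMD 1
#endif

/* Mirrors the reported codegen: split the 128-bit value into two
   64-bit halves, OR them together, and compare the result with zero.
   Returns 0 when any bit is set and 1 when all bits are clear, like
   the @veccond function in the report. */
int veccond_scalar(const uint32_t in[4]) {
    uint64_t lo, hi;
    memcpy(&lo, &in[0], sizeof lo);
    memcpy(&hi, &in[2], sizeof hi);
    return (lo | hi) != 0 ? 0 : 1;
}

#ifdef VECCOND_HAVE_X86_SIMD
/* PTEST form: _mm_testz_si128(a, b) returns 1 when (a & b) is all
   zeros, so testing v against itself checks the whole register.
   The target attribute (assumed GCC/Clang) enables SSE4.1 for this
   function only, so no extra compiler flag is needed. */
__attribute__((target("sse4.1")))
int veccond_ptest(const uint32_t in[4]) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    return _mm_testz_si128(v, v) ? 1 : 0;
}
#else
/* Non-x86 fallback so the sketch still builds. */
int veccond_ptest(const uint32_t in[4]) {
    return veccond_scalar(in);
}
#endif

/* Checks that both forms agree on a few representative inputs. */
void veccond_selfcheck(void) {
    const uint32_t cases[4][4] = {
        {0, 0, 0, 0},           /* all zero */
        {1, 0, 0, 0},           /* lowest dword set */
        {0, 0, 0, 0x80000000u}, /* highest bit set */
        {0, 7, 0, 0},           /* bit in the low quadword only */
    };
    for (int i = 0; i < 4; ++i)
        assert(veccond_scalar(cases[i]) == veccond_ptest(cases[i]));
}
```

Calling veccond_selfcheck() from a main() exercises both paths. On x86 the second function should reduce to a load plus PTEST, though the exact output depends on the compiler and flags.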