[LLVMbugs] [Bug 12312] New: inefficient code for 128bit int comparison with sse41

bugzilla-daemon at llvm.org
Mon Mar 19 13:48:32 PDT 2012


http://llvm.org/bugs/show_bug.cgi?id=12312

             Bug #: 12312
           Summary: inefficient code for 128bit int comparison with sse41
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: sroland at vmware.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified


LLVM always seems to extract the largest possible elements (i.e. quadwords on
x86-64, dwords otherwise) and do the comparison in general-purpose integer
registers, even when that is completely unnecessary.

This snippet

define i32 @veccond(<4 x i32> %input) {
entry:
  %0 = bitcast <4 x i32> %input to i128
  %1 = icmp ne i128 %0, 0
  br i1 %1, label %if-true-block, label %endif-block

if-true-block:                                    ; preds = %entry
  ret i32 0
endif-block:                                      ; preds = %entry, %if-true-block
  ret i32 1
}

compiles to

    pextrq    $1, %xmm0, %rax
    movd    %xmm0, %rcx
    orq    %rax, %rcx
    je    .LBB0_2
# BB#1:                                 # %if-true-block
    xorl    %eax, %eax
    ret
.LBB0_2:                                # %endif-block
    movl    $1, %eax
    ret

(This looks much worse on a 32-bit arch and/or with a 256-bit int, for obvious
reasons, though I haven't actually tested the latter yet.)

This looks like a near ideal case for PTEST to me,
i.e. something like 

    ptest    %xmm0, %xmm0
    je    .LBB0_2
etc.

should be much better.
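
For illustration only, here is a rough C sketch of the two strategies, written
with intrinsics rather than taken from LLVM's lowering. The function names
veccond_scalar and veccond_ptest are made up for this example; both mirror the
return values of @veccond above (0 when any bit is set, 1 when the vector is
all zero). It assumes GCC or Clang (for the target attribute); on non-x86
hosts the PTEST variant just falls back to the scalar one.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>   /* SSE4.1 intrinsics, including _mm_testz_si128 */
#define VECCOND_HAVE_X86 1
#endif

/* Scalar model of the current x86-64 output: split the 128-bit value
 * into two quadwords, OR them together, and test the result for zero. */
static int veccond_scalar(const uint32_t v[4]) {
    uint64_t lo, hi;
    memcpy(&lo, &v[0], sizeof lo);
    memcpy(&hi, &v[2], sizeof hi);
    return (lo | hi) != 0 ? 0 : 1;
}

#ifdef VECCOND_HAVE_X86
/* PTEST-based variant. _mm_testz_si128(a, b) returns 1 when (a & b) is
 * all zeros, i.e. it exposes the ZF result of "ptest b, a". */
__attribute__((target("sse4.1")))
static int veccond_ptest(const uint32_t v[4]) {
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    return _mm_testz_si128(x, x) ? 1 : 0;
}
#else
/* Assumption: no SSE4.1 available here, so reuse the scalar path. */
static int veccond_ptest(const uint32_t v[4]) {
    return veccond_scalar(v);
}
#endif
```

Both functions should agree on every input; the difference is only in how the
zero test is carried out. Whether a given compiler turns the intrinsic into a
single ptest followed by a branch depends on the compiler and its flags.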
