[LLVMbugs] [Bug 11289] New: Inefficient x86 vector code generation for compare less than 0

Tue Nov 1 13:09:55 PDT 2011

http://llvm.org/bugs/show_bug.cgi?id=11289

             Bug #: 11289
           Summary: Inefficient x86 vector code generation for compare
                    less than 0
           Product: libraries
           Version: trunk
          Platform: Macintosh
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: fbossen at gmail.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Code generated for a vector comparison to zero is inefficient.

For example, the C code below creates a mask in which each result vector element
is equal to -1 if the corresponding input vector element is negative, and 0
otherwise.

#include <emmintrin.h>

void test(__m128i* p)
{
  *p = _mm_cmplt_epi8(*p, _mm_setzero_si128());
}

Using http://llvm.org/demo/index.cgi, the generated LLVM assembly is:

; ModuleID = '/tmp/webcompile/_11830_0.bc'
target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

define void @test(<2 x i64>* nocapture %p) nounwind {
  %1 = load <2 x i64>* %p, align 16, !tbaa !0
  %2 = bitcast <2 x i64> %1 to <16 x i8>
  %.lobit.i.i = ashr <16 x i8> %2, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
  %3 = bitcast <16 x i8> %.lobit.i.i to <2 x i64>
  store <2 x i64> %3, <2 x i64>* %p, align 16, !tbaa !0
  ret void
}

!0 = metadata !{metadata !"omnipotent char", metadata !1}
!1 = metadata !{metadata !"Simple C/C++ TBAA", null}

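where the comparison < 0 has been replaced by an arithmetic shift right by 7
(ashr), which copies the sign bit of each byte into all eight bit positions.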
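
To illustrate why the two forms are interchangeable, here is a minimal scalar
sketch in C that checks them against each other for every 8-bit value. The
function names are hypothetical and not part of the report. It also assumes
that >> on a negative signed value is an arithmetic shift, which C leaves
implementation-defined but which the usual x86 compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Comparison form: -1 (all bits set) when x is negative, 0 otherwise. */
int8_t mask_by_compare(int8_t x)
{
    return (int8_t)(x < 0 ? -1 : 0);
}

/* Shift form: assumes >> on a negative value is an arithmetic shift
 * (implementation-defined in C), so the sign bit fills the whole byte. */
int8_t mask_by_shift(int8_t x)
{
    return (int8_t)(x >> 7);
}

/* Hypothetical helper: asserts that both forms agree on all 256 inputs. */
void check_all_bytes(void)
{
    for (int v = -128; v <= 127; ++v)
        assert(mask_by_compare((int8_t)v) == mask_by_shift((int8_t)v));
}
```

Calling check_all_bytes() from a test program should complete without an
assertion failure on a compiler that implements >> as an arithmetic shift.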

Alternatively, the vector comparison could be written directly in LLVM IR as:

define <16 x i8> @test(<16 x i8> %a) {
       %b = icmp sgt <16 x i8> zeroinitializer, %a
       %c = sext <16 x i1> %b to <16 x i8>
       ret <16 x i8> %c
}

In both cases the generated x86 code is convoluted: individual vector elements
are extracted using pextrb instructions, shifted using sarb instructions, and
reinserted using pinsrb instructions.
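
For comparison, the whole operation can be expressed as one SSE2 signed byte
compare against zero. The sketch below is an illustration only and not a
statement of what the backend should emit. The function names are hypothetical,
and the code assumes an x86 target with SSE2. It computes the mask with
_mm_cmpgt_epi8(0, v), which is the same comparison as "v < 0", and checks it
against a scalar loop.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86 targets only */
#include <stdint.h>
#include <string.h>

/* Sign mask of 16 signed bytes using a single SSE2 compare (0 > v). */
void sign_mask_sse2(const int8_t in[16], int8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i m = _mm_cmpgt_epi8(_mm_setzero_si128(), v);
    _mm_storeu_si128((__m128i *)out, m);
}

/* Scalar reference: -1 for negative elements, 0 otherwise. */
void sign_mask_scalar(const int8_t in[16], int8_t out[16])
{
    for (int i = 0; i < 16; ++i)
        out[i] = (int8_t)(in[i] < 0 ? -1 : 0);
}

/* Hypothetical helper: asserts both versions agree on a sample vector. */
void check_sign_masks(void)
{
    const int8_t in[16] = { -128, -1, 0, 1, 127, -64, 64, -2,
                            2, -127, 126, 0, -1, 1, -100, 100 };
    int8_t simd[16], scalar[16];

    sign_mask_sse2(in, simd);
    sign_mask_scalar(in, scalar);
    assert(memcmp(simd, scalar, sizeof simd) == 0);
}
```

The unaligned load and store are used here only so that the sketch works on
plain arrays; the function in the report operates on an aligned __m128i.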

This was tested using r143475 from trunk.
