[LLVMbugs] [Bug 2314] New: Poor codegen of select + vector ops

Sun May 11 09:59:21 PDT 2008

http://llvm.org/bugs/show_bug.cgi?id=2314

           Summary: Poor codegen of select + vector ops
           Product: new-bugs
           Version: unspecified
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: new bugs
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: resistor at mac.com
                CC: llvmbugs at cs.uiuc.edu

This message to the list seems to have gotten lost:

Hello LLVM-ers, 

I'm currently writing an LLVM IR backend for my SIMD language, MUDA. 

http://lucille.sourceforge.net/muda/

LLVM is very nice: I saw it do good SIMD optimization 
during my initial MUDA -> LLVM IR experiments. 

However, I found that the x86 backend lowers the "select" instruction 
to a branch rather than to the and*/andn*/or* triple (or blend* with SSE4), 
as described in lib/Target/X86/README-SSE.txt. 

Is there a (technical) reason why the current LLVM x86 backend (I'm using
LLVM trunk) doesn't do such an optimization? 

Would it be easy for an LLVM newbie to write a patch that emits the x86 
and/andn/or instructions for the LLVM "select" instruction? 
If so, where should I start looking? 

Thanks in advance, 

Syoyo 

FYI, I wrote the following portable vectorized max() function: 

;; max.ll 
define <4xfloat> @muda_maxf4(<4xfloat> %a, <4xfloat> %b) 
{ 

        ;; extract 
        %a0 = extractelement <4xfloat> %a, i32 0 
        %a1 = extractelement <4xfloat> %a, i32 1 
        %a2 = extractelement <4xfloat> %a, i32 2 
        %a3 = extractelement <4xfloat> %a, i32 3 

        %b0 = extractelement <4xfloat> %b, i32 0 
        %b1 = extractelement <4xfloat> %b, i32 1 
        %b2 = extractelement <4xfloat> %b, i32 2 
        %b3 = extractelement <4xfloat> %b, i32 3 

        ;; c[N] = a[N] > b[N] 
        %c0 = fcmp ogt float %a0, %b0 
        %c1 = fcmp ogt float %a1, %b1 
        %c2 = fcmp ogt float %a2, %b2 
        %c3 = fcmp ogt float %a3, %b3 

        ;; if %c[N] == 1 then %a[N] else %b[N] 

        %r0 = select i1 %c0, float %a0, float %b0 
        %r1 = select i1 %c1, float %a1, float %b1 
        %r2 = select i1 %c2, float %a2, float %b2 
        %r3 = select i1 %c3, float %a3, float %b3 

        ;; pack 

        %tmp0 = insertelement <4xfloat> undef, float %r0, i32 0 
        %tmp1 = insertelement <4xfloat> %tmp0, float %r1, i32 1 
        %tmp2 = insertelement <4xfloat> %tmp1, float %r2, i32 2 
        %r    = insertelement <4xfloat> %tmp2, float %r3, i32 3 

        ret <4xfloat> %r 
} 

and got the following assembly code: 

$ llvm-as max.ll; opt -std-compile-opts max.bc -f | llc -march=x86 -mcpu=penryn -f 

        .text 
        .align  16 
        .globl  muda_maxf4 
        .type   muda_maxf4,@function 
muda_maxf4: 
        extractps       $3, %xmm1, %xmm2 
        extractps       $3, %xmm0, %xmm3 
        ucomiss %xmm2, %xmm3 
        ja      .LBB1_2 # 
.LBB1_1:        # 
        movaps  %xmm2, %xmm3 
.LBB1_2:        # 
        extractps       $1, %xmm1, %xmm2 
        extractps       $1, %xmm0, %xmm4 
        ucomiss %xmm2, %xmm4 
        ja      .LBB1_4 # 
.LBB1_3:        # 
        movaps  %xmm2, %xmm4 
.LBB1_4:        # 
        movss   %xmm4, %xmm2 
        unpcklps        %xmm3, %xmm2 
        extractps       $2, %xmm1, %xmm3 
        extractps       $2, %xmm0, %xmm4 
        ucomiss %xmm3, %xmm4 
        ja      .LBB1_6 # 
.LBB1_5:        # 
        movaps  %xmm3, %xmm4 
.LBB1_6:        # 
        ucomiss %xmm1, %xmm0 
        ja      .LBB1_8 # 
.LBB1_7:        # 
        movaps  %xmm1, %xmm0 
.LBB1_8:        # 
        unpcklps        %xmm4, %xmm0 
        unpcklps        %xmm2, %xmm0 
        ret 
        .size   muda_maxf4, .-muda_maxf4 

whereas I expected to get one of the following: 

- cmpps + andps + andnps + orps 
- cmpps + blendps (in the SSE4 case) 
- maxps
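
For reference, the first expected sequence (cmpps + andps + andnps + orps)
can be sketched per lane in portable scalar C. This is only an illustration
of the mask-and-merge idea, not what the backend would actually emit: the
names f2u, u2f and maxf4_branchless are made up for this example, and it
assumes float is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer, and back. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical helper: r[i] = a[i] > b[i] ? a[i] : b[i] for four lanes,
 * selecting with bitwise masks instead of an if/else per lane. */
void maxf4_branchless(const float a[4], const float b[4], float r[4])
{
    assert(sizeof(float) == sizeof(uint32_t)); /* assumption of this sketch */
    for (int i = 0; i < 4; i++) {
        /* like cmpps: all ones where a > b, all zeros otherwise */
        uint32_t mask = (uint32_t)0 - (uint32_t)(a[i] > b[i]);
        uint32_t ua = f2u(a[i]);
        uint32_t ub = f2u(b[i]);
        /* like andps / andnps / orps: keep a under the mask, b elsewhere */
        r[i] = u2f((ua & mask) | (ub & ~mask));
    }
}
```

Calling maxf4_branchless on two 4-element arrays writes the lane-wise result
to the third. When a lane compares equal or unordered, the comparison is
false and the value from b is kept, which matches the fcmp ogt + select
pattern in max.ll.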
