[LLVMbugs] [Bug 2314] New: Poor codegen of select + vector ops
bugzilla-daemon at cs.uiuc.edu
Sun May 11 09:59:21 PDT 2008
http://llvm.org/bugs/show_bug.cgi?id=2314
Summary: Poor codegen of select + vector ops
Product: new-bugs
Version: unspecified
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: new bugs
AssignedTo: unassignedbugs at nondot.org
ReportedBy: resistor at mac.com
CC: llvmbugs at cs.uiuc.edu
This message to the list seems to have gotten lost:
Hello LLVM-ers,
I'm currently writing an LLVM IR backend for my SIMD language, MUDA.
http://lucille.sourceforge.net/muda/
LLVM is very nice: I saw it do good SIMD optimization
in my initial MUDA -> LLVM IR experiments.
However, I found that the x86 backend maps the "select" instruction to a
branch, not to the triple of and*/andn*/or* (or blend* in the SSE4 case),
as described in lib/Target/X86/README-SSE.txt.
Is there any (technical) reason that the current LLVM x86 backend (I'm using
LLVM trunk) doesn't do such an optimization?
Would it be easy for an LLVM newbie to write a patch that emits
x86 and/andn/or instructions for the LLVM "select" instruction?
If so, where should I start looking?
Thanks in advance,
Syoyo
FYI, I wrote the following portable vectorized max() function:
;; max.ll
define <4xfloat> @muda_maxf4(<4xfloat> %a, <4xfloat> %b)
{
;; extract
%a0 = extractelement <4xfloat> %a, i32 0
%a1 = extractelement <4xfloat> %a, i32 1
%a2 = extractelement <4xfloat> %a, i32 2
%a3 = extractelement <4xfloat> %a, i32 3
%b0 = extractelement <4xfloat> %b, i32 0
%b1 = extractelement <4xfloat> %b, i32 1
%b2 = extractelement <4xfloat> %b, i32 2
%b3 = extractelement <4xfloat> %b, i32 3
;; c[N] = a[N] > b[N]
%c0 = fcmp ogt float %a0, %b0
%c1 = fcmp ogt float %a1, %b1
%c2 = fcmp ogt float %a2, %b2
%c3 = fcmp ogt float %a3, %b3
;; if %c[N] == 1 then %a[N] else %b[N]
%r0 = select i1 %c0, float %a0, float %b0
%r1 = select i1 %c1, float %a1, float %b1
%r2 = select i1 %c2, float %a2, float %b2
%r3 = select i1 %c3, float %a3, float %b3
;; pack
%tmp0 = insertelement <4xfloat> undef, float %r0, i32 0
%tmp1 = insertelement <4xfloat> %tmp0, float %r1, i32 1
%tmp2 = insertelement <4xfloat> %tmp1, float %r2, i32 2
%r = insertelement <4xfloat> %tmp2, float %r3, i32 3
ret <4xfloat> %r
}
and got the following assembly code:
$ llvm-as max.ll; opt -std-compile-opts max.bc -f | llc -march=x86 -mcpu=penryn -f
.text
.align 16
.globl muda_maxf4
.type muda_maxf4,@function
muda_maxf4:
extractps $3, %xmm1, %xmm2
extractps $3, %xmm0, %xmm3
ucomiss %xmm2, %xmm3
ja .LBB1_2 #
.LBB1_1: #
movaps %xmm2, %xmm3
.LBB1_2: #
extractps $1, %xmm1, %xmm2
extractps $1, %xmm0, %xmm4
ucomiss %xmm2, %xmm4
ja .LBB1_4 #
.LBB1_3: #
movaps %xmm2, %xmm4
.LBB1_4: #
movss %xmm4, %xmm2
unpcklps %xmm3, %xmm2
extractps $2, %xmm1, %xmm3
extractps $2, %xmm0, %xmm4
ucomiss %xmm3, %xmm4
ja .LBB1_6 #
.LBB1_5: #
movaps %xmm3, %xmm4
.LBB1_6: #
ucomiss %xmm1, %xmm0
ja .LBB1_8 #
.LBB1_7: #
movaps %xmm1, %xmm0
.LBB1_8: #
unpcklps %xmm4, %xmm0
unpcklps %xmm2, %xmm0
ret
.size muda_maxf4, .-muda_maxf4
whereas I expected to get one of the following:
- cmpps + andps + andnps + orps
- cmpps + blendps (in the SSE4 case)
- maxps
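
For readers unfamiliar with the first option, here is a minimal scalar sketch in C of what the cmpps + andps + andnps + orps sequence computes in each lane. It is a portable model only, not SSE code and not anything from LLVM or MUDA; the helper names (f2u, u2f, cmpgt_mask, select_lane, maxf4_model) are hypothetical and chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit integer and back (hypothetical helpers). */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Models one lane of a greater-than compare: all ones if a > b, else all zeros.
   Like "fcmp ogt", the result is false (zero mask) when either input is NaN. */
static uint32_t cmpgt_mask(float a, float b) {
    return 0u - (uint32_t)(a > b);
}

/* Models one lane of and/andn/or: (mask & a) | (~mask & b). */
static float select_lane(uint32_t mask, float a, float b) {
    return u2f((mask & f2u(a)) | (~mask & f2u(b)));
}

/* Four-lane max built from the mask-based select above. */
static void maxf4_model(const float a[4], const float b[4], float r[4]) {
    for (int i = 0; i < 4; ++i)
        r[i] = select_lane(cmpgt_mask(a[i], b[i]), a[i], b[i]);
}
```

Because the mask is either all ones or all zeros, each lane picks exactly one input bit-for-bit, which is the same result the per-element "select" in max.ll produces, without a branch per lane.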