[LLVMbugs] [Bug 2647] New: extractps selected too eagerly
bugzilla-daemon at cs.uiuc.edu
Thu Aug 7 04:36:06 PDT 2008
http://llvm.org/bugs/show_bug.cgi?id=2647
Summary: extractps selected too eagerly
Product: new-bugs
Version: unspecified
Platform: PC
OS/Version: Windows NT
Status: NEW
Severity: enhancement
Priority: P2
Component: new bugs
AssignedTo: unassignedbugs at nondot.org
ReportedBy: nicolas at capens.net
CC: llvmbugs at cs.uiuc.edu
The following LLVM IR compiles to suboptimal code on x86 CPUs with SSE4.1
support, but compiles to good code on older CPUs:
external global float, align 16 ; <float*>:0 [#uses=2]
define internal void @""() {
load float* @0, align 16 ; <float>:1 [#uses=1]
insertelement <4 x float> undef, float %1, i32 0 ; <<4 x float>>:2 [#uses=1]
call <4 x float> @llvm.x86.sse.rsqrt.ss( <4 x float> %2 ) ; <<4 x float>>:3 [#uses=1]
extractelement <4 x float> %3, i32 0 ; <float>:4 [#uses=1]
store float %4, float* @0, align 16
ret void
}
declare <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float>) nounwind readnone
Here's the result on a Penryn CPU:
push ebp
mov ebp,esp
and esp,0FFFFFFF0h
rsqrtss xmm0,dword ptr ds:[1762ED0h]
extractps eax, xmm0
movd xmm0,eax
movss dword ptr ds:[1762ED0h],xmm0
mov esp,ebp
pop ebp
ret
And this is the lovable code I get on Conroe:
rsqrtss xmm0,dword ptr ds:[1762ED0h]
movss dword ptr ds:[1762ED0h],xmm0
ret
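Ignoring the stack setup for now, it looks like extractps is selected too
eagerly for an extractelement v4f32, 0. Element 0 already sits in the low
lane of xmm0, so the extractps/movd round trip through eax is unnecessary and
a single movss store suffices, as in the Conroe output.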
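To make the expected behaviour concrete, here is a minimal C++ sketch of the selection rule this report is asking for. It is a toy model, not LLVM code: `SSELevel`, `shouldUseExtractPS`, and the lane-index parameter are hypothetical names invented for illustration, and the sketch assumes the extracted value is consumed as a float (as in the store above).

```cpp
#include <cassert>

// Hypothetical feature levels, ordered from oldest to newest.
// (Illustrative only; not the actual LLVM enum.)
enum class SSELevel { SSE1, SSE2, SSE3, SSSE3, SSE41 };

// Toy predicate: should `extractelement <4 x float> %v, i32 index`
// be lowered to extractps when the result is used as a float?
//
// Assumption: lane 0 is already the scalar in the low 32 bits of the
// XMM register, so no extraction instruction is needed for index 0.
bool shouldUseExtractPS(SSELevel level, unsigned index) {
    assert(index < 4 && "v4f32 has exactly four lanes");
    if (index == 0)
        return false;                  // reuse the low lane directly (movss)
    return level >= SSELevel::SSE41;   // extractps exists only from SSE4.1 on
}
```

Under this rule, `shouldUseExtractPS(SSELevel::SSE41, 0)` is false, which corresponds to the three-instruction Conroe sequence being emitted on Penryn as well.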
P.S.: To quickly test with and without SSE4 support, just force X86SSELevel to
the desired value in X86Subtarget::AutoDetectSubtargetFeatures().