[LLVMbugs] [Bug 6393] New: LLVM re-orders MXCSR access in an invalid manner
bugzilla-daemon at llvm.org
Mon Feb 22 13:26:09 PST 2010
http://llvm.org/bugs/show_bug.cgi?id=6393
Summary: LLVM re-orders MXCSR access in an invalid manner
Product: libraries
Version: 2.6
Platform: PC
OS/Version: Linux
Status: NEW
Severity: minor
Priority: P5
Component: Backend: X86
AssignedTo: unassignedbugs at nondot.org
ReportedBy: llvm at henning-thielemann.de
CC: llvmbugs at cs.uiuc.edu
Estimated Hours: 0.0
I want to write a vectorised 'floor' by setting the rounding mode in the MXCSR
register before converting a float vector to an integer vector. However, it
seems that LLVM is not aware that the rounding operation depends on that global
flag, and it reorders the instructions in an invalid way.
My LLVM code is
define <4 x float> @_floor(<4 x float>) {
  %oldmxcsr = alloca i32
  call void @llvm.x86.sse.stmxcsr(i32* %oldmxcsr)
  %newmxcsr = alloca i32
  store i32 8192, i32* %newmxcsr
  call void @llvm.x86.sse.ldmxcsr(i32* %newmxcsr)
  %floorInt = call <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float> %0) readnone
  %result = sitofp <4 x i32> %floorInt to <4 x float>
  call void @llvm.x86.sse.ldmxcsr(i32* %oldmxcsr)
  ret <4 x float> %result
}
declare void @llvm.x86.sse.stmxcsr(i32*) nounwind
declare void @llvm.x86.sse.ldmxcsr(i32*) nounwind
declare <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>) nounwind readnone
which is compiled to
cvtps2dq %xmm0, %xmm0
cvtdq2ps %xmm0, %xmm0
stmxcsr 4(%esp)
movl $8192, (%esp)
ldmxcsr (%esp)
ldmxcsr 4(%esp)
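To illustrate why this ordering is wrong, here is a rough Python model (not LLVM or real hardware). The constant 8192 (0x2000) sets bit 13 of MXCSR, which selects "round down" in the rounding-control field (bits 13-14). The helper names `rounding_mode`, `convert` and `run` are made up for this sketch, and it assumes the saved MXCSR value is the usual default 0x1F80 (round to nearest).

```python
import math

# MXCSR rounding-control (RC) field occupies bits 13-14.
RC_SHIFT = 13
RC_MASK = 0b11 << RC_SHIFT
RC_NAMES = {0: "nearest", 1: "down", 2: "up", 3: "toward-zero"}

# Assumption: the value saved by stmxcsr is the default, RC = nearest.
MXCSR_DEFAULT = 0x1F80


def rounding_mode(mxcsr):
    """Return the rounding mode encoded in an MXCSR value."""
    return RC_NAMES[(mxcsr & RC_MASK) >> RC_SHIFT]


def convert(x, mxcsr):
    """Toy stand-in for one lane of cvtps2dq under the given MXCSR value."""
    mode = rounding_mode(mxcsr)
    if mode == "nearest":
        return round(x)  # Python's round() rounds half to even
    if mode == "down":
        return math.floor(x)
    if mode == "up":
        return math.ceil(x)
    return math.trunc(x)


def run(program, x, mxcsr=MXCSR_DEFAULT):
    """Execute ("ldmxcsr", value) and ("cvtps2dq",) steps in order."""
    result = None
    for step in program:
        if step[0] == "ldmxcsr":
            mxcsr = step[1]
        elif step[0] == "cvtps2dq":
            result = convert(x, mxcsr)
    return result


# Order written in the IR versus order seen in the generated assembly.
intended = [("ldmxcsr", 8192), ("cvtps2dq",), ("ldmxcsr", MXCSR_DEFAULT)]
emitted = [("cvtps2dq",), ("ldmxcsr", 8192), ("ldmxcsr", MXCSR_DEFAULT)]

print(rounding_mode(8192))   # down
print(run(intended, 1.75))   # 1
print(run(emitted, 1.75))    # 2
```

In the emitted order the conversion runs before the first ldmxcsr, so it still rounds to nearest and the two ldmxcsr instructions have no effect on the result.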
One problem may be that in IntrinsicsX86.td the ldmxcsr intrinsic has the
attribute IntrWriteMem, whereas I would intuitively expect IntrReadMem, since
ldmxcsr reads its operand from memory.
As a work-around I reinterpret the vector result of cmpps as integers (0 or
-1), convert them to float, and add them to the truncated value to correct it.
define <4 x float> @_floor(<4 x float>) {
  %truncInt = fptosi <4 x float> %0 to <4 x i32>
  %truncFloat = sitofp <4 x i32> %truncInt to <4 x float>
  %gts = call <4 x float> @llvm.x86.sse.cmp.ps(<4 x float> %0, <4 x float> %truncFloat, i8 1)
  %gtsInt = bitcast <4 x float> %gts to <4 x i32>
  %gtsFloat = sitofp <4 x i32> %gtsInt to <4 x float>
  %result = add <4 x float> %truncFloat, %gtsFloat
  ret <4 x float> %result
}
declare <4 x float> @llvm.x86.sse.cmp.ps(<4 x float>, <4 x float>, i8) nounwind readnone
This is compiled to
cvttps2dq %xmm0, %xmm1
cvtdq2ps %xmm1, %xmm1
cmpltps %xmm1, %xmm0
cvtdq2ps %xmm0, %xmm0
addps %xmm1, %xmm0
This does not need to modify the MXCSR register at all, and it may even be
better than switching the rounding mode to floor via MXCSR.
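The truncate-and-adjust work-around can be sketched per lane in Python. This is only a scalar model of the IR above, and the function name `floor_via_trunc` is made up for illustration. Like the original, it assumes finite inputs that fit in a 32-bit integer.

```python
import math


def floor_via_trunc(xs):
    """Floor each element using truncation plus a comparison mask."""
    out = []
    for x in xs:
        trunc_int = math.trunc(x)             # fptosi: round toward zero
        trunc_float = float(trunc_int)        # sitofp
        # cmpltps lane reinterpreted as an integer: -1 if x < trunc, else 0
        mask = -1 if x < trunc_float else 0
        out.append(trunc_float + float(mask)) # add the 0.0 or -1.0 correction
    return out


print(floor_via_trunc([1.5, -1.5, 2.0, -2.0, -0.25]))
# [1.0, -2.0, 2.0, -2.0, -1.0]
```

Truncation already equals floor for non-negative inputs and for exact integers, so the mask only subtracts 1 for negative inputs with a fractional part.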
By the way, the same trick also works for implementing a fraction function
that always returns a number in the interval [0,1).
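One possible reading of that fraction function, again as a scalar Python model: subtract the truncated value, then use a comparison mask to add 1 to negative lanes. The report does not spell out the exact construction, so the name `fraction` and this formulation are assumptions.

```python
import math


def fraction(xs):
    """Fractional part in [0, 1) via truncation plus a comparison mask."""
    out = []
    for x in xs:
        trunc_float = float(math.trunc(x))  # round toward zero
        frac = x - trunc_float              # lies in (-1, 1), sign follows x
        mask = -1 if frac < 0.0 else 0      # compare result viewed as integer
        out.append(frac - float(mask))      # adds 1.0 to negative lanes only
    return out


print(fraction([1.75, -1.75, 2.0, -0.25]))
# [0.75, 0.25, 0.0, 0.75]
```

For negative inputs of very small magnitude, floating-point rounding can make the sum come out as exactly 1.0, so the half-open interval holds only up to rounding.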