[LLVMbugs] [Bug 6393] New: LLVM re-orders MXCSR access in an invalid manner

Mon Feb 22 13:26:09 PST 2010

http://llvm.org/bugs/show_bug.cgi?id=6393

           Summary: LLVM re-orders MXCSR access in an invalid manner
           Product: libraries
           Version: 2.6
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P5
         Component: Backend: X86
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: llvm at henning-thielemann.de
                CC: llvmbugs at cs.uiuc.edu
   Estimated Hours: 0.0

I want to write a vectorised 'floor' by setting the MXCSR register to
round-down mode before converting a float vector to an integer vector.
However, it seems that LLVM is not aware that the conversion depends on that
global register, and it reorders the instructions in an invalid way.

My LLVM code is

define <4 x float> @_floor(<4 x float>) {
  %oldmxcsr = alloca i32
  call void @llvm.x86.sse.stmxcsr(i32* %oldmxcsr)
  %newmxcsr = alloca i32
  store i32 8192, i32* %newmxcsr
  call void @llvm.x86.sse.ldmxcsr(i32* %newmxcsr)
  %floorInt = call <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float> %0) readnone
  %result = sitofp <4 x i32> %floorInt to <4 x float>
  call void @llvm.x86.sse.ldmxcsr(i32* %oldmxcsr)
  ret <4 x float> %result
}

declare void @llvm.x86.sse.stmxcsr(i32*) nounwind
declare void @llvm.x86.sse.ldmxcsr(i32*) nounwind
declare <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>) nounwind readnone

which is compiled to the following, where the conversion is executed before
MXCSR is changed:

    cvtps2dq    %xmm0, %xmm0
    cvtdq2ps    %xmm0, %xmm0
    stmxcsr    4(%esp)
    movl    $8192, (%esp)
    ldmxcsr    (%esp)
    ldmxcsr    4(%esp)

One problem may be that in IntrinsicsX86.td the ldmxcsr intrinsic has the
attribute IntrWriteMem, whereas I would intuitively say that it must be
IntrReadMem, since ldmxcsr reads its memory operand.

As a work-around I just interpret the vector result of cmpps as integers (0 or
-1) and add them to adjust the result of a truncation operation.

define <4 x float> @_floor(<4 x float>) {
  %truncInt   = fptosi <4 x float> %0 to <4 x i32>
  %truncFloat = sitofp <4 x i32> %truncInt to <4 x float>
  %gts = call <4 x float> @llvm.x86.sse.cmp.ps(<4 x float> %0, <4 x float> %truncFloat, i8 1)
  %gtsInt = bitcast <4 x float> %gts to <4 x i32>
  %gtsFloat = sitofp <4 x i32> %gtsInt to <4 x float>
  %result = add <4 x float> %truncFloat, %gtsFloat
  ret <4 x float> %result
}

declare <4 x float> @llvm.x86.sse.cmp.ps(<4 x float>, <4 x float>, i8) nounwind readnone

This is compiled to

    cvttps2dq    %xmm0, %xmm1
    cvtdq2ps    %xmm1, %xmm1
    cmpltps    %xmm1, %xmm0
    cvtdq2ps    %xmm0, %xmm0
    addps    %xmm1, %xmm0

This does not need to modify the MXCSR register at all, and maybe it is even
better than the floor rounding achieved through MXCSR configuration.
By the way, the same trick also works for implementing a fraction function
that always returns a number in the interval [0,1).
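
For readers who want to check the arithmetic of the work-around, here is a
minimal scalar sketch in C that models a single vector lane. The function
names (lt_mask, floor_via_trunc, frac_via_trunc) are hypothetical and do not
appear in the report. The fraction function is only one guess at how "the same
trick" could be applied. The sketch assumes the input fits in a signed 32-bit
integer, as the cvttps2dq conversion does.

```c
#include <stdint.h>

/* Scalar model of one cmpltps lane: all bits set (-1 as int32) if a < b,
 * otherwise 0. */
static int32_t lt_mask(float a, float b)
{
    return (a < b) ? -1 : 0;
}

/* floor(x) = trunc(x) + (x < trunc(x) ? -1 : 0)
 * Assumes x is representable as int32_t after truncation. */
float floor_via_trunc(float x)
{
    float t = (float)(int32_t)x;      /* models cvttps2dq + cvtdq2ps */
    return t + (float)lt_mask(x, t);  /* models cmpltps, cvtdq2ps, addps */
}

/* Hypothetical fraction function built on the same adjustment:
 * frac(x) = x - floor(x). */
float frac_via_trunc(float x)
{
    return x - floor_via_trunc(x);
}
```

Truncation rounds toward zero, so it only differs from floor for negative
non-integers, which is exactly where the comparison yields -1. Note that for
negative inputs very close to zero the subtraction in frac_via_trunc can round
up to 1.0, so the half-open interval holds only up to floating-point rounding
in this sketch.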
