[llvm-dev] The semantics of the fptrunc instruction with an example of incorrect optimisation

Dan Liew via llvm-dev llvm-dev at lists.llvm.org
Fri Aug 21 12:36:37 PDT 2015

I've recently been looking at how to implement in LLVM IR the rounding
of floating point values when casting using different rounding modes
and I've hit some problems.

It seems that when narrowing floating point values to a less precise
type the ``fptrunc`` LLVM IR instruction is used. The LLVM language
reference suggests that it just truncates the value (which would be
equivalent to rounding towards zero) but this is very misleading,
because on the target I'm using (x86_64) that **is not** what happens.

Consider the following example in C

#include <stdio.h>
#include <fenv.h>
int main() {
    double x = 0.3;
    fesetround(FE_TONEAREST);
    float y = (float) x;
    printf("y (nearest):%a\n", y);
    fesetround(FE_UPWARD);
    y = (float) x;
    printf("y (upward):%a\n", y);
    fesetround(FE_DOWNWARD);
    y = (float) x;
    printf("y (downward):%a\n", y);
    return (int) y;
}

If I get the unoptimised LLVM IR for this by running ``clang -O0
float.c -emit-llvm -c -o float.clang.o0.bc`` I can see that the cast
of variable x is being handled using LLVM IR's ``fptrunc``

  store double 3.000000e-01, double* %x, align 8
  %call = call i32 @fesetround(i32 0) #3
  %0 = load double, double* %x, align 8
  %conv = fptrunc double %0 to float

If I look at the codegened assembly I see that the ``cvtsd2ss`` x86
instruction is used (how rounding is done is controlled by the MXCSR
register apparently).  So this instruction might not "truncate"
depending on how MXCSR is set.
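
To make the MXCSR dependence concrete, here is a minimal sketch that
sets the rounding control field directly and then narrows a double. It
assumes an x86_64 target where the conversion is done with SSE, and it
uses the ``_MM_GET_ROUNDING_MODE``/``_MM_SET_ROUNDING_MODE`` macros from
``<xmmintrin.h>``. The function name ``narrow_under_mxcsr`` is made up
for this example, and the ``volatile`` qualifiers are only there to keep
the compiler from folding or moving the conversion.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: narrow `value` to float while the MXCSR rounding
 * control field is set to `rounding`, which must be one of
 * _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP or
 * _MM_ROUND_TOWARD_ZERO. Assumes x86_64 with SSE conversions. */
float narrow_under_mxcsr(double value, unsigned int rounding) {
    /* Reject bits outside the rounding control field. */
    assert((rounding & ~(unsigned int)_MM_ROUND_MASK) == 0);

    const unsigned int saved = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(rounding);

    /* The volatile load and store pin the conversion between the two
     * MXCSR writes, so it happens at run time under `rounding`. */
    volatile double in = value;
    volatile float out = (float)in;

    _MM_SET_ROUNDING_MODE(saved);
    return out;
}
```

Calling ``narrow_under_mxcsr(0.3, _MM_ROUND_DOWN)`` should then give the
same value as the "downward" line in the unoptimised run, without going
through ``fesetround`` at all.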

If I run the program
$ clang -O0 float.c -lm -o float.clang.o0
$ ./float.clang.o0
y (nearest):0x1.333334p-2
y (upward):0x1.333334p-2
y (downward):0x1.333332p-2

I can see that the last cast gives a different result because the
rounding mode has been changed as expected.

Now let's see what clang does when we ask it to optimize.

y (nearest):0x1.333334p-2
y (upward):0x1.333334p-2
y (downward):0x1.333334p-2

The result of the last cast is wrong (note gcc at -O3 also seems to do
this) and looking at the optimized LLVM IR reveals why

define i32 @main() #0 {
  %call = tail call i32 @fesetround(i32 0) #2
  %call2 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds
([16 x i8], [16 x i8]* @.str, i64 0, i64 0), double
0x3FD3333340000000) #2
  %call3 = tail call i32 @fesetround(i32 2048) #2
  %call6 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds
([15 x i8], [15 x i8]* @.str.1, i64 0, i64 0), double
0x3FD3333340000000) #2
  %call7 = tail call i32 @fesetround(i32 1024) #2
  %call10 = tail call i32 (i8*, ...) @printf(i8* getelementptr
inbounds ([17 x i8], [17 x i8]* @.str.2, i64 0, i64 0), double
0x3FD3333340000000) #2
  ret i32 0
}

the cast of a constant has been constant folded incorrectly (I guess
that clang is assuming a particular rounding mode which in this case
is sometimes the wrong rounding mode).

I'm not sure if there's a good way to fix this. At first I thought it
would be better if the rounding mode was an operand to ``fptrunc``
(which would make constant folding correct), but then I realized that
for codegen to always be correct, the rounding mode might need to be
reset every time a ``fptrunc`` is about to be executed, which most of
the time would be a very wasteful thing to do.
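
As a rough illustration of the "rounding mode as an operand" idea, here
is a sketch in C of how a double to float conversion could be folded
when the mode is passed in explicitly. This is not LLVM code: the names
``round_mode``, ``step_float`` and ``fptrunc_with_mode`` are invented for
the example. It assumes IEEE-754 binary32/binary64 types and that the
plain ``(float)`` cast itself rounds to nearest (i.e. the dynamic
rounding mode is still the default when this runs).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical explicit rounding mode operand. */
enum round_mode { RM_NEAREST, RM_UPWARD, RM_DOWNWARD, RM_TOWARD_ZERO };

/* Move a float to the adjacent representable value, towards +inf if
 * `up` is non-zero and towards -inf otherwise. Assumes binary32. */
static float step_float(float f, int up) {
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &f, sizeof bits);
    if (f == 0.0f) {
        /* Step off zero onto the smallest subnormal of the right sign. */
        bits = up ? 0x00000001u : 0x80000001u;
    } else if ((f > 0.0f) == (up != 0)) {
        bits += 1; /* magnitude grows: away from zero */
    } else {
        bits -= 1; /* magnitude shrinks: towards zero */
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow `x` to float as if the rounding mode were `mode`. */
float fptrunc_with_mode(double x, enum round_mode mode) {
    float nearest = (float)x;        /* assumed to round to nearest */
    double back = (double)nearest;   /* widening is always exact */

    /* NaN, exact results and round-to-nearest need no adjustment. */
    if (x != x || back == x || mode == RM_NEAREST)
        return nearest;

    int want_up;
    switch (mode) {
    case RM_UPWARD:   want_up = 1; break;
    case RM_DOWNWARD: want_up = 0; break;
    default:          want_up = (x < 0.0); break; /* towards zero */
    }

    int is_above = (back > x);
    if (want_up && !is_above)
        return step_float(nearest, 1);
    if (!want_up && is_above)
        return step_float(nearest, 0);
    return nearest;
}
```

With this, ``fptrunc_with_mode(0.3, RM_DOWNWARD)`` gives 0x1.333332p-2
while the nearest and upward modes give 0x1.333334p-2, which matches the
unoptimised run above. It says nothing about the codegen cost, which is
the real problem with this approach.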

In general it's not always possible (at least in C) to know statically
what the rounding mode is going to be at any point in the program,
because it's part of the currently executing thread's state.

On the other hand LLVM IR isn't supposed to be tied to C, so I feel
like there ought to be a way to specify how certain floating point
operations do rounding. (I think these rounding issues apply to more
than just ``fptrunc``.)

Any thoughts on this? At the very least the LLVM IR documentation
needs to be more specific about how rounding is done.

