The -fp-rsqrt flag, together with the -enable-unsafe-fp-math flag, controls the generation of the X86 rsqrt instruction and of some optimizations that become possible once rsqrt instructions have been generated. We observed that these transformations affect precision, so we placed them under this flag. The values of -fp-rsqrt and the optimizations each value enables are:

    -fp-rsqrt   - Enable rsqrt ops
      =off      - No rsqrt
      =on       - y/sqrt(x) => y * rsqrt(x)
      =advance  - Standard, sqrt(x) => x * rsqrt(x)
      =fda      - Advance, Derive FMA

Each level includes the transformations of the previous one. =advance adds sqrt(x) => x * rsqrt(x) on top of =on. =fda additionally derives FMA instructions from the resulting multiply-add sequences.

Generate the LLVM IR with:

    gfortran -S -O1 -o - -fplugin=dragonegg.so rsqrt_on.f -fplugin-arg-dragonegg-emit-ir

Generate the .s file from the .ll file with the following command, passing one of off, on, advance or fda to -fp-rsqrt:

    llc -O1 -enable-unsafe-fp-math -fp-rsqrt=off/on/advance/fda rsqrt_on.ll -filetype=asm

Example 1.

Source code

    real*4 x
    real*4 y
    real*4 r
    r = y/sqrt(x)

LLVM IR

    %0 = load float* %y, align 4
    %1 = load float* %x, align 4
    %2 = tail call float @sqrtf(float %1) nounwind readnone
    %3 = fdiv float %0, %2
    store float %3, float* %r, align 4

-fp-rsqrt=off

    vmovss   (%rdi), %xmm0
    vsqrtss  %xmm0, %xmm0, %xmm0
    vmovss   (%rsi), %xmm1
    vdivss   %xmm0, %xmm1, %xmm0
    vmovss   %xmm0, (%rdx)

-fp-rsqrt=on/advance/fda

    vmovss   (%rdi), %xmm0
    vrsqrtss %xmm0, %xmm0, %xmm0
    vmulss   (%rsi), %xmm0, %xmm0
    vmovss   %xmm0, (%rdx)

Input
-----
x = 3.0
y = 2.0

Output
------
without rsqrt :: r = 1.15470052
with rsqrt    :: r = 1.15469360

Example 2.
Source

    real*4 x
    real*4 y
    real*4 r
    r = sqrt(x)

LLVM IR

    %0 = load float* %x, align 4
    %1 = tail call float @sqrtf(float %0) nounwind readnone
    store float %1, float* %r, align 4

-fp-rsqrt=off/on (compiling rsqrt_advance.ll with -filetype=asm)

    vmovss   (%rdi), %xmm0
    vsqrtss  %xmm0, %xmm0, %xmm0
    vmovss   %xmm0, (%rdx)

-fp-rsqrt=advance

    vmovss   (%rdi), %xmm0
    vrsqrtss %xmm0, %xmm0, %xmm1
    vmulss   %xmm1, %xmm0, %xmm0
    vmovss   %xmm0, (%rdx)

Input
-----
x = 2.0

Output
------
without rsqrt :: r = 1.41421354
with rsqrt    :: r = 1.41419983

Example 3.

Source

    real*4 x
    real*4 y
    real*4 z
    real*4 t
    real*4 r
    t = x+y/sqrt(z)
    r = x+sqrt(t)

LLVM IR

    %0 = load float* %x, align 4
    %1 = load float* %y, align 4
    %2 = load float* %z, align 4
    %3 = tail call float @sqrtf(float %2) nounwind readnone
    %4 = fdiv float %1, %3
    %5 = fadd float %0, %4
    store float %5, float* %t, align 4
    %6 = load float* %x, align 4
    %7 = tail call float @sqrtf(float %5) nounwind readnone
    %8 = fadd float %6, %7
    store float %8, float* %r, align 4

-fp-rsqrt=off/on/advance

    vmovss   (%rdx), %xmm0
    vrsqrtss %xmm0, %xmm0, %xmm0
    vmulss   (%rsi), %xmm0, %xmm0
    vaddss   (%rdi), %xmm0, %xmm0
    vmovss   %xmm0, (%rcx)
    vrsqrtss %xmm0, %xmm0, %xmm1
    vmulss   %xmm1, %xmm0, %xmm0
    vaddss   (%rdi), %xmm0, %xmm0
    vmovss   %xmm0, (%r8)

-fp-rsqrt=fda

    vmovss      (%rdx), %xmm0
    vrsqrtss    %xmm0, %xmm0, %xmm0
    vmovss      (%rsi), %xmm1
    vfmadd213ss (%rdi), %xmm1, %xmm0
    vmovss      %xmm0, (%rcx)
    vxorps      %xmm1, %xmm1, %xmm1
    vrsqrtss    %xmm0, %xmm0, %xmm1
    vfmadd213ss (%rdi), %xmm0, %xmm1
    vmovss      %xmm1, (%r8)

Input
-----
x = 1.0, y = 2.0, z = 3.0

Output
------
-fp-rsqrt=off : t = 2.15469360  r = 2.46788836
-fp-rsqrt=fda : t = 2.15469360  r = 2.46788836
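The precision differences in the examples above can be reproduced outside the compiler by evaluating each form directly with SSE intrinsics. The C sketch below assumes an x86 target with SSE and the <xmmintrin.h> header. All function names are hypothetical and exist only for illustration. It mirrors the instruction sequences: sqrtss/divss for -fp-rsqrt=off, rsqrtss followed by a multiply for =on and =advance. Intel documents the rsqrtss relative error as at most 1.5 * 2^-12. The exact low-order bits can differ between processor implementations, so results may not match the outputs listed above bit for bit.

```c
/* Sketch only: assumes an x86 target with SSE; function names are hypothetical. */
#include <assert.h>
#include <xmmintrin.h>  /* _mm_set_ss, _mm_sqrt_ss, _mm_rsqrt_ss, _mm_cvtss_f32 */

/* sqrtss: correctly rounded single-precision square root. */
static float sqrt_exact(float x) {
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* rsqrtss: hardware approximation of 1/sqrt(x). */
static float rsqrt_approx(float x) {
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* Example 1, -fp-rsqrt=off: sqrtss followed by divss. */
float div_sqrt_off(float y, float x) { return y / sqrt_exact(x); }

/* Example 1, -fp-rsqrt=on: y/sqrt(x) => y * rsqrt(x). */
float div_sqrt_on(float y, float x) { return y * rsqrt_approx(x); }

/* Example 2, -fp-rsqrt=advance: sqrt(x) => x * rsqrt(x). */
float sqrt_advance(float x) { return x * rsqrt_approx(x); }

/* Example 3 with both rewrites applied (the =advance instruction shape).
 * Under =fda each multiply-add becomes one vfmadd213ss; here the multiply
 * and add are kept as separate operations. */
void example3_advance(float x, float y, float z, float *t, float *r) {
    *t = x + y * rsqrt_approx(z);
    *r = x + *t * rsqrt_approx(*t);
}

/* Relative error |approx - exact| / |exact|, computed without libm. */
float rel_error(float approx, float exact) {
    assert(exact != 0.0f);
    float d = (approx - exact) / exact;
    return d < 0.0f ? -d : d;
}
```

For instance, div_sqrt_on(2.0f, 3.0f) should land close to the 1.15469360 reported in Example 1. It should differ from div_sqrt_off(2.0f, 3.0f) by a relative error below roughly 4e-4.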