The -fp-rsqrt flag, used together with -enable-unsafe-fp-math, controls whether llc generates the X86 rsqrt instruction. Because rsqrt returns an approximation of 1/sqrt(x), the results show minor precision differences, as the examples below illustrate. The rsqrt and multiply instructions it introduces also make further optimizations possible, such as generating more FMA instructions (see Example 3).

-fp-rsqrt - Enable rsqrt ops
  =off     - No rsqrt
  =on      - y/sqrt(x) => y * rsqrt(x)
  =advance - Standard (y/sqrt(x) => y * rsqrt(x)), plus sqrt(x) => x * rsqrt(x)

Generate LLVM IR with:
  gfortran -S -O1 -o - -fplugin=dragonegg.so rsqrt_on.f -fplugin-arg-dragonegg-emit-ir

Generate the .s file from the .ll file with:
  llc -O1 -enable-unsafe-fp-math -fp-rsqrt=off/on/advance/fda rsqrt_on.ll -filetype=asm

Example 1.

Source code
  real*4 x
  real*4 y
  real*4 r
  r = y/sqrt(x)

Input
-----
x = 3.0
y = 2.0

LLVM IR
  %0 = load float* %y, align 4
  %1 = load float* %x, align 4
  %2 = tail call float @sqrtf(float %1) nounwind readnone
  %3 = fdiv float %0, %2
  store float %3, float* %r, align 4

-fp-rsqrt=off
  vmovss   (%rdi), %xmm0
  vsqrtss  %xmm0, %xmm0, %xmm0
  vmovss   (%rsi), %xmm1
  vdivss   %xmm0, %xmm1, %xmm0
  vmovss   %xmm0, (%rdx)

-fp-rsqrt=on/advance
  vmovss   (%rdi), %xmm0
  vrsqrtss %xmm0, %xmm0, %xmm0
  vmulss   (%rsi), %xmm0, %xmm0
  vmovss   %xmm0, (%rdx)

Output
------
without rsqrt :: r = 1.15470052
with rsqrt    :: r = 1.15469360
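As a rough C analogue of Example 1 (not the llc implementation itself), the sketch below evaluates r = y/sqrt(x) both ways using the SSE intrinsics _mm_sqrt_ss and _mm_rsqrt_ss, which map to the (v)sqrtss and (v)rsqrtss instructions in the listings above. It assumes an x86-64 compiler with SSE enabled (the default there); the function names sqrt_exact, rsqrt_approx, ratio_exact and ratio_rsqrt are hypothetical.

```c
#include <assert.h>
#include <immintrin.h>

/* Exact single-precision square root via SQRTSS (the -fp-rsqrt=off path). */
static float sqrt_exact(float x) {
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* Approximate 1/sqrt(x) via RSQRTSS: roughly 12 bits of precision and
   no Newton-Raphson refinement, like the bare vrsqrtss above. */
static float rsqrt_approx(float x) {
    /* Keep the sketch to positive inputs: rsqrt(0) is +inf and
       negative inputs give NaN. */
    assert(x > 0.0f);
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* -fp-rsqrt=off: r = y / sqrt(x) with a real divide. */
float ratio_exact(float y, float x) {
    return y / sqrt_exact(x);
}

/* -fp-rsqrt=on/advance: y / sqrt(x) rewritten as y * rsqrt(x). */
float ratio_rsqrt(float y, float x) {
    return y * rsqrt_approx(x);
}
```

With the inputs of Example 1, ratio_exact(2.0f, 3.0f) is about 1.1547005, and ratio_rsqrt(2.0f, 3.0f) should fall within Intel's documented RSQRTSS bound (relative error at most 1.5 × 2^-12) of it. The low-order digits of the approximation can differ between processor implementations, so a value such as 1.15469360 is not guaranteed on every CPU.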

Example 2.

Source
  real*4 x
  real*4 y
  real*4 r
  r = sqrt(x)

LLVM IR
  %0 = load float* %x, align 4
  %1 = tail call float @sqrtf(float %0) nounwind readnone
  store float %1, float* %r, align 4

-fp-rsqrt=off/on (compiling rsqrt_advance.ll with -filetype=asm)
  vmovss   (%rdi), %xmm0
  vsqrtss  %xmm0, %xmm0, %xmm0
  vmovss   %xmm0, (%rdx)

-fp-rsqrt=advance
  vmovss   (%rdi), %xmm0
  vrsqrtss %xmm0, %xmm0, %xmm1
  vmulss   %xmm1, %xmm0, %xmm0
  vmovss   %xmm0, (%rdx)

Input
-----
x = 2.0

Output
------
without rsqrt :: r = 1.41421354
with rsqrt    :: r = 1.41419983

Example 3.

Source
  real*4 x
  real*4 y
  real*4 z
  real*4 t
  real*4 r
  t = x+y/sqrt(z)
  r = x+sqrt(t)

LLVM IR
  %0 = load float* %x, align 4
  %1 = load float* %y, align 4
  %2 = load float* %z, align 4
  %3 = tail call float @sqrtf(float %2) nounwind readnone
  %4 = fdiv float %1, %3
  %5 = fadd float %0, %4
  store float %5, float* %t, align 4
  %6 = load float* %x, align 4
  %7 = tail call float @sqrtf(float %5) nounwind readnone
  %8 = fadd float %6, %7
  store float %8, float* %r, align 4

-fp-rsqrt=on/advance
  vmovss      (%rdx), %xmm0
  vrsqrtss    %xmm0, %xmm0, %xmm0
  vmovss      (%rsi), %xmm1
  vfmadd213ss (%rdi), %xmm1, %xmm0
  vmovss      %xmm0, (%rcx)
  vxorps      %xmm1, %xmm1, %xmm1
  vrsqrtss    %xmm0, %xmm0, %xmm1
  vfmadd213ss (%rdi), %xmm0, %xmm1
  vmovss      %xmm1, (%r8)

Input
-----
x = 1.0, y = 2.0, z = 3.0

Output
------
-fp-rsqrt=off         : t = 2.15469360   r = 2.46788836
-fp-rsqrt=on/advance  : t = 2.15469360   r = 2.46788836
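Continuing the same hypothetical C analogue, this sketch covers the =advance rewrite of Example 2 and both statements of Example 3. The rewritten forms are written as a*b + c, the multiply-add shape that an FMA-capable backend can fuse into vfmadd213ss; whether a C compiler actually contracts them depends on the target and its floating-point contraction settings, so this illustrates the transformation rather than reproducing the listing. As before, it assumes x86-64 with SSE, and all function names are hypothetical.

```c
#include <assert.h>
#include <immintrin.h>

/* Exact single-precision square root via SQRTSS. */
static float sqrt_exact(float x) {
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* Approximate 1/sqrt(x) via RSQRTSS (about 12 bits, no refinement). */
static float rsqrt_approx(float x) {
    /* x * rsqrt(x) at x == 0 would be 0 * inf = NaN, so the sketch
       only accepts positive inputs. */
    assert(x > 0.0f);
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* Example 2, -fp-rsqrt=advance: sqrt(x) => x * rsqrt(x). */
float sqrt_advance(float x) {
    return x * rsqrt_approx(x);
}

/* Example 3 without rsqrt: t = x + y/sqrt(z), r = x + sqrt(t). */
void example3_exact(float x, float y, float z, float *t, float *r) {
    *t = x + y / sqrt_exact(z);
    *r = x + sqrt_exact(*t);
}

/* Example 3 with both rewrites; each statement is a multiply-add. */
void example3_rsqrt(float x, float y, float z, float *t, float *r) {
    *t = y * rsqrt_approx(z) + x;   /* x + y/sqrt(z) */
    *r = *t * rsqrt_approx(*t) + x; /* x + sqrt(t)   */
}
```

For x = 2.0, sqrt_advance should stay within about 5e-4 of 1.41421356. In Example 3 the approximation error of the first statement feeds into the second, because r is computed from the already approximate t.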