[PATCH] Restore the sqrt -> llvm.sqrt mapping in fast-math mode

Thu Sep 12 14:14:51 PDT 2013

On 09/12/2013 11:09 PM, Hal Finkel wrote:
> With the patch this time;)
>
>   -Hal
>
> ----- Original Message -----
>> >On 09/12/2013 11:03 PM, Hal Finkel wrote:
>>> > >Hello,
>>> > >
>>> > >Please review the attached patch which restores the libm sqrt* ->
>>> > >@llvm.sqrt* mapping, but only in fast-math mode (specifically,
>>> > >when the UnsafeFPMath or NoNaNsFPMath CodeGen options are
>>> > >enabled). The @llvm.sqrt* intrinsics have slightly different
>>> > >semantics from the libm call, specifically, they are undefined
>>> > >when given a non-zero negative number (the libm calls will always
>>> > >return NaN for any negative number).
>>> > >
>>> > >This mapping was removed in r100613, and replaced with a TODO, but
>>> > >at that time the fast-math flags were not yet implemented. Now
>>> > >that we have these, restoring this mapping is important because it
>>> > >will enable autovectorization of sqrt calls in loops (at least in
>>> > >fast-math mode).
>> >
>> >Patch?
>> >
>> >Tobias
>> >
>> >
> -- Hal Finkel Assistant Computational Scientist Leadership Computing
> Facility Argonne National Laboratory
>
>
> sqrt-intrin-fm-mapping.patch
>
>
> Index: test/CodeGen/libcalls.c
> ===================================================================
> --- test/CodeGen/libcalls.c	(revision 190624)
> +++ test/CodeGen/libcalls.c	(working copy)
> @@ -1,8 +1,10 @@
>   // RUN: %clang_cc1 -fmath-errno -emit-llvm -o - %s -triple i386-unknown-unknown | FileCheck -check-prefix CHECK-YES %s
>   // RUN: %clang_cc1 -emit-llvm -o - %s -triple i386-unknown-unknown | FileCheck -check-prefix CHECK-NO %s
> +// RUN: %clang_cc1 -ffast-math -menable-unsafe-fp-math -emit-llvm -o - %s -triple i386-unknown-unknown | FileCheck -check-prefix CHECK-FAST %s

Do you need -ffast-math here?

>
>   // CHECK-YES-LABEL: define void @test_sqrt
>   // CHECK-NO-LABEL: define void @test_sqrt
> +// CHECK-FAST-LABEL: define void @test_sqrt
>   void test_sqrt(float a0, double a1, long double a2) {
>     // Following llvm-gcc's lead, we never emit these as intrinsics;
>     // no-math-errno isn't good enough.  We could probably use intrinsics
> @@ -27,6 +29,9 @@
>   // CHECK-NO: declare float @sqrtf(float) [[NUW_RN:#[0-9]+]]
>   // CHECK-NO: declare double @sqrt(double) [[NUW_RN]]
>   // CHECK-NO: declare x86_fp80 @sqrtl(x86_fp80) [[NUW_RN]]
> +// CHECK-FAST: declare float @llvm.sqrt.f32(float)
> +// CHECK-FAST: declare double @llvm.sqrt.f64(double)
> +// CHECK-FAST: declare x86_fp80 @llvm.sqrt.f80(x86_fp80)
>
>   // CHECK-YES-LABEL: define void @test_pow
>   // CHECK-NO-LABEL: define void @test_pow
> Index: lib/CodeGen/CGBuiltin.cpp
> ===================================================================
> --- lib/CodeGen/CGBuiltin.cpp	(revision 190624)
> +++ lib/CodeGen/CGBuiltin.cpp	(working copy)
> @@ -1282,12 +1282,19 @@
>     case Builtin::BIsqrt:
>     case Builtin::BIsqrtf:
>     case Builtin::BIsqrtl: {
> -    // TODO: there is currently no set of optimizer flags
> -    // sufficient for us to rewrite sqrt to @llvm.sqrt.
> -    // -fmath-errno=0 is not good enough; we need finiteness.
> -    // We could probably precondition the call with an ult
> -    // against 0, but is that worth the complexity?
> -    break;
> +    // Transform a call to sqrt* into a @llvm.sqrt.* intrinsic call, but only
> +    // in finite- or unsafe-math mode (the intrinsic has different semantics
> +    // for handling negative numbers compared to the library function, so
> +    // -fmath-errno=0 is not enough).

I did not read the surrounding code, but I assume -fmath-errno=0 is 
already checked outside.

> +    if (!FD->hasAttr<ConstAttr>())
> +      break;
> +    if (!(CGM.getCodeGenOpts().UnsafeFPMath ||
> +          CGM.getCodeGenOpts().NoNaNsFPMath))

Otherwise this looks good from my side, but needs a clang person to 
accept it.

Tobias