[PATCH] The LoopVectorizer and libm sqrt

Thu Sep 12 13:03:35 PDT 2013

On 09/12/2013 09:58 PM, Arnold Schwaighofer wrote:
>
> On Sep 12, 2013, at 2:52 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>> ----- Original Message -----
>>> On 09/12/2013 09:34 PM, Hal Finkel wrote:
>>>> ----- Original Message -----
>>>>> ----- Original Message -----
>>>>>> I did not write this code, but I assume this was done on purpose
>>>>>> because our llvm.sqrt intrinsic has slightly different semantics:
>>>>>>
>>>>>> The 'llvm.sqrt' intrinsics return the sqrt of the specified operand,
>>>>>> returning the same value as the libm 'sqrt' functions would. Unlike
>>>>>> sqrt in libm, however, llvm.sqrt has undefined behavior for negative
>>>>>> numbers other than -0.0 (which allows for better optimization,
>>>>>> because there is no need to worry about errno being set).
>>>>>> llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt.
>>>>>
>>>>> Hrmm... okay; I'll send a revised patch where we explicitly check
>>>>> for fast-math mode.
>>>>
>>>> Or perhaps not; the TargetOptions are not available at the IR level
>>>> right now, and so this seems to leave us with two options:
>>>>
>>>>   1. Feed something through TTI (TargetTransformInfo)
>>>>
>>>>   2. Have Clang generate the intrinsic directly in fast-math mode
>>>>
>>>> I'm leaning toward (1), because I'd like to give the target the
>>>> ability to declare the availability of a vectorized sqrt that is
>>>> suitable as a libm sqrt replacement.
>>>>
>>>> What do you think?
>>>
>>> It seems only 2) would allow mixing fast-math and non-fast-math modes,
>>> which in fact may be very helpful in the case of LTO and/or math
>>> libraries that provide fast as well as precise versions of a function.
>>
>> I don't think that's right. For one thing, the 'fast math' flags are now stored in function-level attributes (specifically for this reason).
>
> Except that the inliner ignores them afaik :).
>
> I believe that floating-point function-level attributes are in a somewhat broken state; they work because, I think, LTO will remove them if they mismatch (at least I hope so :)

Are you aware of why they were added in the first place? It seems they
implement something similar to the per-instruction flags. As there are
obvious problems, both due to inlining and simply from having two ways
to pass fast-math information, I wonder what the motivation for adding
them was.

Tobias