[cfe-dev] [llvm-dev] Do we need intrinsics for floating-point classification functions?

Serge Pavlov via cfe-dev cfe-dev at lists.llvm.org
Wed Sep 22 10:43:01 PDT 2021


Ok, that makes sense.
Thank you!

--Serge


On Thu, Sep 23, 2021 at 12:05 AM James Y Knight <jyknight at google.com> wrote:

> Yes, you should add intrinsics again.
>
> I'm not sure "excluding the changes for fast-mode" is the right way to go
> about it, however. I think Clang ought to generate llvm.isnan and friends
> unconditionally in the frontend. The decision on how we want fast-math to
> affect these intrinsics should be reflected in optimizations and
> potentially which fast-math-flags Clang puts on the calls, not in whether
> it emits the intrinsics or something else.
>
> I believe it would be useful to split up the patches in a different way
> than you submitted the first time. I'd suggest ordering it as:
> 1. Introduce the new intrinsics in LLVM IR (all of them, not just isnan).
> 2. Update optimizations, as necessary, to have the desired
> transformations/analyses.
> 3. Switch Clang to generate the intrinsics (unconditionally).
>
> On Tue, Sep 21, 2021 at 1:39 PM Serge Pavlov via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> If nobody argues, in a couple of days I will put back the `llvm.isnan`
>> implementation excluding the changes for fast-mode.
>>
>> Thanks,
>> --Serge
>>
>>
>> On Fri, Sep 3, 2021 at 1:46 PM Serge Pavlov <sepavloff at gmail.com> wrote:
>>
>>> Thank you for summarizing.
>>>
>>> I would prefer Option 2, a separate tool for separate tasks. It seems to
>>> me that a specialized function is easier to implement than a universal one.
>>> As implementation of llvm.isnan demonstrated, there may be various issues
>>> in implementing such functions for different targets. It is easier to
>>> provide optimized versions of small functions which are used more
>>> frequently than `fpclassify`. Besides, `fpclassify` may be used itself and
>>> a user may choose arbitrary constants, which complicates code generation.
>>> It seems that it is more acceptable to tolerate some inefficiency in
>>> `fpclassify` than in basic classification intrinsics. And yes, we should
>>> consider `signbit` with them.
>>>
>>> Another consideration in favor of dedicated intrinsic rather than
>>> ordinary function. Compiler may do some optimizations when it knows the
>>> semantics of the function. In https://reviews.llvm.org/D104854 this
>>> intrinsic was optimized out if its argument was  provided by an operation
>>> with 'nnan' flag. We can also think about an optimization that determines
>>> basic blocks guarded by `llvm.isnan` and assign flag 'nnan' in them. It
>>> could make code faster in many practical cases, and make use of notorious
>>> -ffast-math less attractive.
>>>
>>> Thanks,
>>> --Serge
>>>
>>>
>>> On Fri, Sep 3, 2021 at 12:22 PM James Y Knight <jyknight at google.com>
>>> wrote:
>>>
>>>> As you say, we don't strictly need a new intrinsic -- we can emit the
>>>> code to do the correct integer-based bit-checking in the frontend -- as was
>>>> done before. Yet, that is not ideal. The rationale to have an intrinsic
>>>> instead of the frontend generating integer-bit-manipulation seem good, IMO.
>>>>
>>>> So the question that arises: *which* new intrinsic(s) do we want to
>>>> add? I list 3 options below. I think Option 1 should be discarded from
>>>> consideration, and I'm not sure which of 2 or 3 is best. I'm leaning
>>>> towards 3 -- it seems like it may be simpler -- although I'm not certain.
>>>>
>>>> *Option 1*: we could theoretically address the problem with an
>>>> llvm.experimental.constrained.superquiet_fcmp, which would be the same as
>>>> fcmp, except it would not raise an fp exception even for an sNAN. This
>>>> would be a straightforward substitution for fcmp in Clang's existing
>>>> codegen, and it should work.
>>>>
>>>> However, I don't think this would be a good idea, because it's a poor
>>>> match for hardware. The "superquiet_fcmp" operation doesn't map to CPU
>>>> functionality in any ISA i'm aware of. Generally, the implementation would
>>>> have to be to filter out sNANs prior to using the hardware fp-compare
>>>> instruction. And how you you detect an sNAN without raising an fp
>>>> exceptoin? Fall down to integer ops. But by doing that, you've done work
>>>> almost equivalent to what you needed for the actual classification function
>>>> that you were trying to implement in the first place. Not real useful.
>>>>
>>>> *Option 2*: We could add a whole family of classification intrinsics.
>>>> I think that would be:
>>>>
>>>> llvm.isnan
>>>> llvm.issignaling
>>>> llvm.isinf
>>>> llvm.isfinite
>>>> llvm.isnormal
>>>> llvm.issubnormal
>>>> llvm.iszero
>>>> (Note, some of these are missing corresponding __builtin_is* in Clang
>>>> at the moment -- we have no __builtin_issignaling, __builtin_issubnormal,
>>>> or __builtin_iszero. Probably ought to.)
>>>>
>>>> We don't necessarily need an intrinsic for fpclassify if we have the
>>>> above intrinsics, since it can be built with llvm.isnan, llvm.isinf,
>>>> llvm.iszero, and llvm.isnormal.
>>>>
>>>> *Option 3*: Add only an fpclassify intrinsic.
>>>>
>>>> That is, something like:
>>>>    i32 llvm.fpclassify.i32.f32(i32 %if_snan, i32 %if_qnan, i32
>>>> %if_infinite, i32 %if_normal, i32 %if_subnormal, i32 %if_zero, float %value)
>>>> which classifies the given value, returning the value of the argument
>>>> corresponding to its categorization. We can say the %if_* args are required
>>>> to be constant integers, if we like, for simplicity of implementation.
>>>>
>>>> Thus, Clang would codegen __builtin_isnan/etc like:
>>>>   %isnan = call i1 llvm.fpclassify.i1.f32(i1 1, i1 1, i1 0, i1 0, i1 0,
>>>> i1 0, float %value)
>>>> And for fpclassify, we might generate something like:
>>>>   %ret = call i32 llvm.fpclassify.i32.f32(i32 0, i32 0, i32 1, i32 4,
>>>> i32 3, i32 2, float %value)
>>>>
>>>> On most architectures, we'd expand this intrinsic into appropriate
>>>> integer operations (skipping the parts of classification which are
>>>> irrelevant for the given constant arguments), since there's no
>>>> corresponding hardware instructions available for float classification.
>>>> Or, for non-strictfp functions, we could continue to expand into an
>>>> fcmp-based set of tests...although looking at the asm we currently
>>>> generate, the integer versions may well be faster, other than isnan.
>>>>
>>>> On SystemZ/s390x, this intrinsic would translate almost directly into
>>>> the "test data class" instruction -- so long as the %if_* arguments are all
>>>> 0/1. That's kinda nice. ("Test data class" takes a fp value and a bitmask,
>>>> and returns true if a bit is set in the position corresponding to the
>>>> classification of the fp value.)
>>>>
>>>>
>>>> *Separately*, we have the signbit operation. I think that's the only
>>>> other operation that needs to be addressed related to this RFC. Currently,
>>>> Clang always generates integer bit-ops for __builtin_signbit in the
>>>> frontend. This is arguably OK as is. Yet, completing the set of IR fp
>>>> classification intrinsics seems like it'd be a good idea. So, we could also
>>>> (along with any of the above options) add:
>>>>   i1 llvm.signbit.f32(float %value)
>>>>
>>>>
>>>> On Thu, Sep 2, 2021 at 8:33 AM Serge Pavlov via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Some time ago a new intrinsic `llvm.isnan` was introduced, which was
>>>>> intended to represent IEEE-754 operation `isNaN` as well as a family of C
>>>>> library functions `isnan*`. Then a concern was raised (see
>>>>> https://reviews.llvm.org/D104854) that this functionality should be
>>>>> removed. Discussion in the subsequent RFC (
>>>>> https://lists.llvm.org/pipermail/llvm-dev/2021-August/152257.html)
>>>>> came to consensus that such intrinsic is necessary. Nevertheless the
>>>>> patches related to the new intrinsic were reverted. I have to restart the
>>>>> discussion in hope to convince the community that this intrinsic and other
>>>>> classification functions are necessary.
>>>>>
>>>>> There are two main reasons why this intrinsic is necessary:
>>>>> 1. It allows correct implementation of `isnan` if strict floating
>>>>> point semantics is in effect,
>>>>> 2. It allows preserving the check in -ffast-math compilation.
>>>>>
>>>>> To facilitate the discussion let's concentrate on the first problem.
>>>>>
>>>>> Previously the frontend intrinsic `__builtin_isnan` was converted into
>>>>> `cmp uno` during IR generation in clang codegen. This solution is not
>>>>> suitable if FP exceptions are not ignored, because compare instructions
>>>>> raise exceptions if its argument is signaling NaN. Both IEEE-754 (5.7.2) an
>>>>> C standard  (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf,
>>>>> F.3p6) demand that this function does not raise floating point exceptions.
>>>>> There was no target-independent IR construct that could represent `isnan`.
>>>>>
>>>>> This drawback was significant enough and some attempts to alleviate it
>>>>> were undertaken. In https://reviews.llvm.org/D95948 `isnan` was
>>>>> implemented using integer operations in strictfp functions. It however is
>>>>> not suitable for targets where a more efficient way exists, like dedicated
>>>>> instruction. Another solution was implemented in
>>>>> https://reviews.llvm.org/D96568, where a hook
>>>>> `clang::TargetCodeGenInfo::testFPKind` was introduced, which injects target
>>>>> specific code into IR. Such a solution makes IR more target-dependent and
>>>>> prevents some IR-level optimizations.
>>>>>
>>>>> To have a solution suitable for all cases, a new intrinsic function
>>>>> `llvm.isnan` was introduced (https://reviews.llvm.org/D104854). It
>>>>> protects the check from undesirable optimizations and preserves it till
>>>>> selector, where it can be lowered in optimal for a particular target way.
>>>>>
>>>>> Other classification functions also need their own intrinsics. In
>>>>> strictfp mode even a check for zero (`iszero`) cannot be made by comparing
>>>>> a value against zero, - if the value is signaling NaN, FP exceptions would
>>>>> be raised. James Y Knight in the previous discussion (
>>>>> https://lists.llvm.org/pipermail/llvm-dev/2021-August/152282.html)
>>>>> listed such "non-computational" functions, which should not signal if
>>>>> provided with an sNAN argument.
>>>>>
>>>>> It looks like new intrinsic is the only consistent and in
>>>>> target-agnostic way to implement these checks in all environments including
>>>>> the case when FP exceptions are not ignored.
>>>>>
>>>>> Any feedback is welcome.
>>>>>
>>>>
>>>>
>>>>
>>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20210923/672e0ddb/attachment-0001.html>


More information about the cfe-dev mailing list