[llvm-dev] [cfe-dev] Handling of FP denormal values
Cameron McInally via llvm-dev
llvm-dev at lists.llvm.org
Tue Sep 17 08:55:08 PDT 2019
On Tue, Sep 17, 2019 at 11:07 AM Cameron McInally <cameron.mcinally at nyu.edu>
wrote:
> On Mon, Sep 16, 2019 at 9:43 PM Matt Arsenault via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Sep 16, 2019, at 19:57, Kaylor, Andrew via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>
>> Do we need an ftz fast-math flag?
>>
>>
>> This would be useful for matching a handful of AMDGPU instructions (a
>> fmad that always flushes being the most important). We have a
>> dedicated intrinsic to allow flushing in this case when denormals are
>> enabled.
>>
>
> +1
>
> For FTZ/DAZ, we're currently getting cases like this incorrect:
>
> %add = fadd nnan ninf nsz float %a, 0.000000e+00
>
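> That cannot be safely optimized to 'a' with FTZ/DAZ enabled: if %a is
> denormal, the fadd flushes it to zero, while the folded code passes the
> denormal through. Admittedly, there's only a small chance of problems,
> since a following FP operation would flush it anyway, but here be dragons.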
>
> Are there any other facets to this problem that I've overlooked?
>>
>>
>> For AMDGPU we need to split -denormal-fp-math into per-FP type flags (and
>> the corresponding IR attribute). The denorm mode register has separate
>> fields for f32 and for f64+f16. The default for each of these is different
>> depending on the subtarget/language combination. Mostly we want f64+f16
>> denormals to always be enabled, and only change the f32 mode. The current
>> naming implies changing all of the modes.
>>
>> The different sign-of-zero modes that exist now aren’t available. There
>> are, however, separate flags for enabling flushing on input and output.
>> This isn’t particularly important, and currently we just set both bits at
>> the same time, but it might be something to think about if this is being
>> expanded.
>>
>
> At the command-line level, I don't see a lot of value in separating the
> two flags. At the Function/Loop/Block/Instruction level, separating the
> two would be more useful though. E.g. normalizing input/output; or
> sacrificing accuracy to speed up a hot loop.
>
EDIT: 'accuracy' should be 'precision'.
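
To make the fadd case quoted above concrete, here is a minimal software
sketch of it in C. It does not touch any hardware mode register; it only
models FTZ/DAZ by hand, assuming denormal inputs and results are replaced by
a same-signed zero. The helper names (flush_denormal, fadd_ftz_daz,
fold_changes_result) are made up for illustration and are not LLVM APIs.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: replace a subnormal value with a same-signed zero. */
static float flush_denormal(float x) {
    return fpclassify(x) == FP_SUBNORMAL ? copysignf(0.0f, x) : x;
}

/* Software model (an assumption, not hardware behaviour) of a float add
 * with DAZ applied to both inputs and FTZ applied to the result. */
static float fadd_ftz_daz(float a, float b) {
    return flush_denormal(flush_denormal(a) + flush_denormal(b));
}

/* Returns 1 when folding `a + 0.0` to `a` yields a different value than
 * actually performing the add under the modelled FTZ/DAZ semantics. */
static int fold_changes_result(float a) {
    return fadd_ftz_daz(a, 0.0f) != a;
}
```

Under this model, fold_changes_result(FLT_MIN / 2.0f) reports a mismatch
(the add gives zero, the fold keeps the denormal), while normal inputs such
as 1.0f or FLT_MIN are unaffected.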