[cfe-dev] [llvm-dev] Handling of FP denormal values

Tue Sep 17 09:27:08 PDT 2019

>> At the command-line level, I don't see a lot of value in separating the two flags. At the Function/Loop/Block/Instruction level, separating the two would be more useful though. E.g. normalizing input/output; or sacrificing accuracy to speed up a hot loop.
> EDIT: 'accuracy' should be 'precision'.

How do you imagine that being specified in the local scope? Two ways that come to mind would be a pragma or an intrinsic. The pragma would probably be the cleanest, though more work for the front end. I suspect most architectures already have intrinsics to control this locally, but we could possibly add a target-independent intrinsic that would be better for the optimizer. But I think you want this to set or clear a flag on individual operations to help with instruction selection, right?

From: Cameron McInally <cameron.mcinally at nyu.edu>
Sent: Tuesday, September 17, 2019 8:55 AM
To: Matt Arsenault <arsenm2 at gmail.com>
Cc: Kaylor, Andrew <andrew.kaylor at intel.com>; LLVM Developers Mailing List <llvm-dev at lists.llvm.org>; cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Handling of FP denormal values

On Tue, Sep 17, 2019 at 11:07 AM Cameron McInally <cameron.mcinally at nyu.edu<mailto:cameron.mcinally at nyu.edu>> wrote:
On Mon, Sep 16, 2019 at 9:43 PM Matt Arsenault via cfe-dev <cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>> wrote:

On Sep 16, 2019, at 19:57, Kaylor, Andrew via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:

Do we need an ftz fast-math flag?

This would be useful for matching a handful of AMDGPU instructions (a fmad that only always flushes being the most important). We have a dedicated intrinsic to allow flushing in this case when denormals are enabled

+1

For FTZ/DAZ, we're currently getting cases like this incorrect:

  %add = fadd nnan ninf nsz float %a, 0.000000e+00

That cannot be safely optimized to 'a' with FTZ/DAZ enabled. Although, there's admittedly a small chance of problems, since a following FP operation would normalize it, but here be dragons.

Are there any other facets to this problem that I've overlooked?

For AMDGPU we need to split -denormal-fp-math into per-FP type flags (and the corresponding IR attribute). The denorm mode register has separate fields for f32, and f64+f16. The default for each of these is different depending on the subtarget/language combination. Mostly we want f64+f16 to always be on, and only change the f32 mode. The current naming implies changing all of the modes.

The different sign of 0 modes as exist now aren’t available. There are however separate flags for enabling flushing on input and output. This isn’t particular important, and currently we just set both bits at the same time but it might be something to think about if this is being expanded.

At the command-line level, I don't see a lot of value in separating the two flags. At the Function/Loop/Block/Instruction level, separating the two would be more useful though. E.g. normalizing input/output; or sacrificing accuracy to speed up a hot loop.

EDIT: 'accuracy' should be 'precision'.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190917/51a93387/attachment.html>