[llvm] [InstCombine] Fold more 'fcmp' 'select' instrs idioms into 'fabs' (PR #83381)

Wed Mar 20 13:58:51 PDT 2024

andykaylor wrote:

> > * if(BB->getParent()->hasFnAttribute("unsafe-fp-math")){
> > * PN->setFast(true);
> > * }
> 
> I don't think unsafe-fp-math actually directly corresponds to setting all fast math flags, but it's so poorly defined I think it's impossible to make a conclusive statement
> 

Although I don't know if this is documented anywhere, I assume that the `"unsafe-fp-math"` attribute is meant to correspond to the -funsafe-math-optimizations command line option. That's carried over from gcc, where the documentation says it "enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math." So for our purposes, that would mean it implies `nsz`, `reassoc`, and `arcp`.

This actually is the direction I had intended, but I think a different approach might be possible. There are some problems that just can't be solved with the current fast-math flags without the function attributes, but I really would like to eliminate the function attributes, so I'd like to explore other ways around this.

I'm not sure I understand what you mean by "introduce the flags on the phi based on the context instructions when eliminating the alloca." What are the "context instructions"? In the case described in #51601, which this PR is trying to address, the `phi` is constructed by SROA combining the load of a value produced by an `fneg` instruction (which has the fast-math flags set) and the load of a value stored from a function argument, which currently has no way of expressing the `nsz` flag apart from this function attribute.

If the `phi` were being constructed from two operations that both had fast-math flags set, SROA should be able to transfer the flags to the `phi`, but it looks like it currently doesn't.

https://godbolt.org/z/P7o3Y6ac7

If we fixed that and added a way to attach fast-math flags to parameters, that might also be a way to solve this. That seems to be the direction you were going with the `nofpclass` attributes. We talked about this a bit in today's LLVM FP working group meeting, and @jcranmer-intel expressed general concerns about setting fast-math flags on `select` and `phi` instructions. I think his concerns about the vagueness of the semantics in that case would also apply to using it on a function argument.

Specifically, Joshua's concern is that `select` and `phi` don't actually perform an operation. They are just data-moving operations, so fast-math flags shouldn't affect their results. That's a general concern he has, not specific to the `nsz` flag, I think, but the current definition of `nsz` is definitely insufficient. It says "Allow optimizations to treat the sign of a zero argument or zero result as insignificant." That's a bit ambiguous. I think we need it to say something like "Allow optimizations to treat 0.0 and -0.0 as if they were exactly the same value when used as arguments and return values."
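
To make concrete what "exactly the same value" would be relaxing, here is a minimal standalone C++ sketch (not LLVM code; the helper names are hypothetical and it assumes IEEE 754 doubles compiled without fast-math options). The two zeros compare equal, yet they remain distinguishable through the sign bit:

```cpp
#include <cmath>

// Hypothetical helper: do the two zeros compare equal under ordinary '=='?
bool zerosCompareEqual() {
    return 0.0 == -0.0;  // IEEE 754 comparison ignores the sign of zero
}

// Hypothetical helper: returns +1.0 or -1.0 according to the sign bit of x.
// This is one way a program can still observe which zero it was handed.
double signOf(double x) {
    return std::copysign(1.0, x);
}
```

Under those assumptions, `signOf(0.0)` and `signOf(-0.0)` differ even though `zerosCompareEqual()` holds, which is the kind of observable difference the proposed wording would allow the optimizer to ignore.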

The idea here is that this would be similar to the assumption we make everywhere about signaling and quiet NaNs being the same. This would explicitly allow the optimizer to replace a -0.0 result with 0.0 (or vice versa? -- I think there could be problems with that). And in that case, if we had an equivalent parameter attribute, we wouldn't even need the `nsz` flag on the select operation.

Consider the motivating example for this PR:

```
define dso_local double @floatingAbs(double nsz %0)  {
  %2 = fcmp fast olt double %0, 0.000000e+00
  br i1 %2, label %3, label %5

3:                                                ; preds = %1
  %4 = fneg fast double %0
  br label %6

5:                                                ; preds = %1
  br label %6

6:                                                ; preds = %5, %3
  %.0 = phi double [ %4, %3 ], [ %0, %5 ]
  ret double %.0
}
```

I've added a hypothetical `nsz` parameter attribute here for the sake of discussion. Now suppose `%0` is `-0.0` in this case. The `fcmp` instruction will always return `false` when `%0` is `-0.0`, regardless of the fast-math flags, so the `fneg` instruction won't be used and the `phi` instruction will return `%0` in this case. Since the `nsz` attribute is set on the `%0` parameter, we can treat it as though 0.0 and -0.0 were exactly the same value. This allows the llvm.fabs optimization without the `nsz` flag being set on the `phi` instruction.
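
For reference, the same case can be reproduced at the source level. This is only a sketch under the assumption of IEEE 754 doubles and no fast-math compiler options; `floatingAbs` mirrors the IR above, and `sameSignAsFabs` is a hypothetical helper added for the comparison:

```cpp
#include <cmath>

// Source-level shape of the IR above: negate only when x compares below zero.
double floatingAbs(double x) {
    if (x < 0.0)   // false for x == -0.0, because -0.0 compares equal to 0.0
        return -x;
    return x;      // so -0.0 passes through with its sign bit still set
}

// Hypothetical helper: does the branchy idiom agree with std::fabs on the
// sign bit of the result for this input?
bool sameSignAsFabs(double x) {
    return std::signbit(floatingAbs(x)) == std::signbit(std::fabs(x));
}
```

With these assumptions, `-0.0` is the one input among ordinary values where the two disagree on the sign bit while still comparing equal, so it is the case that needs `nsz` (or an equivalent parameter attribute) for the fold to be valid.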

Does that seem like a reasonable solution?

https://github.com/llvm/llvm-project/pull/83381