[llvm] [InstCombine] Canonicalize reassoc contract fmuladd to fmul + fadd (PR #90434)
Andy Kaylor via llvm-commits
llvm-commits at lists.llvm.org
Wed May 8 11:35:40 PDT 2024
================
@@ -204,6 +204,34 @@ define float @fmuladd_fneg_x_fneg_y_fast(float %x, float %y, float %z) {
ret float %fmuladd
}
+define float @fmuladd_unfold(float %x, float %y, float %z) {
+; CHECK-LABEL: @fmuladd_unfold(
+; CHECK-NEXT: [[TMP1:%.*]] = fmul reassoc contract float [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT: [[FMULADD:%.*]] = fadd reassoc contract float [[TMP1]], [[Z:%.*]]
+; CHECK-NEXT: ret float [[FMULADD]]
+;
+ %fmuladd = call reassoc contract float @llvm.fmuladd.f32(float %x, float %y, float %z)
+ ret float %fmuladd
+}
+
+define float @fmuladd_unfold_missing_reassoc(float %x, float %y, float %z) {
+; CHECK-LABEL: @fmuladd_unfold_missing_reassoc(
+; CHECK-NEXT: [[FMULADD:%.*]] = call contract float @llvm.fmuladd.f32(float [[X:%.*]], float [[Y:%.*]], float [[Z:%.*]])
+; CHECK-NEXT: ret float [[FMULADD]]
+;
+ %fmuladd = call contract float @llvm.fmuladd.f32(float %x, float %y, float %z)
+ ret float %fmuladd
+}
+
+define float @fmuladd_unfold_missing_contract(float %x, float %y, float %z) {
+; CHECK-LABEL: @fmuladd_unfold_missing_contract(
+; CHECK-NEXT: [[FMULADD:%.*]] = call reassoc float @llvm.fmuladd.f32(float [[X:%.*]], float [[Y:%.*]], float [[Z:%.*]])
+; CHECK-NEXT: ret float [[FMULADD]]
+;
+ %fmuladd = call reassoc float @llvm.fmuladd.f32(float %x, float %y, float %z)
+ ret float %fmuladd
+}
+
----------------
andykaylor wrote:
> * I also add **FMF.allowReassoc()** because of the case **a + fmuladd (b, c, d)**
> If we unfold it into **a + b * c + d**, then the calculation sequence is **(a + b * c) + d**, which is different from the original **a + (b * c + d)**.
I don't think that case is a problem if we consider this a legal transformation (more on that in a second). Visually, it looks like a problem, but when you write it out in IR it isn't, because `a + fmuladd(b, c, d)` would be represented like this:
```
%t1 = call contract float @llvm.fmuladd(float %b, float %c, float %d)
%t2 = fadd contract float %t1, %a
```
And your transformation would rewrite it like this:
```
%t1 = fmul contract float %b, %c
%t2 = fadd contract float %t1, %d
%t3 = fadd contract float %t2, %a
```
So, the reassociation implied by writing out this case in algebraic form isn't there in the IR.
However, the issue with the implied arithmetic fence that Matt brought up is worth exploring. I was going to raise this issue on @jcranmer-intel 's FMF RFC, because it also comes up with the vector reduce intrinsics.
The reason the front end uses `llvm.fmuladd` with fp-contract=on is that the intrinsic itself means the operations can be contracted with each other, but not with other operations. We didn't have the arithmetic fence intrinsic when this problem was originally being addressed, but if we had, it could have been used to express what the `llvm.fmuladd` intrinsic does. That is, without any fast-math flags, this
`%r = call float @llvm.fmuladd(float %b, float %c, float %d)`
is semantically equivalent to this:
```
%t1 = fmul contract float %b, %c
%t2 = fadd contract float %t1, %d
%r = call float @llvm.arithmetic.fence.f32(float %t2)
```
We can contract %t1 and %t2 with one another, but not with any other operation.
Now, introducing the `contract` fast-math flag raises an interesting question that isn't clearly answered in the current language reference. Do the fast-math flags on an intrinsic apply to the implied operations within the intrinsic or only to the intrinsic as an atomic operation interacting with other operations?
Here's what I mean. If you have this IR:
```
%t1 = call contract float @llvm.fmuladd(float %b, float %c, float %d)
%t2 = fadd contract float %t1, %a
```
And you expand it, keeping the implied arithmetic fence, you get this:
```
%t1 = fmul contract float %b, %c
%t2 = fadd contract float %t1, %d
%t3 = call contract float @llvm.arithmetic.fence.f32(float %t2)
%t4 = fadd contract float %t3, %a
```
In this case, the arithmetic fence prevents the `fmul` and `fadd` that came from the original intrinsic from being contracted with the exterior `fadd`, so you didn't really gain anything from the expansion.
On the other hand, if you take the approach that the `contract` flag overrides the implied arithmetic fence, then the expansion does enable further contraction with the exterior `fadd`.
This brings me back to my question about what the front end that generated the IR intended. If you had been compiling with clang and -ffp-contract=fast, clang wouldn't have used the intrinsic because it would intend to allow the operations to be contracted in any way the optimizer thinks is profitable. The fact that it was coming from classic flang makes this a bit more interesting, because I don't think Fortran has any requirement about operations being in the same expression to enable fused operations.
That said, I don't think the fast-math flags should eliminate the implied arithmetic fence. I think both the original canonicalization and this proposed extension to it are illegal. If flang had intended to allow contraction across operations, it shouldn't have used the intrinsic.
https://github.com/llvm/llvm-project/pull/90434