[PATCH] D117450: [LoopInterchange] Support loop interchange with floating point reductions

Tue Jan 18 21:04:39 PST 2022

congzhe added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:900
+      Instruction *I = dyn_cast<Instruction>(followLCSSA(&PHI));
+      if (!F->getFnAttribute("unsafe-fp-math").getValueAsBool() ||
+          I == nullptr || !I->isFast())
----------------
Meinersbur wrote:
> I couldn't find a reference for `unsafe-fp-math`, but I would have guessed that it forces all instructions to be fast-math, even if not marked as fast. If already marked as fast, then no additional tag `unsafe-fp-math` should be necessary.
> 
> Instead of `isFast`, would `allowReassoc` be sufficient (or whatever flag controls commutativity)?
> 
> The logic here will check only one fp instruction. What if two instructions are involved? Such as:
> ```
>   for (...)
>     for (...) {
>       sum += A[i];
>       sum += B[j];
>    }
> print(sum);
> ```
Thanks a lot for the review! I updated the patch accordingly.

Instead of using `isFast()`, I now use `hasAllowReassoc()`.

Instead of only checking the variable obtained from `followLCSSA(&PHI)`, now I added a simple data flow analysis `areAllInstsReassoc()` that checks whether all insturctions involved in the FP reduction allow reassociation. Hopefully this is sufficient for the two-fp-instruction case you provided (and cases where multiple fp instructions are involved).

Regarding "unsafe-fp-math" VS "fast": thank you and I see your point. And yes, I did see cases where the function has the "unsafe-fp-math" attribute but its instructions do not (please see example 1 below). Nevertheless I did also see test cases where instructions are marked "fast" but the surrounding function does not have the "unsafe-fp-math" attribute (please see example 2 below).  It seems like there is some minor inconsistency between the attribute and the flag, so I just wanted to be conservatively correct by checking both -- if we have the attribute then that's fine, otherwise we continue checking instruction flags. However, if you think the logic can be simpilfied I'll be glad to update the code.

Example 1 (clang/test/CodeGen/fp-options-to-fast-math-flags.c):

Note that the `fast` flag is not generated even if compiled with `-ffast-math`.

```
float test(float a) {
  return a + fn(a);
}

// CHECK-FAST: [[CALL_RES:%.+]] = call reassoc nnan ninf nsz arcp afn float @fn(float noundef {{%.+}})
// CHECK-FAST: {{%.+}} = fadd reassoc nnan ninf nsz arcp afn float {{%.+}}, [[CALL_RES]]
```

Example 2 (`llvm/test/Transforms/InstSimplify/ConstProp/math-1.ll`):

```
define double @f_acos() {
; CHECK-LABEL: @f_acos(
; CHECK-NEXT:    ret double 0.000000e+00
;
  %res = tail call fast double @acos(double 1.0)
  ret double %res
}
```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117450/new/

https://reviews.llvm.org/D117450