[PATCH] D99675: RFC [llvm][clang] Create new intrinsic llvm.arith.fence to control FP optimization at expression level

Tue Apr 6 11:46:57 PDT 2021

kbsmith1 added a comment.

In D99675#2671924 <https://reviews.llvm.org/D99675#2671924>, @efriedma wrote:

>> The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen before “+ c” and FMA guarantees that, but to prevent later optimizations from unpacking the FMA the correct transformation needs to be:
>>
>> llvm.arith.fence(a * b) + c  →  llvm.arith.fence(FMA(a, b, c))
>
> Does this actually block later transforms from unpacking the FMA?  Maybe if the FMA isn't marked "fast"...

I think we could define llvm.arith.fence such that this FMA contraction isn't legal/correct, or it could be left as is.  In the implementation that was used for the Intel compiler, FMA contraction did not occur across an __fence boundary.  It is unclear whether that was the intended semantics, or whether we just never bothered to implement that contraction.
If FMA contraction across llvm.arith.fence is not allowed, then unpacking an FMA remains legal under exactly the same circumstances in which LLVM allows it today.
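
To make concrete why it matters whether the multiply is rounded before the add, here is a minimal C sketch.  It is only an illustration: mul_then_add and check_unfused_rounding are hypothetical names, the volatile temporary is a portable stand-in for "do not contract" (it is not the proposed intrinsic), and the default round-to-nearest-even mode is assumed.

```c
#include <assert.h>

/* Computes a*b + c with the product rounded to double BEFORE the add.
 * The volatile temporary is only a stand-in for "do not fuse these two
 * operations"; it is not how llvm.arith.fence would be implemented. */
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

/* Hypothetical self-check, assuming round-to-nearest-even. */
void check_unfused_rounding(void) {
    double a = 1.0 + 0x1p-27;
    double b = 1.0 - 0x1p-27;
    /* Exactly, a*b = 1 - 2^-54, which lies halfway between two doubles
     * and rounds to 1.0, so the unfused result is 1.0 - 1.0 = 0.0. */
    assert(mul_then_add(a, b, -1.0) == 0.0);
}
```

For the same inputs the exact value of a*b + c is -2^-54, which is representable, so a single-rounding FMA would return that instead of 0.0.  Whichever semantic is chosen for the fence therefore changes observable results.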

> ----
>
> How is llvm.arith.fence() different from using "freeze" on a floating-point value?  The goal isn't really the same, sure, but the effects seem similar at first glance.

They are similar.  However, freeze is a no-op if the operand can be proven not to be undef or poison, and in such circumstances could be removed by an optimizer.  llvm.arith.fence cannot be removed by an optimizer, because doing so might allow instructions that were "outside" the fence to be reassociated/distributed with the instructions/operands that were inside the fence.
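
As a rough illustration of the reassociation that the fence is meant to block, the C sketch below evaluates the same three-term sum in both groupings.  The function names are hypothetical, the volatile temporaries merely pin the evaluation order for the demonstration (they are not the proposed intrinsic), and round-to-nearest-even is assumed.

```c
#include <assert.h>

/* (a + b) + c: the grouping the source expression asks for.
 * The volatile temporary forces the inner sum to be rounded first. */
double sum_left(double a, double b, double c) {
    volatile double inner = a + b;
    return inner + c;
}

/* a + (b + c): what a reassociating optimizer could produce
 * if nothing stopped it from regrouping the operands. */
double sum_right(double a, double b, double c) {
    volatile double inner = b + c;
    return a + inner;
}

/* Hypothetical self-check, assuming round-to-nearest-even. */
void check_reassociation(void) {
    /* Doubles near 1e16 are spaced 2 apart, so -1e16 + 1.0 rounds
     * back to -1e16 and the 1.0 is lost in the second grouping. */
    assert(sum_left(1e16, -1e16, 1.0) == 1.0);
    assert(sum_right(1e16, -1e16, 1.0) == 0.0);
}
```

With llvm.arith.fence around the inner sum, the first grouping would have to be preserved even under fast-math flags, which is why the fence cannot simply be dropped the way a provably redundant freeze can.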

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D99675