[PATCH] D99675: RFC [llvm][clang] Create new intrinsic llvm.arith.fence to control FP optimization at expression level

Andy Kaylor via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 6 13:15:28 PDT 2021


andrew.w.kaylor added a comment.

In D99675#2671924 <https://reviews.llvm.org/D99675#2671924>, @efriedma wrote:

>> The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen before “+ c” and FMA guarantees that, but to prevent later optimizations from unpacking the FMA the correct transformation needs to be:
>>
>> llvm.arith.fence(a * b) + c  →  llvm.arith.fence(FMA(a, b, c))
>
> Does this actually block later transforms from unpacking the FMA?  Maybe if the FMA isn't marked "fast"...

Later transforms could unpack the FMA, but the result would be fenced. The intent isn't so much to prevent the FMA from being unpacked as to prevent losing the original fence semantics. That said, it doesn't quite work. For example, you might have this:

  %mul = fmul fast float %a, %b
  %fenced_mul = call float @llvm.arith.fence.f32(%mul)
  %result = fadd fast float %fenced_mul, %c

If there are no other uses of %fenced_mul, that could become

  %tmp = call fast float @llvm.fmuladd.f32(float %a, float %b, float %c)
  %result = call float @llvm.arith.fence.f32(%tmp)

If a later optimization decided to unpack this, it would become this:

  %mul = fmul fast float %a, %b
  %tmp = fadd fast float %mul, %c
  %result = call float @llvm.arith.fence.f32(%tmp)

I suggested this as a way of enabling the FMA optimization. It brings the fadd into the fence, but still protects the fmul from being reassociated or otherwise transformed with other operations outside the fence. In a purely practical sense, this would probably work. In a more strict sense, though, I now see that it has the problem that you could legally distribute the addition within the fence. I can't see a practical reason anyone would do that, but the semantics would allow it. The same ("legal but not practical") is true of forming the fmuladd intrinsic before codegen, I think.

So, no, I don't think this works the way it was intended.

That might push us back to Kevin's suggestion of just not allowing the FMA optimization across a fence.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99675/new/

https://reviews.llvm.org/D99675



More information about the llvm-commits mailing list