[PATCH] D156538: [AArch64] Try to combine FMUL with FDIV

Wed Aug 2 01:30:34 PDT 2023

dmgreen added a comment.

In D156538#4551097 <https://reviews.llvm.org/D156538#4551097>, @jaykang10 wrote:

>> Does it make sense to keep them selection fdiv, or should they always just match fmul? It would seem we only need one, and fmul is more canonical.
>
> Additionally, `multiclass FPToIntegerScaled` and `multiclass IntegerToFP` share the `SelectCVTFixedPosOperand` as below.
>
>   class fixedpoint_i32<ValueType FloatVT>
>     : Operand<FloatVT>,
>       ComplexPattern<FloatVT, 1, "SelectCVTFixedPosOperand<32>", [fpimm, ld]> {
>     let EncoderMethod = "getFixedPointScaleOpValue";
>     let DecoderMethod = "DecodeFixedPointScaleImm32";
>     let ParserMatchClass = Imm1_32Operand;
>   }
>   ...
>   
>   def fixedpoint_f32_i64 : fixedpoint_i64<f32>;
>   ...
>   
>   multiclass FPToIntegerScaled<bits<2> rmode, bits<3> opcode, string asm,
>                                SDPatternOperator OpN> {
>   ...
>     def SXSri : BaseFPToInteger<0b00, rmode, opcode, FPR32, GPR64,
>                                 fixedpoint_f32_i64, asm,
>                 [(set GPR64:$Rd, (OpN (fmul FPR32:$Rn,
>                                             fixedpoint_f32_i64:$scale)))]> {
>       let Inst{31} = 1; // 64-bit GPR flag
>     }
>   ...
>   multiclass IntegerToFP<bit isUnsigned, string asm, SDPatternOperator node> {
>   ...
>     def SXSri: BaseIntegerToFP<isUnsigned, GPR64, FPR32, fixedpoint_f32_i64, asm,
>                                [(set FPR32:$Rd,
>                                      (fdiv (node GPR64:$Rn),
>                                            fixedpoint_f32_i64:$scale))]> {
>       let Inst{31} = 1; // 64-bit GPR flag
>       let Inst{23-22} = 0b00; // 32-bit FPR flag
>     }
>
> The `SelectCVTFixedPosOperand` is for `fcvt` and checks the constant is `2^fbits`.
> If we keep the `fdiv` for `scvtf`, we can use `SelectCVTFixedPosOperand`.
> If we replace the `fdiv` with `fmul` for `scvtf`, we need other complex pattern which checks the constant `1/2^fbits`.
> GCC uses `fmul` for `fcvt` and `scvft` but it has two patterns for `2^n` and `1/2^n`.
> From my personal opinion, as current implementation, it would also be good to keep one complex pattern with `fmul` and `fdiv`.

I see. That makes sense, but we may need to take that route anyway. I worry that if we do it this way we will just end up in a loop, transforming fdiv to fmul and back again.

There is a generic DAG combine in this bit of code that does the inverse transform: https://github.com/llvm/llvm-project/blob/c2093b85044d87805c39267c65ac9032d5454e0e/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L16543. It currently only triggers with UnsafeFPMath or AllowReciprocal, which is probably why it doesn't come up in the tests. According to the InstCombine version it should be fine for any constant that has an exact inverse (which seems the same as what you have here too), so should be more generally applicable.

I think my vote would still be for changing IntegerToFP to use fmul with a difference ComplexPat, but if you do go this route it will need some way of preventing the infinite folding back and forth.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156538/new/

https://reviews.llvm.org/D156538