[PATCH] D156538: [AArch64] Try to combine FMUL with FDIV

Fri Jul 28 10:30:30 PDT 2023

jaykang10 added a comment.

In D156538#4542937 <https://reviews.llvm.org/D156538#4542937>, @dmgreen wrote:

> All of these examples seem to canonicalize to fmul in the midend: https://godbolt.org/z/hqPv3azjf
> Is it worth keeping the currently lowering for fdiv(sitofp)? Or should we just change that to work with fmul?

I guess you mean the `performFDivCombine` function. It converts `fdiv(sitofp)` into `Intrinsic::aarch64_neon_vcvtfxs2fp`, which is the vector version, and that intrinsic is matched by the pattern below.

  multiclass SIMDFPScalarRShift<bit U, bits<5> opc, string asm> {
    let Predicates = [HasNEON, HasFullFP16] in {
    def h : BaseSIMDScalarShift<U, opc, {0,0,1,?,?,?,?},
                                FPR16, FPR16, vecshiftR16, asm, []> { 
      let Inst{19-16} = imm{3-0};
    }
    } // Predicates = [HasNEON, HasFullFP16]
    def s : BaseSIMDScalarShift<U, opc, {0,1,?,?,?,?,?},
                                FPR32, FPR32, vecshiftR32, asm, []> { 
      let Inst{20-16} = imm{4-0};
    }
    def d : BaseSIMDScalarShift<U, opc, {1,?,?,?,?,?,?},
                                FPR64, FPR64, vecshiftR64, asm, []> { 
      let Inst{21-16} = imm{5-0};
    }
  }
  ...
  defm SCVTF  : SIMDFPScalarRShift<0, 0b11100, "scvtf">;
  ...
  def : Pat<(int_aarch64_neon_vcvtfxs2fp FPR32:$Rn, vecshiftR32:$imm),
            (SCVTFs FPR32:$Rn, vecshiftR32:$imm)>;

As the `multiclass SIMDFPScalarRShift` shows, the MIR definition expects `FPR` register classes for both input and output. That causes a `COPY` MIR instruction, which becomes an `fmov`, between `FPR` and `GPR`.
For the scalar version, the AArch64 target has the patterns below.

  multiclass IntegerToFP<bit isUnsigned, string asm, SDPatternOperator node> {
  ...
    def SWSri: BaseIntegerToFP<isUnsigned, GPR32, FPR32, fixedpoint_f32_i32, asm,
                               [(set FPR32:$Rd,
                                     (fdiv (node GPR32:$Rn),
                                           fixedpoint_f32_i32:$scale))]> {
      let Inst{31} = 0; // 32-bit GPR flag
      let Inst{23-22} = 0b00; // 32-bit FPR flag
      let scale{5} = 1;
    }
  ...
  defm SCVTF : IntegerToFP<0, "scvtf", any_sint_to_fp>;

We need to keep the `fdiv(sitofp)` node to match the pattern above.
To reuse the current patterns for the scalar version, I have converted `fmul` to `fdiv` with a DAG combine on `fmul`.
Would you prefer to add code that handles `fmul` to `performFDivCombine` instead? I am not sure whether that would be better than this patch...

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156538/new/

https://reviews.llvm.org/D156538