[PATCH] D156538: [AArch64] Try to combine FMUL with FDIV
JinGu Kang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 28 10:30:30 PDT 2023
jaykang10 added a comment.
In D156538#4542937 <https://reviews.llvm.org/D156538#4542937>, @dmgreen wrote:
> All of these examples seem to canonicalize to fmul in the midend: https://godbolt.org/z/hqPv3azjf
> Is it worth keeping the current lowering for fdiv(sitofp)? Or should we just change that to work with fmul?
I guess you are referring to the `performFDivCombine` function. As you can see, that function converts `fdiv(sitofp)` into `Intrinsic::aarch64_neon_vcvtfxs2fp`, which is the vector (NEON) version, and the intrinsic is matched by the pattern below.
```
multiclass SIMDFPScalarRShift<bit U, bits<5> opc, string asm> {
  let Predicates = [HasNEON, HasFullFP16] in {
  def h : BaseSIMDScalarShift<U, opc, {0,0,1,?,?,?,?},
                              FPR16, FPR16, vecshiftR16, asm, []> {
    let Inst{19-16} = imm{3-0};
  }
  } // Predicates = [HasNEON, HasFullFP16]
  def s : BaseSIMDScalarShift<U, opc, {0,1,?,?,?,?,?},
                              FPR32, FPR32, vecshiftR32, asm, []> {
    let Inst{20-16} = imm{4-0};
  }
  def d : BaseSIMDScalarShift<U, opc, {1,?,?,?,?,?,?},
                              FPR64, FPR64, vecshiftR64, asm, []> {
    let Inst{21-16} = imm{5-0};
  }
}
...
defm SCVTF : SIMDFPScalarRShift<0, 0b11100, "scvtf">;
...
def : Pat<(int_aarch64_neon_vcvtfxs2fp FPR32:$Rn, vecshiftR32:$imm),
          (SCVTFs FPR32:$Rn, vecshiftR32:$imm)>;
```
As you can see in `multiclass SIMDFPScalarRShift`, the instruction definition expects `FPR` register classes for both input and output. Because the `sitofp` operand is an integer held in a `GPR`, this introduces a `COPY` from `GPR` to `FPR`, which becomes an `fmov`.
For the scalar version, the AArch64 target has the patterns below.
```
multiclass IntegerToFP<bit isUnsigned, string asm, SDPatternOperator node> {
  ...
  def SWSri: BaseIntegerToFP<isUnsigned, GPR32, FPR32, fixedpoint_f32_i32, asm,
                             [(set FPR32:$Rd,
                                   (fdiv (node GPR32:$Rn),
                                         fixedpoint_f32_i32:$scale))]> {
    let Inst{31} = 0;       // 32-bit GPR flag
    let Inst{23-22} = 0b00; // 32-bit FPR flag
    let scale{5} = 1;
  }
  ...
}
defm SCVTF : IntegerToFP<0, "scvtf", any_sint_to_fp>;
```
We need to keep the `fdiv(sitofp)` node in order to match the pattern above.
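For reference, here is a rough model in plain C++ (not LLVM code) of what the scalar pattern above selects, assuming `fixedpoint_f32_i32` accepts constants 2^n with 1 <= n <= 32: `scvtf` with `#n` fractional bits produces x / 2^n. The hypothetical helpers below also check that multiplying by the exact reciprocal 2^-n gives a bit-identical result, which is why the midend can canonicalize such divisions into `fmul` without changing the value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical model (not LLVM code) of `scvtf Sd, Wn, #fbits`: convert a
// signed 32-bit integer to float and scale it down by 2^fbits. Dividing by a
// power of two cannot underflow here, so it is exact and the result equals a
// single rounding of x / 2^fbits.
float scvtfFixedModel(int32_t x, int fbits) {
  assert(fbits >= 1 && fbits <= 32 && "assumed fixed-point range for f32/i32");
  return static_cast<float>(x) / std::ldexp(1.0f, fbits);  // fdiv(sitofp x, 2^n)
}

// The same computation in the form the midend canonicalizes to:
// fmul(sitofp x, 2^-fbits).
float fmulReciprocalForm(int32_t x, int fbits) {
  assert(fbits >= 1 && fbits <= 32 && "assumed fixed-point range for f32/i32");
  return static_cast<float>(x) * std::ldexp(1.0f, -fbits);
}

// Compare bit patterns so that even a sign-of-zero difference would show up.
bool bitIdentical(float a, float b) {
  uint32_t ua, ub;
  std::memcpy(&ua, &a, sizeof ua);
  std::memcpy(&ub, &b, sizeof ub);
  return ua == ub;
}

// Checks every assumed fbits value for one input.
bool formsAgree(int32_t x) {
  for (int fbits = 1; fbits <= 32; ++fbits)
    if (!bitIdentical(scvtfFixedModel(x, fbits), fmulReciprocalForm(x, fbits)))
      return false;
  return true;
}
```

For example, `scvtfFixedModel(3, 1)` yields `1.5f`, the same value as `fmulReciprocalForm(3, 1)`.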
To reuse the existing patterns for the scalar version, I have added a DAG combine on `fmul` that converts it into `fdiv`.
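A minimal sketch of the check such a combine needs, written as plain C++ rather than against the SelectionDAG API and not taken from the patch: the hypothetical helper `reciprocalPow2FracBits` tests whether the `fmul` constant is exactly 2^-n for n in [1, 32] (the range assumed for the 32-bit patterns). When it is, `fmul(sitofp x, 2^-n)` can be rewritten as `fdiv(sitofp x, 2^n)`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not an LLVM API): if `c` is exactly 2^-n with
// 1 <= n <= maxBits, return n; otherwise return 0.
int reciprocalPow2FracBits(float c, int maxBits = 32) {
  assert(maxBits >= 1 && maxBits <= 64 && "fixed-point width");
  if (!std::isfinite(c) || c <= 0.0f)  // only positive finite scales qualify
    return 0;
  int exp = 0;
  float mant = std::frexp(c, &exp);    // c == mant * 2^exp, mant in [0.5, 1)
  if (mant != 0.5f)                    // not an exact power of two
    return 0;
  int n = 1 - exp;                     // c == 2^(exp - 1) == 2^-n
  return (n >= 1 && n <= maxBits) ? n : 0;
}

// Divisor for the rewritten fdiv, or 0.0f if the fmul constant does not qualify.
float fdivDivisorFor(float fmulConstant) {
  int n = reciprocalPow2FracBits(fmulConstant);
  return n != 0 ? std::ldexp(1.0f, n) : 0.0f;
}
```

For example, `reciprocalPow2FracBits(0.125f)` returns 3, so `fmul(sitofp x, 0.125)` would become `fdiv(sitofp x, 8.0)`, which the `SWSri` pattern could then select as `scvtf` with `#3`.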
Would you prefer that I add code to handle `fmul` in `performFDivCombine` instead? I am not sure whether that would be better than this patch.
https://reviews.llvm.org/D156538