[PATCH] D46498: [X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 15 18:03:52 PDT 2022
craig.topper added inline comments.
================
Comment at: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp:17823
+ // There is no FSQRT for 512-bits, but there is RSQRT14.
+ unsigned Opcode = VT == MVT::v16f32 ? X86ISD::RSQRT14 : X86ISD::FRSQRT;
+ return DAG.getNode(Opcode, SDLoc(Op), VT, Op);
----------------
LuoYuanke wrote:
> @craig.topper, for v4f32 and v8f32, if avx512f is available, do we prefer RSQRT14 or FRSQRT?
FRSQRT has a shorter encoding, but the result would probably be more accurate with RSQRT14. Not sure what’s best.
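For a rough sense of the accuracy side of that trade-off, here is a standalone sketch (not part of the patch, and it does not use the real instructions). It models an estimate of 1/sqrt(x) with a given relative error and applies one Newton-Raphson refinement step. The helper names are hypothetical, and the error bounds in the comments (1.5 * 2^-12 for the legacy RSQRTPS estimate, 2^-14 for VRSQRT14PS) are the commonly documented maxima, taken here as assumptions rather than measured values.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~= 1/sqrt(x):
//   y' = y * (1.5 - 0.5 * x * y * y)
// If y has relative error d, y' has relative error of about 1.5 * d * d.
static double refineRsqrt(double x, double y) {
  return y * (1.5 - 0.5 * x * y * y);
}

// Hypothetical stand-in for a hardware estimate: the exact value of
// 1/sqrt(x) perturbed by a chosen relative error.
static double estimateWithError(double x, double relErr) {
  assert(x > 0.0 && "1/sqrt(x) is only modeled for positive x");
  return (1.0 / std::sqrt(x)) * (1.0 + relErr);
}

// Relative error of the model estimate after one refinement step.
static double errorAfterOneStep(double x, double relErr) {
  const double exact = 1.0 / std::sqrt(x);
  const double refined = refineRsqrt(x, estimateWithError(x, relErr));
  return std::fabs(refined - exact) / exact;
}

// Assumed worst-case relative errors of the two estimates.
static const double kErrRsqrt12 = 1.5 * std::ldexp(1.0, -12); // RSQRTPS
static const double kErrRsqrt14 = std::ldexp(1.0, -14);       // VRSQRT14PS
```

Under these assumptions, errorAfterOneStep(2.0, kErrRsqrt12) comes out around 2e-7, slightly above single-precision epsilon (about 1.2e-7), while errorAfterOneStep(2.0, kErrRsqrt14) comes out around 6e-9, well below it. So the more accurate estimate would matter mostly when only one refinement step (or none) is applied.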
================
Comment at: test/CodeGen/X86/recip-fastmath.ll:1226
-; KNL-NEXT: vbroadcastss {{.*#+}} zmm1 = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] sched: [10:1.00]
-; KNL-NEXT: vdivps %zmm0, %zmm1, %zmm0 # sched: [12:1.00]
; KNL-NEXT: retq # sched: [7:1.00]
----------------
spatel wrote:
> Not sure where the timing is defined (cc @RKSimon), but that vdivps timing can't be right. Agner has it at 32:20. Might want to verify the new instruction sequence timings too.
KNL is using the Haswell scheduler model, I think. And last I looked, all the divide instructions were using a separate InstRW for each instruction. Since Haswell doesn't have VDIVPSZrr, it probably just got some garbage default.
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D46498/new/
https://reviews.llvm.org/D46498