[PATCH] D46498: [X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS

Sat Oct 15 18:03:52 PDT 2022

craig.topper added inline comments.

================
Comment at: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp:17823
+    // There is no FSQRT for 512-bits, but there is RSQRT14.
+    unsigned Opcode = VT == MVT::v16f32 ? X86ISD::RSQRT14 : X86ISD::FRSQRT;
+    return DAG.getNode(Opcode, SDLoc(Op), VT, Op);
----------------
LuoYuanke wrote:
> @craig.topper, for v4f32 and v8f32, if avx512f is available, do we prefer RSQRT14 or FRSQRT?
FRSQRT has a shorter encoding, but the result would probably be more accurate with RSQRT14. I'm not sure which is best.
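
To put rough numbers on the accuracy side of that trade-off, here is a minimal scalar sketch. It assumes the documented worst-case relative error bounds (1.5 * 2^-12 for RSQRTPS, which I believe is what FRSQRT selects to, and 2^-14 for VRSQRT14PS) and models the estimate as the exact value scaled by that error. The function names are hypothetical and this is not the code the patch generates; it only shows how one Newton-Raphson step changes the error in each case.

```cpp
#include <cmath>

// Hypothetical model of a hardware 1/sqrt(x) estimate: the exact result
// scaled by a fixed relative error. Real hardware error varies per input;
// this pins it at the given value to look at the worst case.
double estimate_rsqrt(double x, double rel_err) {
  return (1.0 / std::sqrt(x)) * (1.0 + rel_err);
}

// One Newton-Raphson refinement step for y ~= 1/sqrt(x):
//   y' = y * (1.5 - 0.5 * x * y * y)
double refine_rsqrt(double x, double y) {
  return y * (1.5 - 0.5 * x * y * y);
}

// Relative error of y as an approximation to 1/sqrt(x).
double rel_error(double x, double y) {
  const double exact = 1.0 / std::sqrt(x);
  return std::fabs(y - exact) / exact;
}

// Assumed worst-case bounds from the instruction documentation.
const double kErrRsqrt12 = 1.5 * std::ldexp(1.0, -12); // RSQRTPS
const double kErrRsqrt14 = std::ldexp(1.0, -14);       // VRSQRT14PS
```

With an initial relative error e, one step leaves an error of about 1.5 * e^2, so under these assumptions the refined error is roughly 2e-7 starting from the 12-bit estimate and roughly 6e-9 starting from the 14-bit one (before rounding to float). Whether that difference matters depends on how many refinement steps are enabled.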

================
Comment at: test/CodeGen/X86/recip-fastmath.ll:1226
-; KNL-NEXT:    vbroadcastss {{.*#+}} zmm1 = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] sched: [10:1.00]
-; KNL-NEXT:    vdivps %zmm0, %zmm1, %zmm0 # sched: [12:1.00]
 ; KNL-NEXT:    retq # sched: [7:1.00]
----------------
spatel wrote:
> Not sure where the timing is defined (cc @RKSimon), but that vdivps timing can't be right. Agner has it at 32:20. Might want to verify the new instruction sequence timings too
I think KNL is using the Haswell scheduler model. Last I looked, all the divide instructions were covered by a separate InstRW for each instruction. Since the Haswell model has no entry for VDIVPSZrr, it probably just got some garbage default.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D46498/new/

https://reviews.llvm.org/D46498