[PATCH] D46498: [X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS

Sun May 6 08:47:48 PDT 2018

spatel accepted this revision.
spatel added a subscriber: RKSimon.
spatel added a comment.
This revision is now accepted and ready to land.

LGTM - see inline for possible improvements.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17813-17817
   if ((VT == MVT::f32 && Subtarget.hasSSE1()) ||
       (VT == MVT::v4f32 && Subtarget.hasSSE1() && Reciprocal) ||
       (VT == MVT::v4f32 && Subtarget.hasSSE2() && !Reciprocal) ||
-      (VT == MVT::v8f32 && Subtarget.hasAVX())) {
+      (VT == MVT::v8f32 && Subtarget.hasAVX()) ||
+      (VT == MVT::v16f32 && Subtarget.useAVX512Regs())) {
----------------
Potential enhancements for follow-up patches:
1. Use the new scalar estimate (VRSQRT14SS) when the subtarget has the required AVX512 support.
2. Use VRSQRT14SD for an f64.
3. Use VRSQRT14PD for vectors of f64.
4. Repeat all of the above for the VRCP14xx instructions.
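
For readers following along, the condition in the hunk above can be sketched as a standalone predicate. This is only a model under stated assumptions: `VecType`, `FakeSubtarget`, and `estimateAllowed` are hypothetical stand-ins for `MVT`, the X86 subtarget, and the inline `if` condition, and none of them exist in LLVM under these names.

```cpp
// Standalone model of the type check quoted in the diff; this is NOT LLVM code.
// VecType and FakeSubtarget are hypothetical stand-ins for MVT and the
// X86 subtarget object.
enum class VecType { f32, v4f32, v8f32, v16f32, f64 };

struct FakeSubtarget {
  bool SSE1;       // models Subtarget.hasSSE1()
  bool SSE2;       // models Subtarget.hasSSE2()
  bool AVX;        // models Subtarget.hasAVX()
  bool AVX512Regs; // models Subtarget.useAVX512Regs()
};

// Returns true when the modeled condition would take the estimate path.
// Reciprocal mirrors the flag of the same name in the quoted condition.
bool estimateAllowed(VecType VT, const FakeSubtarget &ST, bool Reciprocal) {
  return (VT == VecType::f32 && ST.SSE1) ||
         (VT == VecType::v4f32 && ST.SSE1 && Reciprocal) ||
         (VT == VecType::v4f32 && ST.SSE2 && !Reciprocal) ||
         (VT == VecType::v8f32 && ST.AVX) ||
         // The clause added by this patch: 512-bit vectors of f32.
         (VT == VecType::v16f32 && ST.AVX512Regs);
}
```

In this model f64 is always rejected, which is the gap that items 2 and 3 would close; each follow-up would presumably add one more clause of the same shape.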

================
Comment at: test/CodeGen/X86/recip-fastmath.ll:1226
-; KNL-NEXT:    vbroadcastss {{.*#+}} zmm1 = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] sched: [10:1.00]
-; KNL-NEXT:    vdivps %zmm0, %zmm1, %zmm0 # sched: [12:1.00]
 ; KNL-NEXT:    retq # sched: [7:1.00]
----------------
I'm not sure where the timing is defined (cc @RKSimon), but the [12:1.00] shown for that vdivps can't be right; Agner has it at 32:20. We might want to verify the timings of the new instruction sequence too.

https://reviews.llvm.org/D46498