[PATCH] D46498: [X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun May 6 08:47:48 PDT 2018
spatel accepted this revision.
spatel added a subscriber: RKSimon.
spatel added a comment.
This revision is now accepted and ready to land.
LGTM - see inline for possible improvements.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:17813-17817
if ((VT == MVT::f32 && Subtarget.hasSSE1()) ||
(VT == MVT::v4f32 && Subtarget.hasSSE1() && Reciprocal) ||
(VT == MVT::v4f32 && Subtarget.hasSSE2() && !Reciprocal) ||
- (VT == MVT::v8f32 && Subtarget.hasAVX())) {
+ (VT == MVT::v8f32 && Subtarget.hasAVX()) ||
+ (VT == MVT::v16f32 && Subtarget.useAVX512Regs())) {
----------------
Potential enhancements for follow-up patches:
1. Use the new scalar estimate (VRSQRT14SS) if we have the required AVX-ness.
2. Use VRSQRT14SD for an f64.
3. Use VRSQRT14PD for vectors of f64.
4. Repeat all of the above for VRCP14xx.
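For anyone reading without the LLVM tree at hand, the patched condition quoted above can be modeled as a standalone predicate. This is only a sketch, not LLVM code: the `VT` enum, the `Features` struct, and the `isEstimateEnabled` name are hypothetical stand-ins for `MVT`, the `Subtarget` queries, and the surrounding lowering code. It covers only the cases in the quoted hunk, not the follow-up ideas.

```cpp
// Standalone model of the patched check; NOT LLVM code.
// VT, Features, and isEstimateEnabled are hypothetical stand-ins for
// MVT, the X86 Subtarget feature queries, and the lowering function.
enum class VT { f32, v4f32, v8f32, v16f32, f64 };

struct Features {
  bool SSE1;       // models Subtarget.hasSSE1()
  bool SSE2;       // models Subtarget.hasSSE2()
  bool AVX;        // models Subtarget.hasAVX()
  bool AVX512Regs; // models Subtarget.useAVX512Regs()
};

// Mirrors the quoted condition term by term. Reciprocal is the same
// flag that appears in the quoted hunk.
constexpr bool isEstimateEnabled(VT Ty, Features F, bool Reciprocal) {
  return (Ty == VT::f32 && F.SSE1) ||
         (Ty == VT::v4f32 && F.SSE1 && Reciprocal) ||
         (Ty == VT::v4f32 && F.SSE2 && !Reciprocal) ||
         (Ty == VT::v8f32 && F.AVX) ||
         (Ty == VT::v16f32 && F.AVX512Regs); // term added by this patch
}
```

In this model, `isEstimateEnabled(VT::v16f32, Features{true, true, true, true}, true)` is true, and it becomes false when the `AVX512Regs` field is false. `VT::f64` is rejected in every configuration, which is the gap that items 2 and 3 would address.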
================
Comment at: test/CodeGen/X86/recip-fastmath.ll:1226
-; KNL-NEXT: vbroadcastss {{.*#+}} zmm1 = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] sched: [10:1.00]
-; KNL-NEXT: vdivps %zmm0, %zmm1, %zmm0 # sched: [12:1.00]
; KNL-NEXT: retq # sched: [7:1.00]
----------------
Not sure where the timing is defined (cc @RKSimon), but that vdivps timing can't be right. Agner has it at 32:20. Might want to verify the timings for the new instruction sequence too.
https://reviews.llvm.org/D46498