[llvm] [DAG] Fold fdiv X, c2 -> fmul X, 1/c2 without AllowReciprocal if exact (PR #93882)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Fri May 31 00:10:50 PDT 2024
================
@@ -17258,26 +17258,29 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
if (SDValue V = combineRepeatedFPDivisors(N))
return V;
- if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()) {
- // fold (fdiv X, c2) -> fmul X, 1/c2 if losing precision is acceptable.
- if (auto *N1CFP = dyn_cast<ConstantFPSDNode>(N1)) {
- // Compute the reciprocal 1.0 / c2.
- const APFloat &N1APF = N1CFP->getValueAPF();
- APFloat Recip(N1APF.getSemantics(), 1); // 1.0
- APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
- // Only do the transform if the reciprocal is a legal fp immediate that
- // isn't too nasty (eg NaN, denormal, ...).
- if ((st == APFloat::opOK || st == APFloat::opInexact) && // Not too nasty
- (!LegalOperations ||
- // FIXME: custom lowering of ConstantFP might fail (see e.g. ARM
- // backend)... we should handle this gracefully after Legalize.
- // TLI.isOperationLegalOrCustom(ISD::ConstantFP, VT) ||
- TLI.isOperationLegal(ISD::ConstantFP, VT) ||
- TLI.isFPImmLegal(Recip, VT, ForCodeSize)))
- return DAG.getNode(ISD::FMUL, DL, VT, N0,
- DAG.getConstantFP(Recip, DL, VT));
- }
+ // fold (fdiv X, c2) -> (fmul X, 1/c2) if there is no loss in precision, or
+ // the loss is acceptable with AllowReciprocal.
+ if (auto *N1CFP = dyn_cast<ConstantFPSDNode>(N1)) {
+ // Compute the reciprocal 1.0 / c2.
+ const APFloat &N1APF = N1CFP->getValueAPF();
+ APFloat Recip(N1APF.getSemantics(), 1); // 1.0
+ APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
+ // Only do the transform if the reciprocal is a legal fp immediate that
+ // isn't too nasty (eg NaN, denormal, ...).
+ if (((st == APFloat::opOK && !Recip.isDenormal()) ||
----------------
jayfoad wrote:
Agreed. In practice I think this only affects dividing by the max power of 2, e.g. for single precision dividing by 2^127 would be the same as multiplying by 2^-127, but 2^-127 is denormal.
I think you've also changed the behaviour of the `arcp` case here. Previously we would optimize `fdiv arcp float %x, 2^127` but with this patch we will not. That might be OK but it should be a deliberate change and it needs a test.
https://github.com/llvm/llvm-project/pull/93882
More information about the llvm-commits
mailing list