[llvm] r372333 - [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
Ilya Biryukov via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 24 06:47:20 PDT 2019
Reverted in r372756.
On Tue, Sep 24, 2019 at 3:36 PM Ilya Biryukov <ibiryukov at google.com> wrote:
> Hi Simon,
>
> The change seems to cause severe compile-time regressions for LLVM IR
> generated by JAX, to the point that our tests are timing out and
> blocking our integrate.
>
> I'll revert the change to unblock our integrate. To reproduce, run "llc
> -mcpu=haswell" on the following .ll file:
> https://drive.google.com/open?id=1Lw2xNup9KYB4HvF3QvLOGK_53viEVDJ3
>
> On Thu, Sep 19, 2019 at 5:14 PM Simon Pilgrim via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Author: rksimon
>> Date: Thu Sep 19 08:02:47 2019
>> New Revision: 372333
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=372333&view=rev
>> Log:
>> [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target
>> hook (PR42863)
>>
>> This patch converts the DAGCombine
>> isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and
>> includes a demonstration X86 implementation.
>>
>> The intention is to let us extend existing FNEG combines to work more
>> generally with negatible float ops, allowing them to work with
>> target-specific combines and opcodes (e.g. X86's FMA variants).
>>
>> Unlike SimplifyDemandedBits, we can't just handle target nodes through
>> a target callback; we need to do this as an override to allow targets
>> to handle generic opcodes as well. This does mean that target
>> implementations have to duplicate some checks (recursion depth etc.).
>>
>> I've only begun to replace X86's FNEG handling here, covering
>> FMADDSUB/FMSUBADD negation and some low-impact codegen changes (some
>> FMA negation propagation). We can build on this in future patches.
>>
>> Differential Revision: https://reviews.llvm.org/D67557
>>
>> Modified:
>> llvm/trunk/include/llvm/CodeGen/TargetLowering.h
>> llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp
>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> llvm/trunk/lib/Target/X86/X86ISelLowering.h
>> llvm/trunk/test/CodeGen/X86/recip-fastmath.ll
>> llvm/trunk/test/CodeGen/X86/recip-fastmath2.ll
>>
>> Modified: llvm/trunk/include/llvm/CodeGen/TargetLowering.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/TargetLowering.h?rev=372333&r1=372332&r2=372333&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/include/llvm/CodeGen/TargetLowering.h (original)
>> +++ llvm/trunk/include/llvm/CodeGen/TargetLowering.h Thu Sep 19 08:02:47
>> 2019
>> @@ -3365,6 +3365,18 @@ public:
>> llvm_unreachable("Not Implemented");
>> }
>>
>> + /// Return 1 if we can compute the negated form of the specified expression
>> + /// for the same cost as the expression itself, or 2 if we can compute the
>> + /// negated form more cheaply than the expression itself. Else return 0.
>> + virtual char isNegatibleForFree(SDValue Op, SelectionDAG &DAG,
>> + bool LegalOperations, bool ForCodeSize,
>> + unsigned Depth = 0) const;
>> +
>> + /// If isNegatibleForFree returns true, return the newly negated expression.
>> + virtual SDValue getNegatedExpression(SDValue Op, SelectionDAG &DAG,
>> + bool LegalOperations, bool ForCodeSize,
>> + unsigned Depth = 0) const;
>> +
>>
>> //===--------------------------------------------------------------------===//
>> // Lowering methods - These methods must be implemented by targets so
>> that
>> // the SelectionDAGBuilder code knows how to lower these.
>>
>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=372333&r1=372332&r2=372333&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Thu Sep 19
>> 08:02:47 2019
>> @@ -785,252 +785,6 @@ void DAGCombiner::deleteAndRecombine(SDN
>> DAG.DeleteNode(N);
>> }
>>
>> -/// Return 1 if we can compute the negated form of the specified expression for
>> -/// the same cost as the expression itself, or 2 if we can compute the negated
>> -/// form more cheaply than the expression itself.
>> -static char isNegatibleForFree(SDValue Op, bool LegalOperations,
>> - const TargetLowering &TLI,
>> - const TargetOptions *Options,
>> - bool ForCodeSize,
>> - unsigned Depth = 0) {
>> - // fneg is removable even if it has multiple uses.
>> - if (Op.getOpcode() == ISD::FNEG)
>> - return 2;
>> -
>> - // Don't allow anything with multiple uses unless we know it is free.
>> - EVT VT = Op.getValueType();
>> - const SDNodeFlags Flags = Op->getFlags();
>> - if (!Op.hasOneUse() &&
>> - !(Op.getOpcode() == ISD::FP_EXTEND &&
>> - TLI.isFPExtFree(VT, Op.getOperand(0).getValueType())))
>> - return 0;
>> -
>> - // Don't recurse exponentially.
>> - if (Depth > SelectionDAG::MaxRecursionDepth)
>> - return 0;
>> -
>> - switch (Op.getOpcode()) {
>> - default: return false;
>> - case ISD::ConstantFP: {
>> - if (!LegalOperations)
>> - return 1;
>> -
>> - // Don't invert constant FP values after legalization unless the
>> target says
>> - // the negated constant is legal.
>> - return TLI.isOperationLegal(ISD::ConstantFP, VT) ||
>> -
>> TLI.isFPImmLegal(neg(cast<ConstantFPSDNode>(Op)->getValueAPF()), VT,
>> - ForCodeSize);
>> - }
>> - case ISD::BUILD_VECTOR: {
>> - // Only permit BUILD_VECTOR of constants.
>> - if (llvm::any_of(Op->op_values(), [&](SDValue N) {
>> - return !N.isUndef() && !isa<ConstantFPSDNode>(N);
>> - }))
>> - return 0;
>> - if (!LegalOperations)
>> - return 1;
>> - if (TLI.isOperationLegal(ISD::ConstantFP, VT) &&
>> - TLI.isOperationLegal(ISD::BUILD_VECTOR, VT))
>> - return 1;
>> - return llvm::all_of(Op->op_values(), [&](SDValue N) {
>> - return N.isUndef() ||
>> -
>> TLI.isFPImmLegal(neg(cast<ConstantFPSDNode>(N)->getValueAPF()), VT,
>> - ForCodeSize);
>> - });
>> - }
>> - case ISD::FADD:
>> - if (!Options->NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
>> - return 0;
>> -
>> - // After operation legalization, it might not be legal to create new
>> FSUBs.
>> - if (LegalOperations && !TLI.isOperationLegalOrCustom(ISD::FSUB, VT))
>> - return 0;
>> -
>> - // fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
>> - if (char V = isNegatibleForFree(Op.getOperand(0), LegalOperations,
>> TLI,
>> - Options, ForCodeSize, Depth + 1))
>> - return V;
>> - // fold (fneg (fadd A, B)) -> (fsub (fneg B), A)
>> - return isNegatibleForFree(Op.getOperand(1), LegalOperations, TLI,
>> Options,
>> - ForCodeSize, Depth + 1);
>> - case ISD::FSUB:
>> - // We can't turn -(A-B) into B-A when we honor signed zeros.
>> - if (!Options->NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
>> - return 0;
>> -
>> - // fold (fneg (fsub A, B)) -> (fsub B, A)
>> - return 1;
>> -
>> - case ISD::FMUL:
>> - case ISD::FDIV:
>> - // fold (fneg (fmul X, Y)) -> (fmul (fneg X), Y) or (fmul X, (fneg
>> Y))
>> - if (char V = isNegatibleForFree(Op.getOperand(0), LegalOperations,
>> TLI,
>> - Options, ForCodeSize, Depth + 1))
>> - return V;
>> -
>> - // Ignore X * 2.0 because that is expected to be canonicalized to X
>> + X.
>> - if (auto *C = isConstOrConstSplatFP(Op.getOperand(1)))
>> - if (C->isExactlyValue(2.0) && Op.getOpcode() == ISD::FMUL)
>> - return 0;
>> -
>> - return isNegatibleForFree(Op.getOperand(1), LegalOperations, TLI,
>> Options,
>> - ForCodeSize, Depth + 1);
>> -
>> - case ISD::FMA:
>> - case ISD::FMAD: {
>> - if (!Options->NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
>> - return 0;
>> -
>> - // fold (fneg (fma X, Y, Z)) -> (fma (fneg X), Y, (fneg Z))
>> - // fold (fneg (fma X, Y, Z)) -> (fma X, (fneg Y), (fneg Z))
>> - char V2 = isNegatibleForFree(Op.getOperand(2), LegalOperations, TLI,
>> - Options, ForCodeSize, Depth + 1);
>> - if (!V2)
>> - return 0;
>> -
>> - // One of Op0/Op1 must be cheaply negatible, then select the
>> cheapest.
>> - char V0 = isNegatibleForFree(Op.getOperand(0), LegalOperations, TLI,
>> - Options, ForCodeSize, Depth + 1);
>> - char V1 = isNegatibleForFree(Op.getOperand(1), LegalOperations, TLI,
>> - Options, ForCodeSize, Depth + 1);
>> - char V01 = std::max(V0, V1);
>> - return V01 ? std::max(V01, V2) : 0;
>> - }
>> -
>> - case ISD::FP_EXTEND:
>> - case ISD::FP_ROUND:
>> - case ISD::FSIN:
>> - return isNegatibleForFree(Op.getOperand(0), LegalOperations, TLI,
>> Options,
>> - ForCodeSize, Depth + 1);
>> - }
>> -}
>> -
>> -/// If isNegatibleForFree returns true, return the newly negated
>> expression.
>> -static SDValue GetNegatedExpression(SDValue Op, SelectionDAG &DAG,
>> - bool LegalOperations, bool ForCodeSize,
>> - unsigned Depth = 0) {
>> - // fneg is removable even if it has multiple uses.
>> - if (Op.getOpcode() == ISD::FNEG)
>> - return Op.getOperand(0);
>> -
>> - assert(Depth <= SelectionDAG::MaxRecursionDepth &&
>> - "GetNegatedExpression doesn't match isNegatibleForFree");
>> - const TargetOptions &Options = DAG.getTarget().Options;
>> - const SDNodeFlags Flags = Op->getFlags();
>> -
>> - switch (Op.getOpcode()) {
>> - default: llvm_unreachable("Unknown code");
>> - case ISD::ConstantFP: {
>> - APFloat V = cast<ConstantFPSDNode>(Op)->getValueAPF();
>> - V.changeSign();
>> - return DAG.getConstantFP(V, SDLoc(Op), Op.getValueType());
>> - }
>> - case ISD::BUILD_VECTOR: {
>> - SmallVector<SDValue, 4> Ops;
>> - for (SDValue C : Op->op_values()) {
>> - if (C.isUndef()) {
>> - Ops.push_back(C);
>> - continue;
>> - }
>> - APFloat V = cast<ConstantFPSDNode>(C)->getValueAPF();
>> - V.changeSign();
>> - Ops.push_back(DAG.getConstantFP(V, SDLoc(Op), C.getValueType()));
>> - }
>> - return DAG.getBuildVector(Op.getValueType(), SDLoc(Op), Ops);
>> - }
>> - case ISD::FADD:
>> - assert((Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros()) &&
>> - "Expected NSZ fp-flag");
>> -
>> - // fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
>> - if (isNegatibleForFree(Op.getOperand(0), LegalOperations,
>> - DAG.getTargetLoweringInfo(), &Options,
>> ForCodeSize,
>> - Depth + 1))
>> - return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
>> - GetNegatedExpression(Op.getOperand(0), DAG,
>> - LegalOperations,
>> ForCodeSize,
>> - Depth + 1),
>> - Op.getOperand(1), Flags);
>> - // fold (fneg (fadd A, B)) -> (fsub (fneg B), A)
>> - return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
>> - GetNegatedExpression(Op.getOperand(1), DAG,
>> - LegalOperations, ForCodeSize,
>> - Depth + 1),
>> - Op.getOperand(0), Flags);
>> - case ISD::FSUB:
>> - // fold (fneg (fsub 0, B)) -> B
>> - if (ConstantFPSDNode *N0CFP =
>> - isConstOrConstSplatFP(Op.getOperand(0), /*AllowUndefs*/
>> true))
>> - if (N0CFP->isZero())
>> - return Op.getOperand(1);
>> -
>> - // fold (fneg (fsub A, B)) -> (fsub B, A)
>> - return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
>> - Op.getOperand(1), Op.getOperand(0), Flags);
>> -
>> - case ISD::FMUL:
>> - case ISD::FDIV:
>> - // fold (fneg (fmul X, Y)) -> (fmul (fneg X), Y)
>> - if (isNegatibleForFree(Op.getOperand(0), LegalOperations,
>> - DAG.getTargetLoweringInfo(), &Options,
>> ForCodeSize,
>> - Depth + 1))
>> - return DAG.getNode(Op.getOpcode(), SDLoc(Op), Op.getValueType(),
>> - GetNegatedExpression(Op.getOperand(0), DAG,
>> - LegalOperations,
>> ForCodeSize,
>> - Depth + 1),
>> - Op.getOperand(1), Flags);
>> -
>> - // fold (fneg (fmul X, Y)) -> (fmul X, (fneg Y))
>> - return DAG.getNode(Op.getOpcode(), SDLoc(Op), Op.getValueType(),
>> - Op.getOperand(0),
>> - GetNegatedExpression(Op.getOperand(1), DAG,
>> - LegalOperations, ForCodeSize,
>> - Depth + 1), Flags);
>> -
>> - case ISD::FMA:
>> - case ISD::FMAD: {
>> - assert((Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros()) &&
>> - "Expected NSZ fp-flag");
>> -
>> - SDValue Neg2 = GetNegatedExpression(Op.getOperand(2), DAG,
>> LegalOperations,
>> - ForCodeSize, Depth + 1);
>> -
>> - char V0 = isNegatibleForFree(Op.getOperand(0), LegalOperations,
>> - DAG.getTargetLoweringInfo(), &Options,
>> - ForCodeSize, Depth + 1);
>> - char V1 = isNegatibleForFree(Op.getOperand(1), LegalOperations,
>> - DAG.getTargetLoweringInfo(), &Options,
>> - ForCodeSize, Depth + 1);
>> - if (V0 >= V1) {
>> - // fold (fneg (fma X, Y, Z)) -> (fma (fneg X), Y, (fneg Z))
>> - SDValue Neg0 = GetNegatedExpression(
>> - Op.getOperand(0), DAG, LegalOperations, ForCodeSize, Depth +
>> 1);
>> - return DAG.getNode(Op.getOpcode(), SDLoc(Op), Op.getValueType(),
>> Neg0,
>> - Op.getOperand(1), Neg2, Flags);
>> - }
>> -
>> - // fold (fneg (fma X, Y, Z)) -> (fma X, (fneg Y), (fneg Z))
>> - SDValue Neg1 = GetNegatedExpression(Op.getOperand(1), DAG,
>> LegalOperations,
>> - ForCodeSize, Depth + 1);
>> - return DAG.getNode(Op.getOpcode(), SDLoc(Op), Op.getValueType(),
>> - Op.getOperand(0), Neg1, Neg2, Flags);
>> - }
>> -
>> - case ISD::FP_EXTEND:
>> - case ISD::FSIN:
>> - return DAG.getNode(Op.getOpcode(), SDLoc(Op), Op.getValueType(),
>> - GetNegatedExpression(Op.getOperand(0), DAG,
>> - LegalOperations, ForCodeSize,
>> - Depth + 1));
>> - case ISD::FP_ROUND:
>> - return DAG.getNode(ISD::FP_ROUND, SDLoc(Op), Op.getValueType(),
>> - GetNegatedExpression(Op.getOperand(0), DAG,
>> - LegalOperations, ForCodeSize,
>> - Depth + 1),
>> - Op.getOperand(1));
>> - }
>> -}
>> -
>> // APInts must be the same size for most operations, this helper
>> // function zero extends the shorter of the pair so that they match.
>> // We provide an Offset so that we can create bitwidths that won't
>> overflow.
>> @@ -12052,17 +11806,17 @@ SDValue DAGCombiner::visitFADD(SDNode *N
>>
>> // fold (fadd A, (fneg B)) -> (fsub A, B)
>> if ((!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FSUB, VT))
>> &&
>> - isNegatibleForFree(N1, LegalOperations, TLI, &Options,
>> ForCodeSize) == 2)
>> - return DAG.getNode(ISD::FSUB, DL, VT, N0,
>> - GetNegatedExpression(N1, DAG, LegalOperations,
>> - ForCodeSize), Flags);
>> + TLI.isNegatibleForFree(N1, DAG, LegalOperations, ForCodeSize) == 2)
>> + return DAG.getNode(
>> + ISD::FSUB, DL, VT, N0,
>> + TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize),
>> Flags);
>>
>> // fold (fadd (fneg A), B) -> (fsub B, A)
>> if ((!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FSUB, VT))
>> &&
>> - isNegatibleForFree(N0, LegalOperations, TLI, &Options,
>> ForCodeSize) == 2)
>> - return DAG.getNode(ISD::FSUB, DL, VT, N1,
>> - GetNegatedExpression(N0, DAG, LegalOperations,
>> - ForCodeSize), Flags);
>> + TLI.isNegatibleForFree(N0, DAG, LegalOperations, ForCodeSize) == 2)
>> + return DAG.getNode(
>> + ISD::FSUB, DL, VT, N1,
>> + TLI.getNegatedExpression(N0, DAG, LegalOperations, ForCodeSize),
>> Flags);
>>
>> auto isFMulNegTwo = [](SDValue FMul) {
>> if (!FMul.hasOneUse() || FMul.getOpcode() != ISD::FMUL)
>> @@ -12241,16 +11995,16 @@ SDValue DAGCombiner::visitFSUB(SDNode *N
>> if (N0CFP && N0CFP->isZero()) {
>> if (N0CFP->isNegative() ||
>> (Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())) {
>> - if (isNegatibleForFree(N1, LegalOperations, TLI, &Options,
>> ForCodeSize))
>> - return GetNegatedExpression(N1, DAG, LegalOperations,
>> ForCodeSize);
>> + if (TLI.isNegatibleForFree(N1, DAG, LegalOperations, ForCodeSize))
>> + return TLI.getNegatedExpression(N1, DAG, LegalOperations,
>> ForCodeSize);
>> if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
>> return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
>> }
>> }
>>
>> if (((Options.UnsafeFPMath && Options.NoSignedZerosFPMath) ||
>> - (Flags.hasAllowReassociation() && Flags.hasNoSignedZeros()))
>> - && N1.getOpcode() == ISD::FADD) {
>> + (Flags.hasAllowReassociation() && Flags.hasNoSignedZeros())) &&
>> + N1.getOpcode() == ISD::FADD) {
>> // X - (X + Y) -> -Y
>> if (N0 == N1->getOperand(0))
>> return DAG.getNode(ISD::FNEG, DL, VT, N1->getOperand(1), Flags);
>> @@ -12260,10 +12014,10 @@ SDValue DAGCombiner::visitFSUB(SDNode *N
>> }
>>
>> // fold (fsub A, (fneg B)) -> (fadd A, B)
>> - if (isNegatibleForFree(N1, LegalOperations, TLI, &Options,
>> ForCodeSize))
>> - return DAG.getNode(ISD::FADD, DL, VT, N0,
>> - GetNegatedExpression(N1, DAG, LegalOperations,
>> - ForCodeSize), Flags);
>> + if (TLI.isNegatibleForFree(N1, DAG, LegalOperations, ForCodeSize))
>> + return DAG.getNode(
>> + ISD::FADD, DL, VT, N0,
>> + TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize),
>> Flags);
>>
>> // FSUB -> FMA combines:
>> if (SDValue Fused = visitFSUBForFMACombine(N)) {
>> @@ -12277,11 +12031,10 @@ SDValue DAGCombiner::visitFSUB(SDNode *N
>> /// Return true if both inputs are at least as cheap in negated form and at
>> /// least one input is strictly cheaper in negated form.
>> bool DAGCombiner::isCheaperToUseNegatedFPOps(SDValue X, SDValue Y) {
>> - const TargetOptions &Options = DAG.getTarget().Options;
>> - if (char LHSNeg = isNegatibleForFree(X, LegalOperations, TLI, &Options,
>> - ForCodeSize))
>> - if (char RHSNeg = isNegatibleForFree(Y, LegalOperations, TLI,
>> &Options,
>> - ForCodeSize))
>> + if (char LHSNeg =
>> + TLI.isNegatibleForFree(X, DAG, LegalOperations, ForCodeSize))
>> + if (char RHSNeg =
>> + TLI.isNegatibleForFree(Y, DAG, LegalOperations, ForCodeSize))
>> // Both negated operands are at least as cheap as their
>> counterparts.
>> // Check to see if at least one is cheaper negated.
>> if (LHSNeg == 2 || RHSNeg == 2)
>> @@ -12362,8 +12115,10 @@ SDValue DAGCombiner::visitFMUL(SDNode *N
>>
>> // -N0 * -N1 --> N0 * N1
>> if (isCheaperToUseNegatedFPOps(N0, N1)) {
>> - SDValue NegN0 = GetNegatedExpression(N0, DAG, LegalOperations,
>> ForCodeSize);
>> - SDValue NegN1 = GetNegatedExpression(N1, DAG, LegalOperations,
>> ForCodeSize);
>> + SDValue NegN0 =
>> + TLI.getNegatedExpression(N0, DAG, LegalOperations, ForCodeSize);
>> + SDValue NegN1 =
>> + TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize);
>> return DAG.getNode(ISD::FMUL, DL, VT, NegN0, NegN1, Flags);
>> }
>>
>> @@ -12445,8 +12200,10 @@ SDValue DAGCombiner::visitFMA(SDNode *N)
>>
>> // (-N0 * -N1) + N2 --> (N0 * N1) + N2
>> if (isCheaperToUseNegatedFPOps(N0, N1)) {
>> - SDValue NegN0 = GetNegatedExpression(N0, DAG, LegalOperations,
>> ForCodeSize);
>> - SDValue NegN1 = GetNegatedExpression(N1, DAG, LegalOperations,
>> ForCodeSize);
>> + SDValue NegN0 =
>> + TLI.getNegatedExpression(N0, DAG, LegalOperations, ForCodeSize);
>> + SDValue NegN1 =
>> + TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize);
>> return DAG.getNode(ISD::FMA, DL, VT, NegN0, NegN1, N2, Flags);
>> }
>>
>> @@ -12707,8 +12464,8 @@ SDValue DAGCombiner::visitFDIV(SDNode *N
>> if (isCheaperToUseNegatedFPOps(N0, N1))
>> return DAG.getNode(
>> ISD::FDIV, SDLoc(N), VT,
>> - GetNegatedExpression(N0, DAG, LegalOperations, ForCodeSize),
>> - GetNegatedExpression(N1, DAG, LegalOperations, ForCodeSize),
>> Flags);
>> + TLI.getNegatedExpression(N0, DAG, LegalOperations, ForCodeSize),
>> + TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize),
>> Flags);
>>
>> return SDValue();
>> }
>> @@ -13262,9 +13019,8 @@ SDValue DAGCombiner::visitFNEG(SDNode *N
>> if (isConstantFPBuildVectorOrConstantFP(N0))
>> return DAG.getNode(ISD::FNEG, SDLoc(N), VT, N0);
>>
>> - if (isNegatibleForFree(N0, LegalOperations,
>> DAG.getTargetLoweringInfo(),
>> - &DAG.getTarget().Options, ForCodeSize))
>> - return GetNegatedExpression(N0, DAG, LegalOperations, ForCodeSize);
>> + if (TLI.isNegatibleForFree(N0, DAG, LegalOperations, ForCodeSize))
>> + return TLI.getNegatedExpression(N0, DAG, LegalOperations,
>> ForCodeSize);
>>
>> // Transform fneg(bitconvert(x)) -> bitconvert(x ^ sign) to avoid
>> loading
>> // constant pool values.
>>
>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp?rev=372333&r1=372332&r2=372333&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp (original)
>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/TargetLowering.cpp Thu Sep 19
>> 08:02:47 2019
>> @@ -5331,6 +5331,246 @@ verifyReturnAddressArgumentIsConstant(SD
>> return false;
>> }
>>
>> +char TargetLowering::isNegatibleForFree(SDValue Op, SelectionDAG &DAG,
>> + bool LegalOperations, bool ForCodeSize,
>> + unsigned Depth) const {
>> + // fneg is removable even if it has multiple uses.
>> + if (Op.getOpcode() == ISD::FNEG)
>> + return 2;
>> +
>> + // Don't allow anything with multiple uses unless we know it is free.
>> + EVT VT = Op.getValueType();
>> + const SDNodeFlags Flags = Op->getFlags();
>> + const TargetOptions &Options = DAG.getTarget().Options;
>> + if (!Op.hasOneUse() && !(Op.getOpcode() == ISD::FP_EXTEND &&
>> + isFPExtFree(VT,
>> Op.getOperand(0).getValueType())))
>> + return 0;
>> +
>> + // Don't recurse exponentially.
>> + if (Depth > SelectionDAG::MaxRecursionDepth)
>> + return 0;
>> +
>> + switch (Op.getOpcode()) {
>> + case ISD::ConstantFP: {
>> + if (!LegalOperations)
>> + return 1;
>> +
>> + // Don't invert constant FP values after legalization unless the
>> target says
>> + // the negated constant is legal.
>> + return isOperationLegal(ISD::ConstantFP, VT) ||
>> + isFPImmLegal(neg(cast<ConstantFPSDNode>(Op)->getValueAPF()),
>> VT,
>> + ForCodeSize);
>> + }
>> + case ISD::BUILD_VECTOR: {
>> + // Only permit BUILD_VECTOR of constants.
>> + if (llvm::any_of(Op->op_values(), [&](SDValue N) {
>> + return !N.isUndef() && !isa<ConstantFPSDNode>(N);
>> + }))
>> + return 0;
>> + if (!LegalOperations)
>> + return 1;
>> + if (isOperationLegal(ISD::ConstantFP, VT) &&
>> + isOperationLegal(ISD::BUILD_VECTOR, VT))
>> + return 1;
>> + return llvm::all_of(Op->op_values(), [&](SDValue N) {
>> + return N.isUndef() ||
>> + isFPImmLegal(neg(cast<ConstantFPSDNode>(N)->getValueAPF()),
>> VT,
>> + ForCodeSize);
>> + });
>> + }
>> + case ISD::FADD:
>> + if (!Options.NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
>> + return 0;
>> +
>> + // After operation legalization, it might not be legal to create new
>> FSUBs.
>> + if (LegalOperations && !isOperationLegalOrCustom(ISD::FSUB, VT))
>> + return 0;
>> +
>> + // fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
>> + if (char V = isNegatibleForFree(Op.getOperand(0), DAG,
>> LegalOperations,
>> + ForCodeSize, Depth + 1))
>> + return V;
>> + // fold (fneg (fadd A, B)) -> (fsub (fneg B), A)
>> + return isNegatibleForFree(Op.getOperand(1), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + case ISD::FSUB:
>> + // We can't turn -(A-B) into B-A when we honor signed zeros.
>> + if (!Options.NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
>> + return 0;
>> +
>> + // fold (fneg (fsub A, B)) -> (fsub B, A)
>> + return 1;
>> +
>> + case ISD::FMUL:
>> + case ISD::FDIV:
>> + // fold (fneg (fmul X, Y)) -> (fmul (fneg X), Y) or (fmul X, (fneg
>> Y))
>> + if (char V = isNegatibleForFree(Op.getOperand(0), DAG,
>> LegalOperations,
>> + ForCodeSize, Depth + 1))
>> + return V;
>> +
>> + // Ignore X * 2.0 because that is expected to be canonicalized to X
>> + X.
>> + if (auto *C = isConstOrConstSplatFP(Op.getOperand(1)))
>> + if (C->isExactlyValue(2.0) && Op.getOpcode() == ISD::FMUL)
>> + return 0;
>> +
>> + return isNegatibleForFree(Op.getOperand(1), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> +
>> + case ISD::FMA:
>> + case ISD::FMAD: {
>> + if (!Options.NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
>> + return 0;
>> +
>> + // fold (fneg (fma X, Y, Z)) -> (fma (fneg X), Y, (fneg Z))
>> + // fold (fneg (fma X, Y, Z)) -> (fma X, (fneg Y), (fneg Z))
>> + char V2 = isNegatibleForFree(Op.getOperand(2), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + if (!V2)
>> + return 0;
>> +
>> + // One of Op0/Op1 must be cheaply negatible, then select the
>> cheapest.
>> + char V0 = isNegatibleForFree(Op.getOperand(0), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + char V1 = isNegatibleForFree(Op.getOperand(1), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + char V01 = std::max(V0, V1);
>> + return V01 ? std::max(V01, V2) : 0;
>> + }
>> +
>> + case ISD::FP_EXTEND:
>> + case ISD::FP_ROUND:
>> + case ISD::FSIN:
>> + return isNegatibleForFree(Op.getOperand(0), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +SDValue TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG,
>> + bool LegalOperations,
>> + bool ForCodeSize,
>> + unsigned Depth) const {
>> + // fneg is removable even if it has multiple uses.
>> + if (Op.getOpcode() == ISD::FNEG)
>> + return Op.getOperand(0);
>> +
>> + assert(Depth <= SelectionDAG::MaxRecursionDepth &&
>> + "getNegatedExpression doesn't match isNegatibleForFree");
>> + const SDNodeFlags Flags = Op->getFlags();
>> +
>> + switch (Op.getOpcode()) {
>> + case ISD::ConstantFP: {
>> + APFloat V = cast<ConstantFPSDNode>(Op)->getValueAPF();
>> + V.changeSign();
>> + return DAG.getConstantFP(V, SDLoc(Op), Op.getValueType());
>> + }
>> + case ISD::BUILD_VECTOR: {
>> + SmallVector<SDValue, 4> Ops;
>> + for (SDValue C : Op->op_values()) {
>> + if (C.isUndef()) {
>> + Ops.push_back(C);
>> + continue;
>> + }
>> + APFloat V = cast<ConstantFPSDNode>(C)->getValueAPF();
>> + V.changeSign();
>> + Ops.push_back(DAG.getConstantFP(V, SDLoc(Op), C.getValueType()));
>> + }
>> + return DAG.getBuildVector(Op.getValueType(), SDLoc(Op), Ops);
>> + }
>> + case ISD::FADD:
>> + assert((DAG.getTarget().Options.NoSignedZerosFPMath ||
>> + Flags.hasNoSignedZeros()) &&
>> + "Expected NSZ fp-flag");
>> +
>> + // fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
>> + if (isNegatibleForFree(Op.getOperand(0), DAG, LegalOperations,
>> ForCodeSize,
>> + Depth + 1))
>> + return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
>> + getNegatedExpression(Op.getOperand(0), DAG,
>> + LegalOperations,
>> ForCodeSize,
>> + Depth + 1),
>> + Op.getOperand(1), Flags);
>> + // fold (fneg (fadd A, B)) -> (fsub (fneg B), A)
>> + return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
>> + getNegatedExpression(Op.getOperand(1), DAG,
>> + LegalOperations, ForCodeSize,
>> + Depth + 1),
>> + Op.getOperand(0), Flags);
>> + case ISD::FSUB:
>> + // fold (fneg (fsub 0, B)) -> B
>> + if (ConstantFPSDNode *N0CFP =
>> + isConstOrConstSplatFP(Op.getOperand(0), /*AllowUndefs*/
>> true))
>> + if (N0CFP->isZero())
>> + return Op.getOperand(1);
>> +
>> + // fold (fneg (fsub A, B)) -> (fsub B, A)
>> + return DAG.getNode(ISD::FSUB, SDLoc(Op), Op.getValueType(),
>> + Op.getOperand(1), Op.getOperand(0), Flags);
>> +
>> + case ISD::FMUL:
>> + case ISD::FDIV:
>> + // fold (fneg (fmul X, Y)) -> (fmul (fneg X), Y)
>> + if (isNegatibleForFree(Op.getOperand(0), DAG, LegalOperations,
>> ForCodeSize,
>> + Depth + 1))
>> + return DAG.getNode(Op.getOpcode(), SDLoc(Op), Op.getValueType(),
>> + getNegatedExpression(Op.getOperand(0), DAG,
>> + LegalOperations,
>> ForCodeSize,
>> + Depth + 1),
>> + Op.getOperand(1), Flags);
>> +
>> + // fold (fneg (fmul X, Y)) -> (fmul X, (fneg Y))
>> + return DAG.getNode(
>> + Op.getOpcode(), SDLoc(Op), Op.getValueType(), Op.getOperand(0),
>> + getNegatedExpression(Op.getOperand(1), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1),
>> + Flags);
>> +
>> + case ISD::FMA:
>> + case ISD::FMAD: {
>> + assert((DAG.getTarget().Options.NoSignedZerosFPMath ||
>> + Flags.hasNoSignedZeros()) &&
>> + "Expected NSZ fp-flag");
>> +
>> + SDValue Neg2 = getNegatedExpression(Op.getOperand(2), DAG,
>> LegalOperations,
>> + ForCodeSize, Depth + 1);
>> +
>> + char V0 = isNegatibleForFree(Op.getOperand(0), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + char V1 = isNegatibleForFree(Op.getOperand(1), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + if (V0 >= V1) {
>> + // fold (fneg (fma X, Y, Z)) -> (fma (fneg X), Y, (fneg Z))
>> + SDValue Neg0 = getNegatedExpression(
>> + Op.getOperand(0), DAG, LegalOperations, ForCodeSize, Depth +
>> 1);
>> + return DAG.getNode(Op.getOpcode(), SDLoc(Op), Op.getValueType(),
>> Neg0,
>> + Op.getOperand(1), Neg2, Flags);
>> + }
>> +
>> + // fold (fneg (fma X, Y, Z)) -> (fma X, (fneg Y), (fneg Z))
>> + SDValue Neg1 = getNegatedExpression(Op.getOperand(1), DAG,
>> LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + return DAG.getNode(Op.getOpcode(), SDLoc(Op), Op.getValueType(),
>> + Op.getOperand(0), Neg1, Neg2, Flags);
>> + }
>> +
>> + case ISD::FP_EXTEND:
>> + case ISD::FSIN:
>> + return DAG.getNode(Op.getOpcode(), SDLoc(Op), Op.getValueType(),
>> + getNegatedExpression(Op.getOperand(0), DAG,
>> + LegalOperations, ForCodeSize,
>> + Depth + 1));
>> + case ISD::FP_ROUND:
>> + return DAG.getNode(ISD::FP_ROUND, SDLoc(Op), Op.getValueType(),
>> + getNegatedExpression(Op.getOperand(0), DAG,
>> + LegalOperations, ForCodeSize,
>> + Depth + 1),
>> + Op.getOperand(1));
>> + }
>> +
>> + llvm_unreachable("Unknown code");
>> +}
>> +
>>
>> //===----------------------------------------------------------------------===//
>> // Legalization Utilities
>>
>> //===----------------------------------------------------------------------===//
>>
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=372333&r1=372332&r2=372333&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Sep 19 08:02:47 2019
>> @@ -42038,6 +42038,101 @@ static SDValue combineFneg(SDNode *N, Se
>> return SDValue();
>> }
>>
>> +char X86TargetLowering::isNegatibleForFree(SDValue Op, SelectionDAG &DAG,
>> + bool LegalOperations,
>> + bool ForCodeSize,
>> + unsigned Depth) const {
>> + // fneg patterns are removable even if they have multiple uses.
>> + if (isFNEG(DAG, Op.getNode()))
>> + return 2;
>> +
>> + // Don't recurse exponentially.
>> + if (Depth > SelectionDAG::MaxRecursionDepth)
>> + return 0;
>> +
>> + EVT VT = Op.getValueType();
>> + EVT SVT = VT.getScalarType();
>> + switch (Op.getOpcode()) {
>> + case ISD::FMA:
>> + case X86ISD::FMSUB:
>> + case X86ISD::FNMADD:
>> + case X86ISD::FNMSUB:
>> + case X86ISD::FMADD_RND:
>> + case X86ISD::FMSUB_RND:
>> + case X86ISD::FNMADD_RND:
>> + case X86ISD::FNMSUB_RND: {
>> + if (!Op.hasOneUse() || !Subtarget.hasAnyFMA() || !isTypeLegal(VT) ||
>> + !(SVT == MVT::f32 || SVT == MVT::f64) || !LegalOperations)
>> + break;
>> +
>> + // This is always negatible for free but we might be able to remove
>> some
>> + // extra operand negations as well.
>> + for (int i = 0; i != 3; ++i) {
>> + char V = isNegatibleForFree(Op.getOperand(i), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + if (V == 2)
>> + return V;
>> + }
>> + return 1;
>> + }
>> + }
>> +
>> + return TargetLowering::isNegatibleForFree(Op, DAG, LegalOperations,
>> + ForCodeSize, Depth);
>> +}
>> +
>> +SDValue X86TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG,
>> + bool LegalOperations,
>> + bool ForCodeSize,
>> + unsigned Depth) const {
>> + // fneg patterns are removable even if they have multiple uses.
>> + if (SDValue Arg = isFNEG(DAG, Op.getNode()))
>> + return DAG.getBitcast(Op.getValueType(), Arg);
>> +
>> + EVT VT = Op.getValueType();
>> + EVT SVT = VT.getScalarType();
>> + unsigned Opc = Op.getOpcode();
>> + switch (Opc) {
>> + case ISD::FMA:
>> + case X86ISD::FMSUB:
>> + case X86ISD::FNMADD:
>> + case X86ISD::FNMSUB:
>> + case X86ISD::FMADD_RND:
>> + case X86ISD::FMSUB_RND:
>> + case X86ISD::FNMADD_RND:
>> + case X86ISD::FNMSUB_RND: {
>> + if (!Op.hasOneUse() || !Subtarget.hasAnyFMA() || !isTypeLegal(VT) ||
>> + !(SVT == MVT::f32 || SVT == MVT::f64) || !LegalOperations)
>> + break;
>> +
>> + // This is always negatible for free but we might be able to remove
>> some
>> + // extra operand negations as well.
>> + SmallVector<SDValue, 4> NewOps(Op.getNumOperands(), SDValue());
>> + for (int i = 0; i != 3; ++i) {
>> + char V = isNegatibleForFree(Op.getOperand(i), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + if (V == 2)
>> + NewOps[i] = getNegatedExpression(Op.getOperand(i), DAG, LegalOperations,
>> + ForCodeSize, Depth + 1);
>> + }
>> +
>> + bool NegA = !!NewOps[0];
>> + bool NegB = !!NewOps[1];
>> + bool NegC = !!NewOps[2];
>> + unsigned NewOpc = negateFMAOpcode(Opc, NegA != NegB, NegC, true);
>> +
>> + // Fill in the non-negated ops with the original values.
>> + for (int i = 0, e = Op.getNumOperands(); i != e; ++i)
>> + if (!NewOps[i])
>> + NewOps[i] = Op.getOperand(i);
>> + return DAG.getNode(NewOpc, SDLoc(Op), VT, NewOps);
>> + }
>> + }
>> +
>> + return TargetLowering::getNegatedExpression(Op, DAG, LegalOperations,
>> + ForCodeSize, Depth);
>> +}
>> +
>> static SDValue lowerX86FPLogicOp(SDNode *N, SelectionDAG &DAG,
>> const X86Subtarget &Subtarget) {
>> MVT VT = N->getSimpleValueType(0);
>> @@ -42967,12 +43062,14 @@ static SDValue combineSext(SDNode *N, Se
>> }
>>
>> static SDValue combineFMA(SDNode *N, SelectionDAG &DAG,
>> + TargetLowering::DAGCombinerInfo &DCI,
>> const X86Subtarget &Subtarget) {
>> SDLoc dl(N);
>> EVT VT = N->getValueType(0);
>>
>> // Let legalize expand this if it isn't a legal type yet.
>> - if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))
>> + const TargetLowering &TLI = DAG.getTargetLoweringInfo();
>> + if (!TLI.isTypeLegal(VT))
>> return SDValue();
>>
>> EVT ScalarVT = VT.getScalarType();
>> @@ -42983,17 +43080,21 @@ static SDValue combineFMA(SDNode *N, Sel
>> SDValue B = N->getOperand(1);
>> SDValue C = N->getOperand(2);
>>
>> - auto invertIfNegative = [&DAG](SDValue &V) {
>> - if (SDValue NegVal = isFNEG(DAG, V.getNode())) {
>> - V = DAG.getBitcast(V.getValueType(), NegVal);
>> + auto invertIfNegative = [&DAG, &TLI, &DCI](SDValue &V) {
>> + bool CodeSize = DAG.getMachineFunction().getFunction().hasOptSize();
>> + bool LegalOperations = !DCI.isBeforeLegalizeOps();
>> + if (TLI.isNegatibleForFree(V, DAG, LegalOperations, CodeSize) == 2) {
>> + V = TLI.getNegatedExpression(V, DAG, LegalOperations, CodeSize);
>> return true;
>> }
>> // Look through extract_vector_elts. If it comes from an FNEG, create a
>> // new extract from the FNEG input.
>> if (V.getOpcode() == ISD::EXTRACT_VECTOR_ELT &&
>> isNullConstant(V.getOperand(1))) {
>> - if (SDValue NegVal = isFNEG(DAG, V.getOperand(0).getNode())) {
>> - NegVal = DAG.getBitcast(V.getOperand(0).getValueType(), NegVal);
>> + SDValue Vec = V.getOperand(0);
>> + if (TLI.isNegatibleForFree(Vec, DAG, LegalOperations, CodeSize) ==
>> 2) {
>> + SDValue NegVal =
>> + TLI.getNegatedExpression(Vec, DAG, LegalOperations,
>> CodeSize);
>> V = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SDLoc(V), V.getValueType(),
>> NegVal, V.getOperand(1));
>> return true;
>> @@ -43023,25 +43124,25 @@ static SDValue combineFMA(SDNode *N, Sel
>> // Combine FMADDSUB(A, B, FNEG(C)) -> FMSUBADD(A, B, C)
>> // Combine FMSUBADD(A, B, FNEG(C)) -> FMADDSUB(A, B, C)
>> static SDValue combineFMADDSUB(SDNode *N, SelectionDAG &DAG,
>> - const X86Subtarget &Subtarget) {
>> + TargetLowering::DAGCombinerInfo &DCI) {
>> SDLoc dl(N);
>> EVT VT = N->getValueType(0);
>> + const TargetLowering &TLI = DAG.getTargetLoweringInfo();
>> + bool CodeSize = DAG.getMachineFunction().getFunction().hasOptSize();
>> + bool LegalOperations = !DCI.isBeforeLegalizeOps();
>>
>> - SDValue NegVal = isFNEG(DAG, N->getOperand(2).getNode());
>> - if (!NegVal)
>> - return SDValue();
>> -
>> - // FIXME: Should we bitcast instead?
>> - if (NegVal.getValueType() != VT)
>> + SDValue N2 = N->getOperand(2);
>> + if (!TLI.isNegatibleForFree(N2, DAG, LegalOperations, CodeSize))
>> return SDValue();
>>
>> + SDValue NegN2 = TLI.getNegatedExpression(N2, DAG, LegalOperations, CodeSize);
>> unsigned NewOpcode = negateFMAOpcode(N->getOpcode(), false, true, false);
>>
>> if (N->getNumOperands() == 4)
>> return DAG.getNode(NewOpcode, dl, VT, N->getOperand(0),
>> N->getOperand(1),
>> - NegVal, N->getOperand(3));
>> + NegN2, N->getOperand(3));
>> return DAG.getNode(NewOpcode, dl, VT, N->getOperand(0), N->getOperand(1),
>> - NegVal);
>> + NegN2);
>> }
>>
>> static SDValue combineZext(SDNode *N, SelectionDAG &DAG,
>> @@ -45316,11 +45417,11 @@ SDValue X86TargetLowering::PerformDAGCom
>> case X86ISD::FNMADD_RND:
>> case X86ISD::FNMSUB:
>> case X86ISD::FNMSUB_RND:
>> - case ISD::FMA: return combineFMA(N, DAG, Subtarget);
>> + case ISD::FMA: return combineFMA(N, DAG, DCI, Subtarget);
>> case X86ISD::FMADDSUB_RND:
>> case X86ISD::FMSUBADD_RND:
>> case X86ISD::FMADDSUB:
>> - case X86ISD::FMSUBADD: return combineFMADDSUB(N, DAG, Subtarget);
>> + case X86ISD::FMSUBADD: return combineFMADDSUB(N, DAG, DCI);
>> case X86ISD::MOVMSK: return combineMOVMSK(N, DAG, DCI, Subtarget);
>> case X86ISD::MGATHER:
>> case X86ISD::MSCATTER:
>>
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=372333&r1=372332&r2=372333&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Thu Sep 19 08:02:47 2019
>> @@ -798,6 +798,17 @@ namespace llvm {
>> /// and some i16 instructions are slow.
>> bool IsDesirableToPromoteOp(SDValue Op, EVT &PVT) const override;
>>
>> + /// Return 1 if we can compute the negated form of the specified expression
>> + /// for the same cost as the expression itself, or 2 if we can compute the
>> + /// negated form more cheaply than the expression itself. Else return 0.
>> + char isNegatibleForFree(SDValue Op, SelectionDAG &DAG, bool LegalOperations,
>> + bool ForCodeSize, unsigned Depth) const override;
>> +
>> + /// If isNegatibleForFree returns true, return the newly negated expression.
>> + SDValue getNegatedExpression(SDValue Op, SelectionDAG &DAG,
>> + bool LegalOperations, bool ForCodeSize,
>> + unsigned Depth) const override;
>> +
>> MachineBasicBlock *
>> EmitInstrWithCustomInserter(MachineInstr &MI,
>> MachineBasicBlock *MBB) const override;
>>
>> Modified: llvm/trunk/test/CodeGen/X86/recip-fastmath.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/recip-fastmath.ll?rev=372333&r1=372332&r2=372333&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/recip-fastmath.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/recip-fastmath.ll Thu Sep 19 08:02:47 2019
>> @@ -60,15 +60,15 @@ define float @f32_one_step(float %x) #1
>> ; FMA-RECIP-LABEL: f32_one_step:
>> ; FMA-RECIP: # %bb.0:
>> ; FMA-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> -; FMA-RECIP-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) + mem
>> -; FMA-RECIP-NEXT: vfmadd132ss {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm1
>> +; FMA-RECIP-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + mem
>> +; FMA-RECIP-NEXT: vfnmadd132ss {{.*#+}} xmm0 = -(xmm0 * xmm1) + xmm1
>> ; FMA-RECIP-NEXT: retq
>> ;
>> ; BDVER2-LABEL: f32_one_step:
>> ; BDVER2: # %bb.0:
>> ; BDVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> -; BDVER2-NEXT: vfnmaddss {{.*}}(%rip), %xmm1, %xmm0, %xmm0
>> -; BDVER2-NEXT: vfmaddss %xmm1, %xmm0, %xmm1, %xmm0
>> +; BDVER2-NEXT: vfmaddss {{.*}}(%rip), %xmm1, %xmm0, %xmm0
>> +; BDVER2-NEXT: vfnmaddss %xmm1, %xmm0, %xmm1, %xmm0
>> ; BDVER2-NEXT: retq
>> ;
>> ; BTVER2-LABEL: f32_one_step:
>> @@ -94,8 +94,8 @@ define float @f32_one_step(float %x) #1
>> ; HASWELL-LABEL: f32_one_step:
>> ; HASWELL: # %bb.0:
>> ; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> -; HASWELL-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) + mem
>> -; HASWELL-NEXT: vfmadd132ss {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm1
>> +; HASWELL-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + mem
>> +; HASWELL-NEXT: vfnmadd132ss {{.*#+}} xmm0 = -(xmm0 * xmm1) + xmm1
>> ; HASWELL-NEXT: retq
>> ;
>> ; HASWELL-NO-FMA-LABEL: f32_one_step:
>> @@ -111,8 +111,8 @@ define float @f32_one_step(float %x) #1
>> ; AVX512-LABEL: f32_one_step:
>> ; AVX512: # %bb.0:
>> ; AVX512-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> -; AVX512-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) + mem
>> -; AVX512-NEXT: vfmadd132ss {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm1
>> +; AVX512-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + mem
>> +; AVX512-NEXT: vfnmadd132ss {{.*#+}} xmm0 = -(xmm0 * xmm1) + xmm1
>> ; AVX512-NEXT: retq
>> %div = fdiv fast float 1.0, %x
>> ret float %div
>>
>> Modified: llvm/trunk/test/CodeGen/X86/recip-fastmath2.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/recip-fastmath2.ll?rev=372333&r1=372332&r2=372333&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/recip-fastmath2.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/recip-fastmath2.ll Thu Sep 19 08:02:47 2019
>> @@ -154,8 +154,8 @@ define float @f32_one_step_2_divs(float
>> ; FMA-RECIP-LABEL: f32_one_step_2_divs:
>> ; FMA-RECIP: # %bb.0:
>> ; FMA-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> -; FMA-RECIP-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) + mem
>> -; FMA-RECIP-NEXT: vfmadd132ss {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm1
>> +; FMA-RECIP-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + mem
>> +; FMA-RECIP-NEXT: vfnmadd132ss {{.*#+}} xmm0 = -(xmm0 * xmm1) + xmm1
>> ; FMA-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1
>> ; FMA-RECIP-NEXT: vmulss %xmm0, %xmm1, %xmm0
>> ; FMA-RECIP-NEXT: retq
>> @@ -163,8 +163,8 @@ define float @f32_one_step_2_divs(float
>> ; BDVER2-LABEL: f32_one_step_2_divs:
>> ; BDVER2: # %bb.0:
>> ; BDVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> -; BDVER2-NEXT: vfnmaddss {{.*}}(%rip), %xmm1, %xmm0, %xmm0
>> -; BDVER2-NEXT: vfmaddss %xmm1, %xmm0, %xmm1, %xmm0
>> +; BDVER2-NEXT: vfmaddss {{.*}}(%rip), %xmm1, %xmm0, %xmm0
>> +; BDVER2-NEXT: vfnmaddss %xmm1, %xmm0, %xmm1, %xmm0
>> ; BDVER2-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1
>> ; BDVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0
>> ; BDVER2-NEXT: retq
>> @@ -196,8 +196,8 @@ define float @f32_one_step_2_divs(float
>> ; HASWELL-LABEL: f32_one_step_2_divs:
>> ; HASWELL: # %bb.0:
>> ; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> -; HASWELL-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) + mem
>> -; HASWELL-NEXT: vfmadd132ss {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm1
>> +; HASWELL-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + mem
>> +; HASWELL-NEXT: vfnmadd132ss {{.*#+}} xmm0 = -(xmm0 * xmm1) + xmm1
>> ; HASWELL-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1
>> ; HASWELL-NEXT: vmulss %xmm0, %xmm1, %xmm0
>> ; HASWELL-NEXT: retq
>> @@ -217,8 +217,8 @@ define float @f32_one_step_2_divs(float
>> ; AVX512-LABEL: f32_one_step_2_divs:
>> ; AVX512: # %bb.0:
>> ; AVX512-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> -; AVX512-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) + mem
>> -; AVX512-NEXT: vfmadd132ss {{.*#+}} xmm0 = (xmm0 * xmm1) + xmm1
>> +; AVX512-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + mem
>> +; AVX512-NEXT: vfnmadd132ss {{.*#+}} xmm0 = -(xmm0 * xmm1) + xmm1
>> ; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1
>> ; AVX512-NEXT: vmulss %xmm0, %xmm1, %xmm0
>> ; AVX512-NEXT: retq
>> @@ -267,8 +267,8 @@ define float @f32_two_step_2(float %x) #
>> ; FMA-RECIP: # %bb.0:
>> ; FMA-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> ; FMA-RECIP-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
>> -; FMA-RECIP-NEXT: vfnmadd231ss {{.*#+}} xmm2 = -(xmm0 * xmm1) + xmm2
>> -; FMA-RECIP-NEXT: vfmadd132ss {{.*#+}} xmm2 = (xmm2 * xmm1) + xmm1
>> +; FMA-RECIP-NEXT: vfmadd231ss {{.*#+}} xmm2 = (xmm0 * xmm1) + xmm2
>> +; FMA-RECIP-NEXT: vfnmadd132ss {{.*#+}} xmm2 = -(xmm2 * xmm1) + xmm1
>> ; FMA-RECIP-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
>> ; FMA-RECIP-NEXT: vmulss %xmm1, %xmm2, %xmm3
>> ; FMA-RECIP-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm3 * xmm0) + xmm1
>> @@ -278,9 +278,9 @@ define float @f32_two_step_2(float %x) #
>> ; BDVER2-LABEL: f32_two_step_2:
>> ; BDVER2: # %bb.0:
>> ; BDVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> -; BDVER2-NEXT: vfnmaddss {{.*}}(%rip), %xmm1, %xmm0, %xmm2
>> +; BDVER2-NEXT: vfmaddss {{.*}}(%rip), %xmm1, %xmm0, %xmm2
>> ; BDVER2-NEXT: vmovss {{.*#+}} xmm4 = mem[0],zero,zero,zero
>> -; BDVER2-NEXT: vfmaddss %xmm1, %xmm2, %xmm1, %xmm1
>> +; BDVER2-NEXT: vfnmaddss %xmm1, %xmm2, %xmm1, %xmm1
>> ; BDVER2-NEXT: vmulss %xmm4, %xmm1, %xmm3
>> ; BDVER2-NEXT: vfnmaddss %xmm4, %xmm3, %xmm0, %xmm0
>> ; BDVER2-NEXT: vfmaddss %xmm3, %xmm0, %xmm1, %xmm0
>> @@ -322,8 +322,8 @@ define float @f32_two_step_2(float %x) #
>> ; HASWELL: # %bb.0:
>> ; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> ; HASWELL-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
>> -; HASWELL-NEXT: vfnmadd231ss {{.*#+}} xmm2 = -(xmm0 * xmm1) + xmm2
>> -; HASWELL-NEXT: vfmadd132ss {{.*#+}} xmm2 = (xmm2 * xmm1) + xmm1
>> +; HASWELL-NEXT: vfmadd231ss {{.*#+}} xmm2 = (xmm0 * xmm1) + xmm2
>> +; HASWELL-NEXT: vfnmadd132ss {{.*#+}} xmm2 = -(xmm2 * xmm1) + xmm1
>> ; HASWELL-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
>> ; HASWELL-NEXT: vmulss %xmm1, %xmm2, %xmm3
>> ; HASWELL-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm3 * xmm0) + xmm1
>> @@ -350,8 +350,8 @@ define float @f32_two_step_2(float %x) #
>> ; AVX512: # %bb.0:
>> ; AVX512-NEXT: vrcpss %xmm0, %xmm0, %xmm1
>> ; AVX512-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
>> -; AVX512-NEXT: vfnmadd231ss {{.*#+}} xmm2 = -(xmm0 * xmm1) + xmm2
>> -; AVX512-NEXT: vfmadd132ss {{.*#+}} xmm2 = (xmm2 * xmm1) + xmm1
>> +; AVX512-NEXT: vfmadd231ss {{.*#+}} xmm2 = (xmm0 * xmm1) + xmm2
>> +; AVX512-NEXT: vfnmadd132ss {{.*#+}} xmm2 = -(xmm2 * xmm1) + xmm1
>> ; AVX512-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
>> ; AVX512-NEXT: vmulss %xmm1, %xmm2, %xmm3
>> ; AVX512-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm3 * xmm0) + xmm1
>> @@ -610,9 +610,9 @@ define <4 x float> @v4f32_two_step2(<4 x
>> ; FMA-RECIP-LABEL: v4f32_two_step2:
>> ; FMA-RECIP: # %bb.0:
>> ; FMA-RECIP-NEXT: vrcpps %xmm0, %xmm1
>> -; FMA-RECIP-NEXT: vmovaps {{.*#+}} xmm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> -; FMA-RECIP-NEXT: vfnmadd231ps {{.*#+}} xmm2 = -(xmm0 * xmm1) + xmm2
>> -; FMA-RECIP-NEXT: vfmadd132ps {{.*#+}} xmm2 = (xmm2 * xmm1) + xmm1
>> +; FMA-RECIP-NEXT: vmovaps {{.*#+}} xmm2 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> +; FMA-RECIP-NEXT: vfmadd231ps {{.*#+}} xmm2 = (xmm0 * xmm1) + xmm2
>> +; FMA-RECIP-NEXT: vfnmadd132ps {{.*#+}} xmm2 = -(xmm2 * xmm1) + xmm1
>> ; FMA-RECIP-NEXT: vmovaps {{.*#+}} xmm1 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0]
>> ; FMA-RECIP-NEXT: vmulps %xmm1, %xmm2, %xmm3
>> ; FMA-RECIP-NEXT: vfnmadd213ps {{.*#+}} xmm0 = -(xmm3 * xmm0) + xmm1
>> @@ -622,9 +622,9 @@ define <4 x float> @v4f32_two_step2(<4 x
>> ; BDVER2-LABEL: v4f32_two_step2:
>> ; BDVER2: # %bb.0:
>> ; BDVER2-NEXT: vrcpps %xmm0, %xmm1
>> -; BDVER2-NEXT: vfnmaddps {{.*}}(%rip), %xmm1, %xmm0, %xmm2
>> +; BDVER2-NEXT: vfmaddps {{.*}}(%rip), %xmm1, %xmm0, %xmm2
>> ; BDVER2-NEXT: vmovaps {{.*#+}} xmm4 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0]
>> -; BDVER2-NEXT: vfmaddps %xmm1, %xmm2, %xmm1, %xmm1
>> +; BDVER2-NEXT: vfnmaddps %xmm1, %xmm2, %xmm1, %xmm1
>> ; BDVER2-NEXT: vmulps %xmm4, %xmm1, %xmm3
>> ; BDVER2-NEXT: vfnmaddps %xmm4, %xmm3, %xmm0, %xmm0
>> ; BDVER2-NEXT: vfmaddps %xmm3, %xmm0, %xmm1, %xmm0
>> @@ -665,9 +665,9 @@ define <4 x float> @v4f32_two_step2(<4 x
>> ; HASWELL-LABEL: v4f32_two_step2:
>> ; HASWELL: # %bb.0:
>> ; HASWELL-NEXT: vrcpps %xmm0, %xmm1
>> -; HASWELL-NEXT: vbroadcastss {{.*#+}} xmm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> -; HASWELL-NEXT: vfnmadd231ps {{.*#+}} xmm2 = -(xmm0 * xmm1) + xmm2
>> -; HASWELL-NEXT: vfmadd132ps {{.*#+}} xmm2 = (xmm2 * xmm1) + xmm1
>> +; HASWELL-NEXT: vbroadcastss {{.*#+}} xmm2 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> +; HASWELL-NEXT: vfmadd231ps {{.*#+}} xmm2 = (xmm0 * xmm1) + xmm2
>> +; HASWELL-NEXT: vfnmadd132ps {{.*#+}} xmm2 = -(xmm2 * xmm1) + xmm1
>> ; HASWELL-NEXT: vmovaps {{.*#+}} xmm1 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0]
>> ; HASWELL-NEXT: vmulps %xmm1, %xmm2, %xmm3
>> ; HASWELL-NEXT: vfnmadd213ps {{.*#+}} xmm0 = -(xmm3 * xmm0) + xmm1
>> @@ -693,9 +693,9 @@ define <4 x float> @v4f32_two_step2(<4 x
>> ; AVX512-LABEL: v4f32_two_step2:
>> ; AVX512: # %bb.0:
>> ; AVX512-NEXT: vrcpps %xmm0, %xmm1
>> -; AVX512-NEXT: vbroadcastss {{.*#+}} xmm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> -; AVX512-NEXT: vfnmadd231ps {{.*#+}} xmm2 = -(xmm0 * xmm1) + xmm2
>> -; AVX512-NEXT: vfmadd132ps {{.*#+}} xmm2 = (xmm2 * xmm1) + xmm1
>> +; AVX512-NEXT: vbroadcastss {{.*#+}} xmm2 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> +; AVX512-NEXT: vfmadd231ps {{.*#+}} xmm2 = (xmm0 * xmm1) + xmm2
>> +; AVX512-NEXT: vfnmadd132ps {{.*#+}} xmm2 = -(xmm2 * xmm1) + xmm1
>> ; AVX512-NEXT: vmovaps {{.*#+}} xmm1 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0]
>> ; AVX512-NEXT: vmulps %xmm1, %xmm2, %xmm3
>> ; AVX512-NEXT: vfnmadd213ps {{.*#+}} xmm0 = -(xmm3 * xmm0) + xmm1
>> @@ -987,9 +987,9 @@ define <8 x float> @v8f32_two_step2(<8 x
>> ; FMA-RECIP-LABEL: v8f32_two_step2:
>> ; FMA-RECIP: # %bb.0:
>> ; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm1
>> -; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> -; FMA-RECIP-NEXT: vfnmadd231ps {{.*#+}} ymm2 = -(ymm0 * ymm1) + ymm2
>> -; FMA-RECIP-NEXT: vfmadd132ps {{.*#+}} ymm2 = (ymm2 * ymm1) + ymm1
>> +; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm2 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> +; FMA-RECIP-NEXT: vfmadd231ps {{.*#+}} ymm2 = (ymm0 * ymm1) + ymm2
>> +; FMA-RECIP-NEXT: vfnmadd132ps {{.*#+}} ymm2 = -(ymm2 * ymm1) + ymm1
>> ; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm1 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0,5.0E+0,6.0E+0,7.0E+0,8.0E+0]
>> ; FMA-RECIP-NEXT: vmulps %ymm1, %ymm2, %ymm3
>> ; FMA-RECIP-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm3 * ymm0) + ymm1
>> @@ -999,9 +999,9 @@ define <8 x float> @v8f32_two_step2(<8 x
>> ; BDVER2-LABEL: v8f32_two_step2:
>> ; BDVER2: # %bb.0:
>> ; BDVER2-NEXT: vrcpps %ymm0, %ymm1
>> -; BDVER2-NEXT: vfnmaddps {{.*}}(%rip), %ymm1, %ymm0, %ymm2
>> +; BDVER2-NEXT: vfmaddps {{.*}}(%rip), %ymm1, %ymm0, %ymm2
>> ; BDVER2-NEXT: vmovaps {{.*#+}} ymm4 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0,5.0E+0,6.0E+0,7.0E+0,8.0E+0]
>> -; BDVER2-NEXT: vfmaddps %ymm1, %ymm2, %ymm1, %ymm1
>> +; BDVER2-NEXT: vfnmaddps %ymm1, %ymm2, %ymm1, %ymm1
>> ; BDVER2-NEXT: vmulps %ymm4, %ymm1, %ymm3
>> ; BDVER2-NEXT: vfnmaddps %ymm4, %ymm3, %ymm0, %ymm0
>> ; BDVER2-NEXT: vfmaddps %ymm3, %ymm0, %ymm1, %ymm0
>> @@ -1042,9 +1042,9 @@ define <8 x float> @v8f32_two_step2(<8 x
>> ; HASWELL-LABEL: v8f32_two_step2:
>> ; HASWELL: # %bb.0:
>> ; HASWELL-NEXT: vrcpps %ymm0, %ymm1
>> -; HASWELL-NEXT: vbroadcastss {{.*#+}} ymm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> -; HASWELL-NEXT: vfnmadd231ps {{.*#+}} ymm2 = -(ymm0 * ymm1) + ymm2
>> -; HASWELL-NEXT: vfmadd132ps {{.*#+}} ymm2 = (ymm2 * ymm1) + ymm1
>> +; HASWELL-NEXT: vbroadcastss {{.*#+}} ymm2 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> +; HASWELL-NEXT: vfmadd231ps {{.*#+}} ymm2 = (ymm0 * ymm1) + ymm2
>> +; HASWELL-NEXT: vfnmadd132ps {{.*#+}} ymm2 = -(ymm2 * ymm1) + ymm1
>> ; HASWELL-NEXT: vmovaps {{.*#+}} ymm1 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0,5.0E+0,6.0E+0,7.0E+0,8.0E+0]
>> ; HASWELL-NEXT: vmulps %ymm1, %ymm2, %ymm3
>> ; HASWELL-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm3 * ymm0) + ymm1
>> @@ -1070,9 +1070,9 @@ define <8 x float> @v8f32_two_step2(<8 x
>> ; AVX512-LABEL: v8f32_two_step2:
>> ; AVX512: # %bb.0:
>> ; AVX512-NEXT: vrcpps %ymm0, %ymm1
>> -; AVX512-NEXT: vbroadcastss {{.*#+}} ymm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> -; AVX512-NEXT: vfnmadd231ps {{.*#+}} ymm2 = -(ymm0 * ymm1) + ymm2
>> -; AVX512-NEXT: vfmadd132ps {{.*#+}} ymm2 = (ymm2 * ymm1) + ymm1
>> +; AVX512-NEXT: vbroadcastss {{.*#+}} ymm2 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> +; AVX512-NEXT: vfmadd231ps {{.*#+}} ymm2 = (ymm0 * ymm1) + ymm2
>> +; AVX512-NEXT: vfnmadd132ps {{.*#+}} ymm2 = -(ymm2 * ymm1) + ymm1
>> ; AVX512-NEXT: vmovaps {{.*#+}} ymm1 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0,5.0E+0,6.0E+0,7.0E+0,8.0E+0]
>> ; AVX512-NEXT: vmulps %ymm1, %ymm2, %ymm3
>> ; AVX512-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm3 * ymm0) + ymm1
>> @@ -1552,17 +1552,17 @@ define <16 x float> @v16f32_two_step2(<1
>> ; FMA-RECIP-LABEL: v16f32_two_step2:
>> ; FMA-RECIP: # %bb.0:
>> ; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm2
>> -; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm3 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> +; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm3 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> ; FMA-RECIP-NEXT: vmovaps %ymm2, %ymm4
>> -; FMA-RECIP-NEXT: vfnmadd213ps {{.*#+}} ymm4 = -(ymm0 * ymm4) + ymm3
>> -; FMA-RECIP-NEXT: vfmadd132ps {{.*#+}} ymm4 = (ymm4 * ymm2) + ymm2
>> +; FMA-RECIP-NEXT: vfmadd213ps {{.*#+}} ymm4 = (ymm0 * ymm4) + ymm3
>> +; FMA-RECIP-NEXT: vfnmadd132ps {{.*#+}} ymm4 = -(ymm4 * ymm2) + ymm2
>> ; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm2 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0,5.0E+0,6.0E+0,7.0E+0,8.0E+0]
>> ; FMA-RECIP-NEXT: vmulps %ymm2, %ymm4, %ymm5
>> ; FMA-RECIP-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm5 * ymm0) + ymm2
>> ; FMA-RECIP-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm4 * ymm0) + ymm5
>> ; FMA-RECIP-NEXT: vrcpps %ymm1, %ymm2
>> -; FMA-RECIP-NEXT: vfnmadd231ps {{.*#+}} ymm3 = -(ymm1 * ymm2) + ymm3
>> -; FMA-RECIP-NEXT: vfmadd132ps {{.*#+}} ymm3 = (ymm3 * ymm2) + ymm2
>> +; FMA-RECIP-NEXT: vfmadd231ps {{.*#+}} ymm3 = (ymm1 * ymm2) + ymm3
>> +; FMA-RECIP-NEXT: vfnmadd132ps {{.*#+}} ymm3 = -(ymm3 * ymm2) + ymm2
>> ; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm2 = [9.0E+0,1.0E+1,1.1E+1,1.2E+1,1.3E+1,1.4E+1,1.5E+1,1.6E+1]
>> ; FMA-RECIP-NEXT: vmulps %ymm2, %ymm3, %ymm4
>> ; FMA-RECIP-NEXT: vfnmadd213ps {{.*#+}} ymm1 = -(ymm4 * ymm1) + ymm2
>> @@ -1572,17 +1572,17 @@ define <16 x float> @v16f32_two_step2(<1
>> ; BDVER2-LABEL: v16f32_two_step2:
>> ; BDVER2: # %bb.0:
>> ; BDVER2-NEXT: vrcpps %ymm0, %ymm2
>> -; BDVER2-NEXT: vmovaps {{.*#+}} ymm3 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> -; BDVER2-NEXT: vfnmaddps %ymm3, %ymm2, %ymm0, %ymm4
>> -; BDVER2-NEXT: vfmaddps %ymm2, %ymm4, %ymm2, %ymm2
>> +; BDVER2-NEXT: vmovaps {{.*#+}} ymm3 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> +; BDVER2-NEXT: vfmaddps %ymm3, %ymm2, %ymm0, %ymm4
>> +; BDVER2-NEXT: vfnmaddps %ymm2, %ymm4, %ymm2, %ymm2
>> ; BDVER2-NEXT: vmovaps {{.*#+}} ymm4 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0,5.0E+0,6.0E+0,7.0E+0,8.0E+0]
>> ; BDVER2-NEXT: vmulps %ymm4, %ymm2, %ymm5
>> ; BDVER2-NEXT: vfnmaddps %ymm4, %ymm5, %ymm0, %ymm0
>> ; BDVER2-NEXT: vfmaddps %ymm5, %ymm0, %ymm2, %ymm0
>> ; BDVER2-NEXT: vrcpps %ymm1, %ymm2
>> ; BDVER2-NEXT: vmovaps {{.*#+}} ymm5 = [9.0E+0,1.0E+1,1.1E+1,1.2E+1,1.3E+1,1.4E+1,1.5E+1,1.6E+1]
>> -; BDVER2-NEXT: vfnmaddps %ymm3, %ymm2, %ymm1, %ymm3
>> -; BDVER2-NEXT: vfmaddps %ymm2, %ymm3, %ymm2, %ymm2
>> +; BDVER2-NEXT: vfmaddps %ymm3, %ymm2, %ymm1, %ymm3
>> +; BDVER2-NEXT: vfnmaddps %ymm2, %ymm3, %ymm2, %ymm2
>> ; BDVER2-NEXT: vmulps %ymm5, %ymm2, %ymm4
>> ; BDVER2-NEXT: vfnmaddps %ymm5, %ymm4, %ymm1, %ymm1
>> ; BDVER2-NEXT: vfmaddps %ymm4, %ymm1, %ymm2, %ymm1
>> @@ -1645,17 +1645,17 @@ define <16 x float> @v16f32_two_step2(<1
>> ; HASWELL-LABEL: v16f32_two_step2:
>> ; HASWELL: # %bb.0:
>> ; HASWELL-NEXT: vrcpps %ymm0, %ymm2
>> -; HASWELL-NEXT: vbroadcastss {{.*#+}} ymm3 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> +; HASWELL-NEXT: vbroadcastss {{.*#+}} ymm3 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> ; HASWELL-NEXT: vmovaps %ymm2, %ymm4
>> -; HASWELL-NEXT: vfnmadd213ps {{.*#+}} ymm4 = -(ymm0 * ymm4) + ymm3
>> -; HASWELL-NEXT: vfmadd132ps {{.*#+}} ymm4 = (ymm4 * ymm2) + ymm2
>> +; HASWELL-NEXT: vfmadd213ps {{.*#+}} ymm4 = (ymm0 * ymm4) + ymm3
>> +; HASWELL-NEXT: vfnmadd132ps {{.*#+}} ymm4 = -(ymm4 * ymm2) + ymm2
>> ; HASWELL-NEXT: vmovaps {{.*#+}} ymm2 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0,5.0E+0,6.0E+0,7.0E+0,8.0E+0]
>> ; HASWELL-NEXT: vmulps %ymm2, %ymm4, %ymm5
>> ; HASWELL-NEXT: vrcpps %ymm1, %ymm6
>> ; HASWELL-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm5 * ymm0) + ymm2
>> ; HASWELL-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm4 * ymm0) + ymm5
>> -; HASWELL-NEXT: vfnmadd231ps {{.*#+}} ymm3 = -(ymm1 * ymm6) + ymm3
>> -; HASWELL-NEXT: vfmadd132ps {{.*#+}} ymm3 = (ymm3 * ymm6) + ymm6
>> +; HASWELL-NEXT: vfmadd231ps {{.*#+}} ymm3 = (ymm1 * ymm6) + ymm3
>> +; HASWELL-NEXT: vfnmadd132ps {{.*#+}} ymm3 = -(ymm3 * ymm6) + ymm6
>> ; HASWELL-NEXT: vmovaps {{.*#+}} ymm2 = [9.0E+0,1.0E+1,1.1E+1,1.2E+1,1.3E+1,1.4E+1,1.5E+1,1.6E+1]
>> ; HASWELL-NEXT: vmulps %ymm2, %ymm3, %ymm4
>> ; HASWELL-NEXT: vfnmadd213ps {{.*#+}} ymm1 = -(ymm4 * ymm1) + ymm2
>> @@ -1692,9 +1692,9 @@ define <16 x float> @v16f32_two_step2(<1
>> ; AVX512-LABEL: v16f32_two_step2:
>> ; AVX512: # %bb.0:
>> ; AVX512-NEXT: vrcp14ps %zmm0, %zmm1
>> -; AVX512-NEXT: vbroadcastss {{.*#+}} zmm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
>> -; AVX512-NEXT: vfnmadd231ps {{.*#+}} zmm2 = -(zmm0 * zmm1) + zmm2
>> -; AVX512-NEXT: vfmadd132ps {{.*#+}} zmm2 = (zmm2 * zmm1) + zmm1
>> +; AVX512-NEXT: vbroadcastss {{.*#+}} zmm2 = [-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0,-1.0E+0]
>> +; AVX512-NEXT: vfmadd231ps {{.*#+}} zmm2 = (zmm0 * zmm1) + zmm2
>> +; AVX512-NEXT: vfnmadd132ps {{.*#+}} zmm2 = -(zmm2 * zmm1) + zmm1
>> ; AVX512-NEXT: vmovaps {{.*#+}} zmm1 = [1.0E+0,2.0E+0,3.0E+0,4.0E+0,5.0E+0,6.0E+0,7.0E+0,8.0E+0,9.0E+0,1.0E+1,1.1E+1,1.2E+1,1.3E+1,1.4E+1,1.5E+1,1.6E+1]
>> ; AVX512-NEXT: vmulps %zmm1, %zmm2, %zmm3
>> ; AVX512-NEXT: vfnmadd213ps {{.*#+}} zmm0 = -(zmm3 * zmm0) + zmm1
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
--
Regards,
Ilya Biryukov
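The 0/1/2 "negatibility cost" convention described in the patch's isNegatibleForFree documentation, together with its recursion-depth cutoff, can be sketched as a toy, LLVM-free expression walker. All types and names below are hypothetical stand-ins for illustration, not the LLVM API:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy model of the isNegatibleForFree contract: 0 = negation is not free,
// 1 = negatable at the same cost, 2 = the negated form is strictly cheaper.
struct Expr {
  enum Kind { Const, Neg, FMA } kind;
  std::vector<std::shared_ptr<Expr>> ops;
};

constexpr unsigned MaxRecursionDepth = 6; // stand-in for SelectionDAG's limit

int isNegatibleForFree(const Expr &E, unsigned Depth = 0) {
  if (Depth > MaxRecursionDepth)
    return 0; // bail out instead of recursing without bound
  switch (E.kind) {
  case Expr::Neg:
    return 2; // an explicit negation can simply be stripped: strictly cheaper
  case Expr::FMA:
    // An FMA is always negatable for free by flipping to a negated opcode
    // variant; it becomes strictly cheaper if any operand negation disappears.
    for (const auto &Op : E.ops)
      if (isNegatibleForFree(*Op, Depth + 1) == 2)
        return 2;
    return 1;
  default:
    return 0;
  }
}
```

The Depth cutoff mirrors the MaxRecursionDepth guard at the top of the X86 override above; without caching, a recursive cost query like this can revisit shared subexpressions many times, which is one plausible contributor to the compile-time blowup reported earlier in this thread.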