[llvm] r242436 - AArch64: Implement conditional compare sequence matching.

Wed Jul 22 14:22:23 PDT 2015

Hi Arnaud,

The attached patches fix the CMP+CSEL behavior for your example. The first leads to more aggressive select combining. While that is a good thing as a DAGCombine, given recent experience it would probably be best to benchmark it to make sure we are not unlucky with scheduling/regalloc.

The second patch should always be beneficial, but unfortunately its code will never trigger without the first patch.

Could you run your testsuite again with the two patches applied?

Thanks
	Matthias

-------------- next part --------------
A non-text attachment was scrubbed...
Name: selectcombine_1.patch
Type: application/octet-stream
Size: 4228 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150722/abe1fc3c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: selectcombine_2.patch
Type: application/octet-stream
Size: 6945 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150722/abe1fc3c/attachment-0001.obj>
-------------- next part --------------

> On Jul 17, 2015, at 8:39 AM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com> wrote:
> 
> Hi Matthias,
> 
> Thanks for working on this! It brings some nice improvements. However, we
> found an 11% regression in one of our performance benchmarks on Cortex-A57.
> I've attached a reproducer.
> 
>> llc -o - -mcpu=cortex-a57 reg.ll // Without your patch
> 	...
> 	csel	w8, w9, w8, lt
> 	adrp	x9, d
> 	cmp	 w8, #100 
> 	str	w8, [x9, :lo12:d]
> 	csel	w8, w11, w8, lt
> 	cmp	 w8, w11, lsl #2
> 	csel	w8, w11, w8, gt
> 	str	w8, [x9, :lo12:d]
> 	adrp	x9, acc
> 	ldr	x11, [x9, :lo12:acc]
> 	...
> 
>> llc -o - -mcpu=cortex-a57 reg.ll // With your patch
> 	...
> 	csel	w8, w9, w8, lt
> 	cmp	 w8, #100
> 	csel	w12, w11, w8, lt
> 	lsl	w13, w11, #2
> 	adrp	x9, d
> 	str	w8, [x9, :lo12:d]
> 	ccmp	w12, w13, #0, ge
> 	csel	w8, w11, w8, gt
> 	cmp	 w12, w13
> 	str	w8, [x9, :lo12:d]
> 	adrp	x9, acc
> 	csel	w8, w11, w12, gt
> 	ldr	x11, [x9, :lo12:acc]
> 	...
> 
> Before your patch, we had 3 x csel + 2 x cmp
> With your patch applied, we have 4 x csel + 2 x cmp + 1 x ccmp + 1 x lsl
> 
> Could you look into this regression?
> 
> Thanks,
> Arnaud
> 
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Matthias Braun
>> Sent: 16 July 2015 22:03
>> To: llvm-commits at cs.uiuc.edu
>> Subject: [llvm] r242436 - AArch64: Implement conditional compare sequence matching.
>> 
>> Author: matze
>> Date: Thu Jul 16 15:02:37 2015
>> New Revision: 242436
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=242436&view=rev
>> Log:
>> AArch64: Implement conditional compare sequence matching.
>> 
>> This is a new iteration of the reverted r238793 /
>> http://reviews.llvm.org/D8232 which wrongly assumed that any and/or trees
>> can be represented by conditional compare sequences, however there are
>> some restrictions to that. This version fixes this and adds comments that
>> explain exactly what types of and/or trees can actually be implemented as
>> conditional compare sequences.
>> 
>> Related to http://llvm.org/PR20927, rdar://18326194
>> 
>> Differential Revision: http://reviews.llvm.org/D10579
>> 
>> Modified:
>>    llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp
>>    llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h
>>    llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td
>>    llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td
>>    llvm/trunk/test/CodeGen/AArch64/arm64-ccmp.ll
>> 
>> Modified: llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp?rev=242436&r1=242435&r2=242436&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp Thu Jul 16 15:02:37 2015
>> @@ -76,6 +76,9 @@ cl::opt<bool> EnableAArch64ELFLocalDynam
>>     cl::desc("Allow AArch64 Local Dynamic TLS code generation"),
>>     cl::init(false));
>> 
>> +/// Value type used for condition codes.
>> +static const MVT MVT_CC = MVT::i32;
>> +
>> AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
>>                                              const AArch64Subtarget &STI)
>>     : TargetLowering(TM), Subtarget(&STI) {
>> @@ -809,6 +812,9 @@ const char *AArch64TargetLowering::getTa
>>   case AArch64ISD::ADCS:              return "AArch64ISD::ADCS";
>>   case AArch64ISD::SBCS:              return "AArch64ISD::SBCS";
>>   case AArch64ISD::ANDS:              return "AArch64ISD::ANDS";
>> +  case AArch64ISD::CCMP:              return "AArch64ISD::CCMP";
>> +  case AArch64ISD::CCMN:              return "AArch64ISD::CCMN";
>> +  case AArch64ISD::FCCMP:             return "AArch64ISD::FCCMP";
>>   case AArch64ISD::FCMP:              return "AArch64ISD::FCMP";
>>   case AArch64ISD::FMIN:              return "AArch64ISD::FMIN";
>>   case AArch64ISD::FMAX:              return "AArch64ISD::FMAX";
>> @@ -1167,14 +1173,224 @@ static SDValue emitComparison(SDValue LH
>>     LHS = LHS.getOperand(0);
>>   }
>> 
>> -  return DAG.getNode(Opcode, dl, DAG.getVTList(VT, MVT::i32), LHS, RHS)
>> +  return DAG.getNode(Opcode, dl, DAG.getVTList(VT, MVT_CC), LHS, RHS)
>>       .getValue(1);
>> }
>> 
>> +/// \defgroup AArch64CCMP CMP;CCMP matching
>> +///
>> +/// These functions deal with the formation of CMP;CCMP;... sequences.
>> +/// The CCMP/CCMN/FCCMP/FCCMPE instructions allow the conditional execution of
>> +/// a comparison. They set the NZCV flags to a predefined value if their
>> +/// predicate is false. This allows to express arbitrary conjunctions, for
>> +/// example "cmp 0 (and (setCA (cmp A)) (setCB (cmp B))))"
>> +/// expressed as:
>> +///   cmp A
>> +///   ccmp B, inv(CB), CA
>> +///   check for CB flags
>> +///
>> +/// In general we can create code for arbitrary "... (and (and A B) C)"
>> +/// sequences. We can also implement some "or" expressions, because "(or A B)"
>> +/// is equivalent to "not (and (not A) (not B))" and we can implement some
>> +/// negation operations:
>> +/// We can negate the results of a single comparison by inverting the flags
>> +/// used when the predicate fails and inverting the flags tested in the next
>> +/// instruction; We can also negate the results of the whole previous
>> +/// conditional compare sequence by inverting the flags tested in the next
>> +/// instruction. However there is no way to negate the result of a partial
>> +/// sequence.
>> +///
>> +/// Therefore on encountering an "or" expression we can negate the subtree on
>> +/// one side and have to be able to push the negate to the leafs of the subtree
>> +/// on the other side (see also the comments in code). As complete example:
>> +/// "or (or (setCA (cmp A)) (setCB (cmp B)))
>> +///     (and (setCC (cmp C)) (setCD (cmp D)))"
>> +/// is transformed to
>> +/// "not (and (not (and (setCC (cmp C)) (setCC (cmp D))))
>> +///           (and (not (setCA (cmp A)) (not (setCB (cmp B))))))"
>> +/// and implemented as:
>> +///   cmp C
>> +///   ccmp D, inv(CD), CC
>> +///   ccmp A, CA, inv(CD)
>> +///   ccmp B, CB, inv(CA)
>> +///   check for CB flags
>> +/// A counterexample is "or (and A B) (and C D)" which cannot be implemented
>> +/// by conditional compare sequences.
>> +/// @{
>> +
>> +/// Create a conditional comparison; Use CCMP, CCMN or FCCMP as apropriate.
>> +static SDValue emitConditionalComparison(SDValue LHS, SDValue RHS,
>> +                                         ISD::CondCode CC, SDValue CCOp,
>> +                                         SDValue Condition, unsigned NZCV,
>> +                                         SDLoc DL, SelectionDAG &DAG) {
>> +  unsigned Opcode = 0;
>> +  if (LHS.getValueType().isFloatingPoint())
>> +    Opcode = AArch64ISD::FCCMP;
>> +  else if (RHS.getOpcode() == ISD::SUB) {
>> +    SDValue SubOp0 = RHS.getOperand(0);
>> +    if (const ConstantSDNode *SubOp0C = dyn_cast<ConstantSDNode>(SubOp0))
>> +      if (SubOp0C->isNullValue() && (CC == ISD::SETEQ || CC == ISD::SETNE)) {
>> +        // See emitComparison() on why we can only do this for SETEQ and SETNE.
>> +        Opcode = AArch64ISD::CCMN;
>> +        RHS = RHS.getOperand(1);
>> +      }
>> +  }
>> +  if (Opcode == 0)
>> +    Opcode = AArch64ISD::CCMP;
>> +
>> +  SDValue NZCVOp = DAG.getConstant(NZCV, DL, MVT::i32);
>> +  return DAG.getNode(Opcode, DL, MVT_CC, LHS, RHS, NZCVOp, Condition, CCOp);
>> +}
>> +
>> +/// Returns true if @p Val is a tree of AND/OR/SETCC operations.
>> +/// CanPushNegate is set to true if we can push a negate operation through
>> +/// the tree in a was that we are left with AND operations and negate operations
>> +/// at the leafs only. i.e. "not (or (or x y) z)" can be changed to
>> +/// "and (and (not x) (not y)) (not z)"; "not (or (and x y) z)" cannot be
>> +/// brought into such a form.
>> +static bool isConjunctionDisjunctionTree(const SDValue Val, bool &CanPushNegate,
>> +                                         unsigned Depth = 0) {
>> +  if (!Val.hasOneUse())
>> +    return false;
>> +  unsigned Opcode = Val->getOpcode();
>> +  if (Opcode == ISD::SETCC) {
>> +    CanPushNegate = true;
>> +    return true;
>> +  }
>> +  // Protect against stack overflow.
>> +  if (Depth > 15)
>> +    return false;
>> +  if (Opcode == ISD::AND || Opcode == ISD::OR) {
>> +    SDValue O0 = Val->getOperand(0);
>> +    SDValue O1 = Val->getOperand(1);
>> +    bool CanPushNegateL;
>> +    if (!isConjunctionDisjunctionTree(O0, CanPushNegateL, Depth+1))
>> +      return false;
>> +    bool CanPushNegateR;
>> +    if (!isConjunctionDisjunctionTree(O1, CanPushNegateR, Depth+1))
>> +      return false;
>> +    // We cannot push a negate through an AND operation (it would become an OR),
>> +    // we can however change a (not (or x y)) to (and (not x) (not y)) if we can
>> +    // push the negate through the x/y subtrees.
>> +    CanPushNegate = (Opcode == ISD::OR) && CanPushNegateL && CanPushNegateR;
>> +    return true;
>> +  }
>> +  return false;
>> +}
>> +
>> +/// Emit conjunction or disjunction tree with the CMP/FCMP followed by a chain
>> +/// of CCMP/CFCMP ops. See @ref AArch64CCMP.
>> +/// Tries to transform the given i1 producing node @p Val to a series compare
>> +/// and conditional compare operations. @returns an NZCV flags producing node
>> +/// and sets @p OutCC to the flags that should be tested or returns SDValue() if
>> +/// transformation was not possible.
>> +/// On recursive invocations @p PushNegate may be set to true to have negation
>> +/// effects pushed to the tree leafs; @p Predicate is an NZCV flag predicate
>> +/// for the comparisons in the current subtree; @p Depth limits the search
>> +/// depth to avoid stack overflow.
>> +static SDValue emitConjunctionDisjunctionTree(SelectionDAG &DAG, SDValue Val,
>> +    AArch64CC::CondCode &OutCC, bool PushNegate = false,
>> +    SDValue CCOp = SDValue(), AArch64CC::CondCode Predicate = AArch64CC::AL,
>> +    unsigned Depth = 0) {
>> +  // We're at a tree leaf, produce a conditional comparison operation.
>> +  unsigned Opcode = Val->getOpcode();
>> +  if (Opcode == ISD::SETCC) {
>> +    SDValue LHS = Val->getOperand(0);
>> +    SDValue RHS = Val->getOperand(1);
>> +    ISD::CondCode CC = cast<CondCodeSDNode>(Val->getOperand(2))->get();
>> +    bool isInteger = LHS.getValueType().isInteger();
>> +    if (PushNegate)
>> +      CC = getSetCCInverse(CC, isInteger);
>> +    SDLoc DL(Val);
>> +    // Determine OutCC and handle FP special case.
>> +    if (isInteger) {
>> +      OutCC = changeIntCCToAArch64CC(CC);
>> +    } else {
>> +      assert(LHS.getValueType().isFloatingPoint());
>> +      AArch64CC::CondCode ExtraCC;
>> +      changeFPCCToAArch64CC(CC, OutCC, ExtraCC);
>> +      // Surpisingly some floating point conditions can't be tested with a
>> +      // single condition code. Construct an additional comparison in this case.
>> +      // See comment below on how we deal with OR conditions.
>> +      if (ExtraCC != AArch64CC::AL) {
>> +        SDValue ExtraCmp;
>> +        if (!CCOp.getNode())
>> +          ExtraCmp = emitComparison(LHS, RHS, CC, DL, DAG);
>> +        else {
>> +          SDValue ConditionOp = DAG.getConstant(Predicate, DL, MVT_CC);
>> +          // Note that we want the inverse of ExtraCC, so NZCV is not inversed.
>> +          unsigned NZCV = AArch64CC::getNZCVToSatisfyCondCode(ExtraCC);
>> +          ExtraCmp = emitConditionalComparison(LHS, RHS, CC, CCOp, ConditionOp,
>> +                                               NZCV, DL, DAG);
>> +        }
>> +        CCOp = ExtraCmp;
>> +        Predicate = AArch64CC::getInvertedCondCode(ExtraCC);
>> +        OutCC = AArch64CC::getInvertedCondCode(OutCC);
>> +      }
>> +    }
>> +
>> +    // Produce a normal comparison if we are first in the chain
>> +    if (!CCOp.getNode())
>> +      return emitComparison(LHS, RHS, CC, DL, DAG);
>> +    // Otherwise produce a ccmp.
>> +    SDValue ConditionOp = DAG.getConstant(Predicate, DL, MVT_CC);
>> +    AArch64CC::CondCode InvOutCC = AArch64CC::getInvertedCondCode(OutCC);
>> +    unsigned NZCV = AArch64CC::getNZCVToSatisfyCondCode(InvOutCC);
>> +    return emitConditionalComparison(LHS, RHS, CC, CCOp, ConditionOp, NZCV, DL,
>> +                                     DAG);
>> +  } else if (Opcode != ISD::AND && Opcode != ISD::OR)
>> +    return SDValue();
>> +
>> +  assert((Opcode == ISD::OR || !PushNegate)
>> +         && "Can only push negate through OR operation");
>> +
>> +  // Check if both sides can be transformed.
>> +  SDValue LHS = Val->getOperand(0);
>> +  SDValue RHS = Val->getOperand(1);
>> +  bool CanPushNegateL;
>> +  if (!isConjunctionDisjunctionTree(LHS, CanPushNegateL, Depth+1))
>> +    return SDValue();
>> +  bool CanPushNegateR;
>> +  if (!isConjunctionDisjunctionTree(RHS, CanPushNegateR, Depth+1))
>> +    return SDValue();
>> +
>> +  // Do we need to negate our operands?
>> +  bool NegateOperands = Opcode == ISD::OR;
>> +  // We can negate the results of all previous operations by inverting the
>> +  // predicate flags giving us a free negation for one side. For the other side
>> +  // we need to be able to push the negation to the leafs of the tree.
>> +  if (NegateOperands) {
>> +    if (!CanPushNegateL && !CanPushNegateR)
>> +      return SDValue();
>> +    // Order the side where we can push the negate through to LHS.
>> +    if (!CanPushNegateL && CanPushNegateR) {
>> +      std::swap(LHS, RHS);
>> +      CanPushNegateL = true;
>> +    }
>> +  }
>> +
>> +  // Emit RHS. If we want to negate the tree we only need to push a negate
>> +  // through if we are already in a PushNegate case, otherwise we can negate
>> +  // the "flags to test" afterwards.
>> +  AArch64CC::CondCode RHSCC;
>> +  SDValue CmpR = emitConjunctionDisjunctionTree(DAG, RHS, RHSCC, PushNegate,
>> +                                                CCOp, Predicate, Depth+1);
>> +  if (NegateOperands && !PushNegate)
>> +    RHSCC = AArch64CC::getInvertedCondCode(RHSCC);
>> +  // Emit LHS. We must push the negate through if we need to negate it.
>> +  SDValue CmpL = emitConjunctionDisjunctionTree(DAG, LHS, OutCC, NegateOperands,
>> +                                                CmpR, RHSCC, Depth+1);
>> +  // If we transformed an OR to and AND then we have to negate the result
>> +  // (or absorb a PushNegate resulting in a double negation).
>> +  if (Opcode == ISD::OR && !PushNegate)
>> +    OutCC = AArch64CC::getInvertedCondCode(OutCC);
>> +  return CmpL;
>> +}
>> +
>> +/// @}
>> +
>> static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
>>                              SDValue &AArch64cc, SelectionDAG &DAG, SDLoc dl) {
>> -  SDValue Cmp;
>> -  AArch64CC::CondCode AArch64CC;
>>   if (ConstantSDNode *RHSC = dyn_cast<ConstantSDNode>(RHS.getNode())) {
>>     EVT VT = RHS.getValueType();
>>     uint64_t C = RHSC->getZExtValue();
>> @@ -1229,47 +1445,56 @@ static SDValue getAArch64Cmp(SDValue LHS
>>       }
>>     }
>>   }
>> -  // The imm operand of ADDS is an unsigned immediate, in the range 0 to 4095.
>> -  // For the i8 operand, the largest immediate is 255, so this can be easily
>> -  // encoded in the compare instruction. For the i16 operand, however, the
>> -  // largest immediate cannot be encoded in the compare.
>> -  // Therefore, use a sign extending load and cmn to avoid materializing the -1
>> -  // constant. For example,
>> -  // movz w1, #65535
>> -  // ldrh w0, [x0, #0]
>> -  // cmp w0, w1
>> -  // >
>> -  // ldrsh w0, [x0, #0]
>> -  // cmn w0, #1
>> -  // Fundamental, we're relying on the property that (zext LHS) == (zext RHS)
>> -  // if and only if (sext LHS) == (sext RHS). The checks are in place to ensure
>> -  // both the LHS and RHS are truely zero extended and to make sure the
>> -  // transformation is profitable.
>> +  SDValue Cmp;
>> +  AArch64CC::CondCode AArch64CC;
>>   if ((CC == ISD::SETEQ || CC == ISD::SETNE) && isa<ConstantSDNode>(RHS)) {
>> -    if ((cast<ConstantSDNode>(RHS)->getZExtValue() >> 16 == 0) &&
>> -        isa<LoadSDNode>(LHS)) {
>> -      if (cast<LoadSDNode>(LHS)->getExtensionType() == ISD::ZEXTLOAD &&
>> -          cast<LoadSDNode>(LHS)->getMemoryVT() == MVT::i16 &&
>> -          LHS.getNode()->hasNUsesOfValue(1, 0)) {
>> -        int16_t ValueofRHS = cast<ConstantSDNode>(RHS)->getZExtValue();
>> -        if (ValueofRHS < 0 && isLegalArithImmed(-ValueofRHS)) {
>> -          SDValue SExt =
>> -              DAG.getNode(ISD::SIGN_EXTEND_INREG, dl, LHS.getValueType(), LHS,
>> -                          DAG.getValueType(MVT::i16));
>> -          Cmp = emitComparison(SExt,
>> -                               DAG.getConstant(ValueofRHS, dl,
>> -                                               RHS.getValueType()),
>> -                               CC, dl, DAG);
>> -          AArch64CC = changeIntCCToAArch64CC(CC);
>> -          AArch64cc = DAG.getConstant(AArch64CC, dl, MVT::i32);
>> -          return Cmp;
>> -        }
>> +    const ConstantSDNode *RHSC = cast<ConstantSDNode>(RHS);
>> +
>> +    // The imm operand of ADDS is an unsigned immediate, in the range 0 to 4095.
>> +    // For the i8 operand, the largest immediate is 255, so this can be easily
>> +    // encoded in the compare instruction. For the i16 operand, however, the
>> +    // largest immediate cannot be encoded in the compare.
>> +    // Therefore, use a sign extending load and cmn to avoid materializing the
>> +    // -1 constant. For example,
>> +    // movz w1, #65535
>> +    // ldrh w0, [x0, #0]
>> +    // cmp w0, w1
>> +    // >
>> +    // ldrsh w0, [x0, #0]
>> +    // cmn w0, #1
>> +    // Fundamental, we're relying on the property that (zext LHS) == (zext RHS)
>> +    // if and only if (sext LHS) == (sext RHS). The checks are in place to
>> +    // ensure both the LHS and RHS are truely zero extended and to make sure the
>> +    // transformation is profitable.
>> +    if ((RHSC->getZExtValue() >> 16 == 0) && isa<LoadSDNode>(LHS) &&
>> +        cast<LoadSDNode>(LHS)->getExtensionType() == ISD::ZEXTLOAD &&
>> +        cast<LoadSDNode>(LHS)->getMemoryVT() == MVT::i16 &&
>> +        LHS.getNode()->hasNUsesOfValue(1, 0)) {
>> +      int16_t ValueofRHS = cast<ConstantSDNode>(RHS)->getZExtValue();
>> +      if (ValueofRHS < 0 && isLegalArithImmed(-ValueofRHS)) {
>> +        SDValue SExt =
>> +            DAG.getNode(ISD::SIGN_EXTEND_INREG, dl, LHS.getValueType(), LHS,
>> +                        DAG.getValueType(MVT::i16));
>> +        Cmp = emitComparison(SExt, DAG.getConstant(ValueofRHS, dl,
>> +                                                   RHS.getValueType()),
>> +                             CC, dl, DAG);
>> +        AArch64CC = changeIntCCToAArch64CC(CC);
>> +      }
>> +    }
>> +
>> +    if (!Cmp && (RHSC->isNullValue() || RHSC->isOne())) {
>> +      if ((Cmp = emitConjunctionDisjunctionTree(DAG, LHS, AArch64CC))) {
>> +        if ((CC == ISD::SETNE) ^ RHSC->isNullValue())
>> +          AArch64CC = AArch64CC::getInvertedCondCode(AArch64CC);
>>       }
>>     }
>>   }
>> -  Cmp = emitComparison(LHS, RHS, CC, dl, DAG);
>> -  AArch64CC = changeIntCCToAArch64CC(CC);
>> -  AArch64cc = DAG.getConstant(AArch64CC, dl, MVT::i32);
>> +
>> +  if (!Cmp) {
>> +    Cmp = emitComparison(LHS, RHS, CC, dl, DAG);
>> +    AArch64CC = changeIntCCToAArch64CC(CC);
>> +  }
>> +  AArch64cc = DAG.getConstant(AArch64CC, dl, MVT_CC);
>>   return Cmp;
>> }
>> 
>> @@ -9294,3 +9519,8 @@ bool AArch64TargetLowering::functionArgu
>>     Type *Ty, CallingConv::ID CallConv, bool isVarArg) const {
>>   return Ty->isArrayTy();
>> }
>> +
>> +bool AArch64TargetLowering::shouldNormalizeToSelectSequence(LLVMContext &,
>> +                                                            EVT) const {
>> +  return false;
>> +}
>> 
>> Modified: llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h?rev=242436&r1=242435&r2=242436&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h (original)
>> +++ llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.h Thu Jul 16 15:02:37 2015
>> @@ -58,6 +58,11 @@ enum NodeType : unsigned {
>>   SBCS,
>>   ANDS,
>> 
>> +  // Conditional compares. Operands: left,right,falsecc,cc,flags
>> +  CCMP,
>> +  CCMN,
>> +  FCCMP,
>> +
>>   // Floating point comparison
>>   FCMP,
>> 
>> @@ -516,6 +521,8 @@ private:
>>   bool functionArgumentNeedsConsecutiveRegisters(Type *Ty,
>>                                                  CallingConv::ID CallConv,
>>                                                  bool isVarArg) const override;
>> +
>> +  bool shouldNormalizeToSelectSequence(LLVMContext &, EVT) const override;
>> };
>> 
>> namespace AArch64 {
>> 
>> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td?rev=242436&r1=242435&r2=242436&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td (original)
>> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td Thu Jul 16 15:02:37 2015
>> @@ -525,6 +525,13 @@ def imm0_31 : Operand<i64>, ImmLeaf<i64,
>>   let ParserMatchClass = Imm0_31Operand;
>> }
>> 
>> +// True if the 32-bit immediate is in the range [0,31]
>> +def imm32_0_31 : Operand<i32>, ImmLeaf<i32, [{
>> +  return ((uint64_t)Imm) < 32;
>> +}]> {
>> +  let ParserMatchClass = Imm0_31Operand;
>> +}
>> +
>> // imm0_15 predicate - True if the immediate is in the range [0,15]
>> def imm0_15 : Operand<i64>, ImmLeaf<i64, [{
>>   return ((uint64_t)Imm) < 16;
>> @@ -542,7 +549,9 @@ def imm0_7 : Operand<i64>, ImmLeaf<i64,
>> // imm32_0_15 predicate - True if the 32-bit immediate is in the range [0,15]
>> def imm32_0_15 : Operand<i32>, ImmLeaf<i32, [{
>>   return ((uint32_t)Imm) < 16;
>> -}]>;
>> +}]> {
>> +  let ParserMatchClass = Imm0_15Operand;
>> +}
>> 
>> // An arithmetic shifter operand:
>> //  {7-6} - shift type: 00 = lsl, 01 = lsr, 10 = asr
>> @@ -2108,9 +2117,12 @@ multiclass LogicalRegS<bits<2> opc, bit
>> //---
>> 
>> let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in
>> -class BaseCondSetFlagsImm<bit op, RegisterClass regtype, string asm>
>> -    : I<(outs), (ins regtype:$Rn, imm0_31:$imm, imm0_15:$nzcv, ccode:$cond),
>> -         asm, "\t$Rn, $imm, $nzcv, $cond", "", []>,
>> +class BaseCondComparisonImm<bit op, RegisterClass regtype, ImmLeaf immtype,
>> +                            string mnemonic, SDNode OpNode>
>> +    : I<(outs), (ins regtype:$Rn, immtype:$imm, imm32_0_15:$nzcv, ccode:$cond),
>> +         mnemonic, "\t$Rn, $imm, $nzcv, $cond", "",
>> +         [(set NZCV, (OpNode regtype:$Rn, immtype:$imm, (i32 imm:$nzcv),
>> +                             (i32 imm:$cond), NZCV))]>,
>>       Sched<[WriteI, ReadI]> {
>>   let Uses = [NZCV];
>>   let Defs = [NZCV];
>> @@ -2130,19 +2142,13 @@ class BaseCondSetFlagsImm<bit op, Regist
>>   let Inst{3-0}   = nzcv;
>> }
>> 
>> -multiclass CondSetFlagsImm<bit op, string asm> {
>> -  def Wi : BaseCondSetFlagsImm<op, GPR32, asm> {
>> -    let Inst{31} = 0;
>> -  }
>> -  def Xi : BaseCondSetFlagsImm<op, GPR64, asm> {
>> -    let Inst{31} = 1;
>> -  }
>> -}
>> -
>> let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in
>> -class BaseCondSetFlagsReg<bit op, RegisterClass regtype, string asm>
>> -    : I<(outs), (ins regtype:$Rn, regtype:$Rm, imm0_15:$nzcv, ccode:$cond),
>> -         asm, "\t$Rn, $Rm, $nzcv, $cond", "", []>,
>> +class BaseCondComparisonReg<bit op, RegisterClass regtype, string mnemonic,
>> +                            SDNode OpNode>
>> +    : I<(outs), (ins regtype:$Rn, regtype:$Rm, imm32_0_15:$nzcv, ccode:$cond),
>> +         mnemonic, "\t$Rn, $Rm, $nzcv, $cond", "",
>> +         [(set NZCV, (OpNode regtype:$Rn, regtype:$Rm, (i32 imm:$nzcv),
>> +                             (i32 imm:$cond), NZCV))]>,
>>       Sched<[WriteI, ReadI, ReadI]> {
>>   let Uses = [NZCV];
>>   let Defs = [NZCV];
>> @@ -2162,11 +2168,19 @@ class BaseCondSetFlagsReg<bit op, Regist
>>   let Inst{3-0}   = nzcv;
>> }
>> 
>> -multiclass CondSetFlagsReg<bit op, string asm> {
>> -  def Wr : BaseCondSetFlagsReg<op, GPR32, asm> {
>> +multiclass CondComparison<bit op, string mnemonic, SDNode OpNode> {
>> +  // immediate operand variants
>> +  def Wi : BaseCondComparisonImm<op, GPR32, imm32_0_31, mnemonic, OpNode> {
>>     let Inst{31} = 0;
>>   }
>> -  def Xr : BaseCondSetFlagsReg<op, GPR64, asm> {
>> +  def Xi : BaseCondComparisonImm<op, GPR64, imm0_31, mnemonic, OpNode> {
>> +    let Inst{31} = 1;
>> +  }
>> +  // register operand variants
>> +  def Wr : BaseCondComparisonReg<op, GPR32, mnemonic, OpNode> {
>> +    let Inst{31} = 0;
>> +  }
>> +  def Xr : BaseCondComparisonReg<op, GPR64, mnemonic, OpNode> {
>>     let Inst{31} = 1;
>>   }
>> }
>> @@ -3974,11 +3988,14 @@ multiclass FPComparison<bit signalAllNan
>> //---
>> 
>> let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in
>> -class BaseFPCondComparison<bit signalAllNans,
>> -                              RegisterClass regtype, string asm>
>> -    : I<(outs), (ins regtype:$Rn, regtype:$Rm, imm0_15:$nzcv, ccode:$cond),
>> -         asm, "\t$Rn, $Rm, $nzcv, $cond", "", []>,
>> +class BaseFPCondComparison<bit signalAllNans, RegisterClass regtype,
>> +                           string mnemonic, list<dag> pat>
>> +    : I<(outs), (ins regtype:$Rn, regtype:$Rm, imm32_0_15:$nzcv, ccode:$cond),
>> +         mnemonic, "\t$Rn, $Rm, $nzcv, $cond", "", pat>,
>>       Sched<[WriteFCmp]> {
>> +  let Uses = [NZCV];
>> +  let Defs = [NZCV];
>> +
>>   bits<5> Rn;
>>   bits<5> Rm;
>>   bits<4> nzcv;
>> @@ -3994,16 +4011,18 @@ class BaseFPCondComparison<bit signalAll
>>   let Inst{3-0}   = nzcv;
>> }
>> 
>> -multiclass FPCondComparison<bit signalAllNans, string asm> {
>> -  let Defs = [NZCV], Uses = [NZCV] in {
>> -  def Srr : BaseFPCondComparison<signalAllNans, FPR32, asm> {
>> +multiclass FPCondComparison<bit signalAllNans, string mnemonic,
>> +                            SDPatternOperator OpNode = null_frag> {
>> +  def Srr : BaseFPCondComparison<signalAllNans, FPR32, mnemonic,
>> +      [(set NZCV, (OpNode (f32 FPR32:$Rn), (f32 FPR32:$Rm), (i32 imm:$nzcv),
>> +                          (i32 imm:$cond), NZCV))]> {
>>     let Inst{22} = 0;
>>   }
>> -
>> -  def Drr : BaseFPCondComparison<signalAllNans, FPR64, asm> {
>> +  def Drr : BaseFPCondComparison<signalAllNans, FPR64, mnemonic,
>> +      [(set NZCV, (OpNode (f64 FPR64:$Rn), (f64 FPR64:$Rm), (i32 imm:$nzcv),
>> +                          (i32 imm:$cond), NZCV))]> {
>>     let Inst{22} = 1;
>>   }
>> -  } // Defs = [NZCV], Uses = [NZCV]
>> }
>> 
>> //---
>> 
>> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td?rev=242436&r1=242435&r2=242436&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td (original)
>> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td Thu Jul 16 15:02:37 2015
>> @@ -66,6 +66,20 @@ def SDT_AArch64CSel  : SDTypeProfile<1,
>>                                     SDTCisSameAs<0, 2>,
>>                                     SDTCisInt<3>,
>>                                     SDTCisVT<4, i32>]>;
>> +def SDT_AArch64CCMP : SDTypeProfile<1, 5,
>> +                                    [SDTCisVT<0, i32>,
>> +                                     SDTCisInt<1>,
>> +                                     SDTCisSameAs<1, 2>,
>> +                                     SDTCisInt<3>,
>> +                                     SDTCisInt<4>,
>> +                                     SDTCisVT<5, i32>]>;
>> +def SDT_AArch64FCCMP : SDTypeProfile<1, 5,
>> +                                     [SDTCisVT<0, i32>,
>> +                                      SDTCisFP<1>,
>> +                                      SDTCisSameAs<1, 2>,
>> +                                      SDTCisInt<3>,
>> +                                      SDTCisInt<4>,
>> +                                      SDTCisVT<5, i32>]>;
>> def SDT_AArch64FCmp   : SDTypeProfile<0, 2,
>>                                    [SDTCisFP<0>,
>>                                     SDTCisSameAs<0, 1>]>;
>> @@ -160,6 +174,10 @@ def AArch64and_flag  : SDNode<"AArch64IS
>> def AArch64adc_flag  : SDNode<"AArch64ISD::ADCS",  SDTBinaryArithWithFlagsInOut>;
>> def AArch64sbc_flag  : SDNode<"AArch64ISD::SBCS",  SDTBinaryArithWithFlagsInOut>;
>> 
>> +def AArch64ccmp      : SDNode<"AArch64ISD::CCMP",  SDT_AArch64CCMP>;
>> +def AArch64ccmn      : SDNode<"AArch64ISD::CCMN",  SDT_AArch64CCMP>;
>> +def AArch64fccmp     : SDNode<"AArch64ISD::FCCMP", SDT_AArch64FCCMP>;
>> +
>> def AArch64threadpointer : SDNode<"AArch64ISD::THREAD_POINTER", SDTPtrLeaf>;
>> 
>> def AArch64fcmp      : SDNode<"AArch64ISD::FCMP", SDT_AArch64FCmp>;
>> @@ -1020,13 +1038,10 @@ def : InstAlias<"uxth $dst, $src", (UBFM
>> def : InstAlias<"uxtw $dst, $src", (UBFMXri GPR64:$dst, GPR64:$src, 0, 31)>;
>> 
>> //===----------------------------------------------------------------------===//
>> -// Conditionally set flags instructions.
>> +// Conditional comparison instructions.
>> //===----------------------------------------------------------------------===//
>> -defm CCMN : CondSetFlagsImm<0, "ccmn">;
>> -defm CCMP : CondSetFlagsImm<1, "ccmp">;
>> -
>> -defm CCMN : CondSetFlagsReg<0, "ccmn">;
>> -defm CCMP : CondSetFlagsReg<1, "ccmp">;
>> +defm CCMN : CondComparison<0, "ccmn", AArch64ccmn>;
>> +defm CCMP : CondComparison<1, "ccmp", AArch64ccmp>;
>> 
>> //===----------------------------------------------------------------------===//
>> // Conditional select instructions.
>> @@ -2556,7 +2571,7 @@ defm FCMP  : FPComparison<0, "fcmp", AAr
>> //===----------------------------------------------------------------------===//
>> 
>> defm FCCMPE : FPCondComparison<1, "fccmpe">;
>> -defm FCCMP  : FPCondComparison<0, "fccmp">;
>> +defm FCCMP  : FPCondComparison<0, "fccmp", AArch64fccmp>;
>> 
>> //===----------------------------------------------------------------------===//
>> // Floating point conditional select instruction.
>> 
>> Modified: llvm/trunk/test/CodeGen/AArch64/arm64-ccmp.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-ccmp.ll?rev=242436&r1=242435&r2=242436&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/AArch64/arm64-ccmp.ll (original)
>> +++ llvm/trunk/test/CodeGen/AArch64/arm64-ccmp.ll Thu Jul 16 15:02:37 2015
>> @@ -287,3 +287,99 @@ sw.bb.i.i:
>>   %code1.i.i.phi.trans.insert = getelementptr inbounds %str1, %str1* %0, i64 0, i32 0, i32 0, i64 16
>>   br label %sw.bb.i.i
>> }
>> +
>> +; CHECK-LABEL: select_and
>> +define i64 @select_and(i32 %w0, i32 %w1, i64 %x2, i64 %x3) {
>> +; CHECK: cmp w1, #5
>> +; CHECK-NEXT: ccmp w0, w1, #0, ne
>> +; CHECK-NEXT: csel x0, x2, x3, lt
>> +; CHECK-NEXT: ret
>> +  %1 = icmp slt i32 %w0, %w1
>> +  %2 = icmp ne i32 5, %w1
>> +  %3 = and i1 %1, %2
>> +  %sel = select i1 %3, i64 %x2, i64 %x3
>> +  ret i64 %sel
>> +}
>> +
>> +; CHECK-LABEL: select_or
>> +define i64 @select_or(i32 %w0, i32 %w1, i64 %x2, i64 %x3) {
>> +; CHECK: cmp w1, #5
>> +; CHECK-NEXT: ccmp w0, w1, #8, eq
>> +; CHECK-NEXT: csel x0, x2, x3, lt
>> +; CHECK-NEXT: ret
>> +  %1 = icmp slt i32 %w0, %w1
>> +  %2 = icmp ne i32 5, %w1
>> +  %3 = or i1 %1, %2
>> +  %sel = select i1 %3, i64 %x2, i64 %x3
>> +  ret i64 %sel
>> +}
>> +
>> +; CHECK-LABEL: select_complicated
>> +define i16 @select_complicated(double %v1, double %v2, i16 %a, i16 %b) {
>> +; CHECK: ldr [[REG:d[0-9]+]],
>> +; CHECK: fcmp d0, d2
>> +; CHECK-NEXT: fmov d2, #13.00000000
>> +; CHECK-NEXT: fccmp d1, d2, #4, ne
>> +; CHECK-NEXT: fccmp d0, d1, #1, ne
>> +; CHECK-NEXT: fccmp d0, d1, #4, vc
>> +; CEHCK-NEXT: csel w0, w0, w1, eq
>> +  %1 = fcmp one double %v1, %v2
>> +  %2 = fcmp oeq double %v2, 13.0
>> +  %3 = fcmp oeq double %v1, 42.0
>> +  %or0 = or i1 %2, %3
>> +  %or1 = or i1 %1, %or0
>> +  %sel = select i1 %or1, i16 %a, i16 %b
>> +  ret i16 %sel
>> +}
>> +
>> +; CHECK-LABEL: gccbug
>> +define i64 @gccbug(i64 %x0, i64 %x1) {
>> +; CHECK: cmp x1, #0
>> +; CHECK-NEXT: ccmp x0, #2, #0, eq
>> +; CHECK-NEXT: ccmp x0, #4, #4, ne
>> +; CHECK-NEXT: orr w[[REGNUM:[0-9]+]], wzr, #0x1
>> +; CHECK-NEXT: cinc x0, x[[REGNUM]], eq
>> +; CHECK-NEXT: ret
>> +  %cmp0 = icmp eq i64 %x1, 0
>> +  %cmp1 = icmp eq i64 %x0, 2
>> +  %cmp2 = icmp eq i64 %x0, 4
>> +
>> +  %or = or i1 %cmp2, %cmp1
>> +  %and = and i1 %or, %cmp0
>> +
>> +  %sel = select i1 %and, i64 2, i64 1
>> +  ret i64 %sel
>> +}
>> +
>> +; CHECK-LABEL: select_ororand
>> +define i32 @select_ororand(i32 %w0, i32 %w1, i32 %w2, i32 %w3) {
>> +; CHECK: cmp w3, #4
>> +; CHECK-NEXT: ccmp w2, #2, #0, gt
>> +; CHECK-NEXT: ccmp w1, #13, #2, ge
>> +; CHECK-NEXT: ccmp w0, #0, #4, ls
>> +; CHECK-NEXT: csel w0, w3, wzr, eq
>> +; CHECK-NEXT: ret
>> +  %c0 = icmp eq i32 %w0, 0
>> +  %c1 = icmp ugt i32 %w1, 13
>> +  %c2 = icmp slt i32 %w2, 2
>> +  %c4 = icmp sgt i32 %w3, 4
>> +  %or = or i1 %c0, %c1
>> +  %and = and i1 %c2, %c4
>> +  %or1 = or i1 %or, %and
>> +  %sel = select i1 %or1, i32 %w3, i32 0
>> +  ret i32 %sel
>> +}
>> +
>> +; CHECK-LABEL: select_noccmp
>> +define i64 @select_noccmp(i64 %v1, i64 %v2, i64 %v3, i64 %r) {
>> +; CHECK-NOT: CCMP
>> +  %c0 = icmp slt i64 %v1, 0
>> +  %c1 = icmp sgt i64 %v1, 13
>> +  %c2 = icmp slt i64 %v3, 2
>> +  %c4 = icmp sgt i64 %v3, 4
>> +  %and0 = and i1 %c0, %c1
>> +  %and1 = and i1 %c2, %c4
>> +  %or = or i1 %and0, %and1
>> +  %sel = select i1 %or, i64 0, i64 %r
>> +  ret i64 %sel
>> +}
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> <reg.ll><reg.r242236.s><reg.s>
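
For readers following the quoted commit, the rule it describes for which and/or trees can become a CMP;CCMP chain can be sketched on a toy tree. This is only an illustration under stated assumptions: `Kind`, `Node`, `leaf`, `mkAnd`, `mkOr`, `canPushNegate` and `representable` are hypothetical names, not LLVM APIs, and the model leaves out the one-use check, the depth limit and the extra comparison that some floating-point conditions need.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy stand-in for the i1-producing nodes; this is NOT LLVM's SDValue.
enum class Kind { SetCC, And, Or };

struct Node {
  Kind kind;
  std::shared_ptr<const Node> lhs, rhs;
};
using NodePtr = std::shared_ptr<const Node>;

NodePtr leaf() {
  return std::make_shared<Node>(Node{Kind::SetCC, nullptr, nullptr});
}
NodePtr mkAnd(NodePtr a, NodePtr b) {
  return std::make_shared<Node>(Node{Kind::And, std::move(a), std::move(b)});
}
NodePtr mkOr(NodePtr a, NodePtr b) {
  return std::make_shared<Node>(Node{Kind::Or, std::move(a), std::move(b)});
}

// A negation can be pushed down to the leaves only through OR nodes:
// not (or x y) == and (not x) (not y), whereas pushing it through an AND
// would produce an OR again.
bool canPushNegate(const NodePtr &n) {
  assert(n && "null node");
  switch (n->kind) {
  case Kind::SetCC:
    return true;
  case Kind::And:
    return false;
  case Kind::Or:
    return canPushNegate(n->lhs) && canPushNegate(n->rhs);
  }
  return false;
}

// Mirrors the shape of the recursion in the quoted patch: for an OR, one
// side is negated "for free" by inverting the tested flags, the other side
// must accept a pushed-down negation.
bool representable(const NodePtr &n, bool pushNegate = false) {
  assert(n && "null node");
  if (n->kind == Kind::SetCC)
    return true;
  NodePtr lhs = n->lhs, rhs = n->rhs;
  if (n->kind == Kind::And) {
    if (pushNegate) // the patch asserts that this case never occurs
      return false;
    return representable(rhs, false) && representable(lhs, false);
  }
  bool l = canPushNegate(lhs), r = canPushNegate(rhs);
  if (!l && !r)
    return false;
  if (!l) // order the side that accepts the pushed negation to lhs
    std::swap(lhs, rhs);
  return representable(rhs, pushNegate) && representable(lhs, true);
}
```

With this model, the tree "or (or A B) (and C D)" from the commit's comment is accepted, while "or (and A B) (and C D)", the shape used by the select_noccmp test, is rejected.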