[llvm-commits] Patch to implement UMLAL/SMLAL Instructions for the ARM Architecture

Thu Aug 2 14:57:55 PDT 2012

This patch corrects the definition of the umlal/smlal instructions and adds
support for matching them in the ARM DAG combiner.
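
For anyone skimming, here is a small C++ model of what the patch relies on.
It is only a sketch: Halves, umlalModel, smlalModel, umulLohiAddcAdde and
checkAgainstModel are names made up for this note and appear nowhere in the
patch. The first two functions model the instruction semantics
(RdHi:RdLo += Rn * Rm, so four inputs rather than two). The third models the
UMUL_LOHI + ADDC + ADDE shape that the combiner looks for, which should
produce the same pair of values.

```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit halves of a 64-bit value, as they sit in RdLo/RdHi.
struct Halves {
  uint32_t Lo;
  uint32_t Hi;
};

// Reference model of UMLAL: RdHi:RdLo = RHi:RLo + Rn * Rm (unsigned).
// Note the four inputs: Rn, Rm and both incoming accumulator halves.
Halves umlalModel(uint32_t Rn, uint32_t Rm, uint32_t RLo, uint32_t RHi) {
  uint64_t Acc = (static_cast<uint64_t>(RHi) << 32) | RLo;
  Acc += static_cast<uint64_t>(Rn) * Rm;
  return {static_cast<uint32_t>(Acc), static_cast<uint32_t>(Acc >> 32)};
}

// Reference model of SMLAL: same accumulate, but the product is signed.
Halves smlalModel(int32_t Rn, int32_t Rm, uint32_t RLo, uint32_t RHi) {
  uint64_t Acc = (static_cast<uint64_t>(RHi) << 32) | RLo;
  Acc += static_cast<uint64_t>(static_cast<int64_t>(Rn) * Rm);
  return {static_cast<uint32_t>(Acc), static_cast<uint32_t>(Acc >> 32)};
}

// The unfused shape: a 32x32->64 multiply, an add of the low halves that
// produces a carry, and an add of the high halves that consumes it.
Halves umulLohiAddcAdde(uint32_t Rn, uint32_t Rm, uint32_t RLo, uint32_t RHi) {
  uint64_t Prod = static_cast<uint64_t>(Rn) * Rm;   // UMUL_LOHI
  uint32_t MulLo = static_cast<uint32_t>(Prod);
  uint32_t MulHi = static_cast<uint32_t>(Prod >> 32);
  uint32_t Lo = MulLo + RLo;                        // ADDC (wraps mod 2^32)
  uint32_t Carry = Lo < MulLo ? 1u : 0u;            // carry out of the low add
  uint32_t Hi = MulHi + RHi + Carry;                // ADDE
  return {Lo, Hi};
}

// Asserts that the unfused chain and the fused model agree on one input.
void checkAgainstModel(uint32_t Rn, uint32_t Rm, uint32_t RLo, uint32_t RHi) {
  Halves Fused = umlalModel(Rn, Rm, RLo, RHi);
  Halves Chain = umulLohiAddcAdde(Rn, Rm, RLo, RHi);
  assert(Fused.Lo == Chain.Lo && Fused.Hi == Chain.Hi);
}
```

For example, umlalModel(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0) gives
Lo = 0 and Hi = 0xFFFFFFFF, and the ADDC/ADDE chain gives the same pair
because the low add carries into the high add.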



This patch got lost on my end. Sorry.

Is it okay to commit?

Prior discussion:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120507/142351.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120416/141153.html

Bug:
http://llvm.org/bugs/show_bug.cgi?id=12213

> I understand Anton has not gotten back to you. That's fine, I don't think
> you need to wait for him. However, it looks like Arnold has some concerns
> about the patch? Please address his concerns first.
>
> Once you have addressed Arnold's concerns, he can commit it for you.
> Arnold has commit privilege.
>
> Evan
>
> On May 9, 2012, at 4:32 PM, Yin Ma wrote:
>
>> Hi Evan,
>>
>>      Sorry for bothering you. I would like to know whether this patch is
>> likely to be reviewed and merged into the main trunk eventually. It has
>> been a while since I submitted this patch to the list. We hope this
>> change can be merged into the main trunk so my team can use it in the
>> official LLVM version, but we have waited for quite a while. We would
>> like to know if this is still possible. Please take a look and give us
>> some advice. Thank you.
>>
>> Sincerely,
>>
>>                          Yin
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Tuesday, April 17, 2012 10:36 PM
>> To: Yin Ma
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] LLVM patch to implement UMLAL/SMLAL
>> Instructions for ARM Architecture.
>>
>> Thanks. I think the patch is correct now. But I would very much
>> appreciate another pair of eyes on it.
>>
>> Evan
>>
>> On Apr 17, 2012, at 11:29 AM, Yin Ma <yinma at codeaurora.org> wrote:
>>
>> Hi Evan,
>>
>>      I have updated the source based on your latest comments.
>> Thank you for reviewing.
>>
>>                             Yin
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Saturday, April 14, 2012 11:51 AM
>> To: Yin Ma
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] LLVM patch to implement UMLAL/SMLAL
>> Instructions for ARM Architecture.
>>
>> Sorry, I took a closer look and I find some issues.
>>
>>  // Multiply + accumulate
>> -def SMLAL : AsMul1I64<0b0000111, (outs GPR:$RdLo, GPR:$RdHi),
>> -                               (ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
>> +def SMLAL : AsMla1I64<0b0000111, (outs GPR:$RdLo, GPR:$RdHi),
>> +                               (ins GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi), IIC_iMAC64,
>>                      "smlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
>> -                    Requires<[IsARM, HasV6]>;
>> -def UMLAL : AsMul1I64<0b0000101, (outs GPR:$RdLo, GPR:$RdHi),
>> -                               (ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
>> +                    RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">,
>> Requires<[IsARM, HasV6]>;
>> +def UMLAL : AsMla1I64<0b0000101, (outs GPR:$RdLo, GPR:$RdHi),
>> +                               (ins GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi), IIC_iMAC64,
>>                      "umlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
>> -                    Requires<[IsARM, HasV6]>;
>> +                    RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">,
>> Requires<[IsARM, HasV6]>;
>>
>> -let Constraints = "@earlyclobber $RdLo,@earlyclobber $RdHi" in {
>> +let Constraints = "$RLo = $RdLo,$RHi = $RdHi" in {
>>  def SMLALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
>> -                              (ins GPR:$Rn, GPR:$Rm, pred:$p,
>> cc_out:$s),
>> +                              (ins GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi, pred:$p, cc_out:$s),
>>                                4, IIC_iMAC64, [],
>> -          (SMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, pred:$p,
>> cc_out:$s)>,
>> +          (SMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi, pred:$p, cc_out:$s)>,
>>                             Requires<[IsARM, NoV6]>;
>>  def UMLALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
>> -                              (ins GPR:$Rn, GPR:$Rm, pred:$p,
>> cc_out:$s),
>> -                              4, IIC_iMAC64, [],
>> -          (UMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, pred:$p,
>> cc_out:$s)>,
>> -                           Requires<[IsARM, NoV6]>;
>> +                               (ins GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi, pred:$p, cc_out:$s),
>> +                               4, IIC_iMAC64, [],
>> +          (UMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi, pred:$p, cc_out:$s)>,
>> +                            Requires<[IsARM, NoV6]>;
>> +}
>> +
>>
>> These are beyond 80 columns.
>>
>> +  // follow the glue value to get the second add
>> +  // don't know how much uses of the first add
>> +  // use while check
>>
>> This comment doesn't make sense. Please correct.
>>
>> Evan
>>
>> On Apr 12, 2012, at 11:15 AM, Evan Cheng wrote:
>>
>>
>>
>> LGTM. Can someone commit this for you?
>>
>> Evan
>>
>> On Apr 2, 2012, at 10:22 AM, Yin Ma wrote:
>>
>>
>>
>> Hi,
>>
>> I have updated the code based on your comments. I have added more comments
>> to the code, especially for some conditional statements I cannot remove.
>> Please review again.
>>
>> llmlal.diff is the code diff for the updated version
>> report.simple.txt and result.txt are the new test results
>>
>> Thanks,
>>
>>                            Yin
>>
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Thursday, March 15, 2012 11:07 AM
>> To: Yin Ma
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] LLVM patch to implement UMLAL/SMLAL
>> Instructions for ARM Architecture.
>>
>>
>> On Mar 15, 2012, at 10:17 AM, Yin Ma wrote:
>>
>>
>>
>>
>> Hi Evan,
>>
>>      Thank you for reviewing this patch. I will rework the code style
>> based on your advice. Then please review it again.
>>
>> For the question #8
>> 8.
>> There are a lot of failures in report.simple.txt. Why is that?
>>
>> Those failures are not caused by my change. They are due to our current
>> test machine setup. This patch didn't increase the number of failures.
>> After reworking the style, I will look for a better machine to run the
>> unit tests. The failure count will change.
>>
>> Ok. Please try to get a clean run. Otherwise it's not particularly
>> useful to include as part of the patch review.
>>
>> Evan
>>
>>
>>
>>
>>
>> Thanks,
>>
>>                  Yin
>>
>>
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Thursday, March 15, 2012 12:01 AM
>> To: Yin Ma
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] LLVM patch to implement UMLAL/SMLAL
>> Instructions for ARM Architecture.
>>
>> Thanks. Preliminary reviews below.
>>
>> The first thing I noticed is some of your stylistic choices are
>> different from the rest of the code:
>>
>> if( nValue != 2 ) return SDValue();
>>
>> LLVM doesn't use camel case for variable names. Also, there should be a
>> space before '(', not after.
>>
>> +    }else{
>> Should be } else {
>>
>> Now onto the rest of the patch.
>>
>> 1.
>> +  case ARMISD::UMLAL:{
>> +    if (Subtarget->isThumb1Only())
>> +      break;
>>
>> The check should not be needed, right? ARMISD::UMLAL cannot be formed
>> for Thumb1.
>>
>> +  // For UMLAL/SMLAL
>> +  setTargetDAGCombine(ISD::ADDC);
>> +  setTargetDAGCombine(ISD::ADDE);
>> +
>>
>> Should these be guarded with !Subtarget->isThumb1Only()?
>>
>> 2.
>> +  case ISD::ADDE:       return PerformADDECCombine(N, DCI, Subtarget);
>> +  case ISD::ADDC:       return PerformADDECCombine(N, DCI, Subtarget);
>>
>> That's silly. It should be:
>> +  case ISD::ADDE:
>> +  case ISD::ADDC:       return PerformADDECCombine(N, DCI, Subtarget);
>>
>> 3. Can you add more comments to AddCombineTo64bitMLAL()? Please describe
>> what the routine is trying to match.
>>
>> 4.
>> +  // The second use must be a glue to a add
>> +  int nValue = N->getNumValues();
>> +  if( nValue != 2 ) return SDValue();
>>
>> if (N->getNumValues() != 2) return SDValue();
>>
>> The check isn't necessary. ADDE always produces two values.
>>
>> 5.
>>
>> +  EVT GVT = N->getValueType(1);
>> +  if( VT != MVT::i32 || GVT != MVT::Glue ) return SDValue();
>>
>> VT must be MVT::i32 after legalization, right? Are there cases where GVT
>> might not be MVT::Glue?
>>
>> +
>> +  // look for glue value
>> +  SDNode::use_iterator UI = N->use_begin();
>> +  SDNode::use_iterator UE = N->use_end();
>> +  SDNode* HiAdd = NULL;
>> +  while( UI != UE ){
>> +    SDUse& Nuse = UI.getUse();
>> +    if( Nuse.getResNo() == 1 ){
>> +      HiAdd = Nuse.getUser();
>> +      break;
>> +    }
>> +    UI++;
>> +  }
>>
>> Is a loop necessary? A glue value can only have a single use.
>>
>> 6. The rest of the routine has more stylistic issues. Also it's hard to
>> understand without high level description.
>>
>> 7.
>> +/// PerformADDECCombine - Target-specific dag combine xforms for
>> ISD::ADDE & ISD::ADDC for UMLAL.
>> +///
>>
>> Please be aware of 80 col violation.
>>
>> +  SDValue Result = AddCombineTo64bitMLAL(N, DCI, Subtarget);
>> +  if (Result.getNode())
>> +      return Result;
>> +
>> +  // If that didn't work, try again with the operands commuted.
>> +  return SDValue();
>> +}
>>
>> Isn't this just?
>> return AddCombineTo64bitMLAL(N, DCI, Subtarget);
>>
>> This routine seems unnecessary.
>>
>> The comment indicates it should try again with operands commuted. That
>> should be done in AddCombineTo64bitMLAL, no?
>>
>> 8.
>> There are a lot of failures in report.simple.txt. Why is that?
>>
>> Evan
>>
>> On Mar 8, 2012, at 4:15 PM, Yin Ma <yinma at codeaurora.org> wrote:
>>
>>
>>
>>
>>
>> The current definition of UMLAL/SMLAL in LLVM for ARM is not used, and
>> the definition is not correct: the instruction reads four input values,
>> not the two defined in the .td file.
>>
>> I have created a Bugzilla entry for this issue:
>> http://llvm.org/bugs/show_bug.cgi?id=12213
>>
>> I am proposing a patch that not only fixes the definition but also adds
>> the corresponding generation algorithm on the DAG. This algorithm only
>> operates in the ARM backend. It identifies conversion opportunities
>> during DAG processing.
>>
>> llmlal.diff is the code change
>> longMAC.ll is the test case for ARM
>> longMACt.ll is the test case for Thumb2
>> report.simple.txt is the result from test-suites
>> result.txt is the result from test
>>
>> Please give a review. Thanks,
>>
>>                                         Yin
>>
>>
>> <llmlal.diff><longMAC.ll><longMACt.ll><report.simple.txt><result.txt>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
-------------- next part --------------
Index: lib/Target/ARM/ARMISelLowering.h
===================================================================

--- lib/Target/ARM/ARMISelLowering.h	(revision 161211)
+++ lib/Target/ARM/ARMISelLowering.h	(working copy)
@@ -176,6 +176,9 @@
       VMULLs,       // ...signed
       VMULLu,       // ...unsigned
 
+      UMLAL,        // 64bit Unsigned Accumulate Multiply
+      SMLAL,        // 64bit Signed Accumulate Multiply
+
       // Operands of the standard BUILD_VECTOR node are not legalized, which
       // is fine if BUILD_VECTORs are always lowered to shuffles or other
       // operations, but for ARM some BUILD_VECTORs are legal as-is and their
Index: lib/Target/ARM/ARMInstrInfo.td
===================================================================
--- lib/Target/ARM/ARMInstrInfo.td	(revision 161211)
+++ lib/Target/ARM/ARMInstrInfo.td	(working copy)
@@ -83,6 +83,13 @@
                                              SDTCisInt<0>,
                                              SDTCisVT<1, i32>,
                                              SDTCisVT<4, i32>]>;
+
+def SDT_ARM64bitmlal : SDTypeProfile<2,4, [ SDTCisVT<0, i32>, SDTCisVT<1, i32>,
+                                        SDTCisVT<2, i32>, SDTCisVT<3, i32>,
+                                        SDTCisVT<4, i32>, SDTCisVT<5, i32> ] >;
+def ARMUmlal         : SDNode<"ARMISD::UMLAL", SDT_ARM64bitmlal>;
+def ARMSmlal         : SDNode<"ARMISD::SMLAL", SDT_ARM64bitmlal>;
+
 // Node definitions.
 def ARMWrapper       : SDNode<"ARMISD::Wrapper",     SDTIntUnaryOp>;
 def ARMWrapperDYN    : SDNode<"ARMISD::WrapperDYN",  SDTIntUnaryOp>;
@@ -3400,6 +3407,20 @@
   let Inst{11-8}  = Rm;
   let Inst{3-0}   = Rn;
 }
+class AsMla1I64<bits<7> opcod, dag oops, dag iops, InstrItinClass itin,
+             string opc, string asm, list<dag> pattern>
+  : AsMul1I<opcod, oops, iops, itin, opc, asm, pattern> {
+  bits<4> RdLo;
+  bits<4> RdHi;
+  bits<4> Rm;
+  bits<4> Rn;
+  bits<4> RLo;
+  bits<4> RHi;
+  let Inst{19-16} = RdHi;
+  let Inst{15-12} = RdLo;
+  let Inst{11-8}  = Rm;
+  let Inst{3-0}   = Rn;
+}
 
 // FIXME: The v5 pseudos are only necessary for the additional Constraint
 //        property. Remove them when it's possible to add those properties
@@ -3482,14 +3503,14 @@
 }
 
 // Multiply + accumulate
-def SMLAL : AsMul1I64<0b0000111, (outs GPR:$RdLo, GPR:$RdHi),
-                               (ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
+def SMLAL : AsMla1I64<0b0000111, (outs GPR:$RdLo, GPR:$RdHi),
+                        (ins GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi), IIC_iMAC64,
                     "smlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
-                    Requires<[IsARM, HasV6]>;
-def UMLAL : AsMul1I64<0b0000101, (outs GPR:$RdLo, GPR:$RdHi),
-                               (ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
+         RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">, Requires<[IsARM, HasV6]>;
+def UMLAL : AsMla1I64<0b0000101, (outs GPR:$RdLo, GPR:$RdHi),
+                        (ins GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi), IIC_iMAC64,
                     "umlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
-                    Requires<[IsARM, HasV6]>;
+         RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">, Requires<[IsARM, HasV6]>;
 
 def UMAAL : AMul1I <0b0000010, (outs GPR:$RdLo, GPR:$RdHi),
                                (ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
@@ -3505,17 +3526,22 @@
   let Inst{3-0}   = Rn;
 }
 
-let Constraints = "@earlyclobber $RdLo,@earlyclobber $RdHi" in {
+let Constraints = "$RLo = $RdLo,$RHi = $RdHi" in {
 def SMLALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
-                              (ins GPR:$Rn, GPR:$Rm, pred:$p, cc_out:$s),
+                (ins GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi, pred:$p, cc_out:$s),
                               4, IIC_iMAC64, [],
-          (SMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, pred:$p, cc_out:$s)>,
+             (SMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi,
+                           pred:$p, cc_out:$s)>,
                            Requires<[IsARM, NoV6]>;
 def UMLALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
-                              (ins GPR:$Rn, GPR:$Rm, pred:$p, cc_out:$s),
+                (ins GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi, pred:$p, cc_out:$s),
                               4, IIC_iMAC64, [],
-          (UMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, pred:$p, cc_out:$s)>,
+             (UMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi,
+                           pred:$p, cc_out:$s)>,
                            Requires<[IsARM, NoV6]>;
+}
+
+let Constraints = "@earlyclobber $RdLo,@earlyclobber $RdHi" in {
 def UMAALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
                               (ins GPR:$Rn, GPR:$Rm, pred:$p),
                               4, IIC_iMAC64, [],
Index: lib/Target/ARM/ARMISelLowering.cpp
===================================================================
--- lib/Target/ARM/ARMISelLowering.cpp	(revision 161211)
+++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
@@ -571,6 +571,11 @@
     }
   }
 
+  // For UMLAL/SMLAL
+  if (!Subtarget->isThumb1Only()) {
+    setTargetDAGCombine(ISD::ADDC);
+  }
+
   computeRegisterProperties();
 
   // ARM does not have f32 extending load.
@@ -989,6 +994,8 @@
   case ARMISD::VTBL2:         return "ARMISD::VTBL2";
   case ARMISD::VMULLs:        return "ARMISD::VMULLs";
   case ARMISD::VMULLu:        return "ARMISD::VMULLu";
+  case ARMISD::UMLAL:         return "ARMISD::UMLAL";
+  case ARMISD::SMLAL:         return "ARMISD::SMLAL";
   case ARMISD::BUILD_VECTOR:  return "ARMISD::BUILD_VECTOR";
   case ARMISD::FMAX:          return "ARMISD::FMAX";
   case ARMISD::FMIN:          return "ARMISD::FMIN";
@@ -7124,6 +7131,154 @@
   return DAG.getNode(ISD::TRUNCATE, N->getDebugLoc(), VT, tmp);
 }
 
+static SDValue findMUL_LOHI(SDValue V) {
+  if (V->getOpcode() == ISD::UMUL_LOHI ||
+      V->getOpcode() == ISD::SMUL_LOHI)
+    return V;
+  return SDValue();
+}
+
+static SDValue AddCombineTo64bitMLAL(SDNode *AddcNode,
+                                     TargetLowering::DAGCombinerInfo &DCI,
+                                     const ARMSubtarget *Subtarget) {
+
+  if (Subtarget->isThumb1Only()) return SDValue();
+
+  // Only perform the checks after legalize when the pattern is available.
+  if (DCI.isBeforeLegalize()) return SDValue();
+
+  // Look for multiply add opportunities.
+  // The pattern is an ISD::UMUL_LOHI followed by two add nodes, where
+  // each add node consumes a value from ISD::UMUL_LOHI and there is
+  // a glue link from the first add to the second add.
+  // If we find this pattern, we can replace the U/SMUL_LOHI, ADDC, and ADDE by
+  // a S/UMLAL instruction.
+  //          loAdd   UMUL_LOHI
+  //            \    / :lo    \ :hi
+  //             \  /          \          [no multiline comment]
+  //              ADDC         |  hiAdd
+  //                 \ :glue  /  /
+  //                  \      /  /
+  //                    ADDE
+  //
+  assert(AddcNode->getOpcode() == ISD::ADDC && "Expect an ADDC");
+  SDValue AddcOp0 = AddcNode->getOperand(0);
+  SDValue AddcOp1 = AddcNode->getOperand(1);
+
+  // Check if the two operands are from the same mul_lohi node.
+  if (AddcOp0.getNode() == AddcOp1.getNode())
+    return SDValue();
+
+  assert(AddcNode->getNumValues() == 2 &&
+         AddcNode->getValueType(0) == MVT::i32 &&
+         AddcNode->getValueType(1) == MVT::Glue &&
+         "Expect ADDC with two result values: i32, glue");
+
+  // Check that the ADDC adds the low result of the S/UMUL_LOHI.
+  if (AddcOp0->getOpcode() != ISD::UMUL_LOHI &&
+      AddcOp0->getOpcode() != ISD::SMUL_LOHI &&
+      AddcOp1->getOpcode() != ISD::UMUL_LOHI &&
+      AddcOp1->getOpcode() != ISD::SMUL_LOHI)
+    return SDValue();
+
+  // Look for the glued ADDE.
+  SDNode* AddeNode = AddcNode->getGluedUser();
+  if (AddeNode == NULL)
+    return SDValue();
+
+  // Make sure it is really an ADDE.
+  if (AddeNode->getOpcode() != ISD::ADDE)
+    return SDValue();
+
+  assert(AddeNode->getNumOperands() == 3 &&
+         AddeNode->getOperand(2).getValueType() == MVT::Glue &&
+         "ADDE node has the wrong inputs");
+
+  // Check for the triangle shape.
+  SDValue AddeOp0 = AddeNode->getOperand(0);
+  SDValue AddeOp1 = AddeNode->getOperand(1);
+
+  // Make sure that the ADDE operands are not coming from the same node.
+  if (AddeOp0.getNode() == AddeOp1.getNode())
+    return SDValue();
+
+  // Find the MUL_LOHI node walking up ADDE's operands.
+  bool IsLeftOperandMUL = false;
+  SDValue MULOp = findMUL_LOHI(AddeOp0);
+  if (MULOp == SDValue())
+    MULOp = findMUL_LOHI(AddeOp1);
+  else
+    IsLeftOperandMUL = true;
+  if (MULOp == SDValue())
+    return SDValue();
+
+  // Figure out the right opcode.
+  unsigned Opc = MULOp->getOpcode();
+  unsigned FinalOpc = (Opc == ISD::SMUL_LOHI) ? ARMISD::SMLAL : ARMISD::UMLAL;
+
+  // Figure out the high and low input values to the MLAL node.
+  SDValue* HiMul = &MULOp;
+  SDValue* HiAdd = NULL;
+  SDValue* LoMul = NULL;
+  SDValue* LowAdd = NULL;
+
+  if (IsLeftOperandMUL)
+    HiAdd = &AddeOp1;
+  else
+    HiAdd = &AddeOp0;
+
+
+  if (AddcOp0->getOpcode() == Opc) {
+    LoMul = &AddcOp0;
+    LowAdd = &AddcOp1;
+  }
+  if (AddcOp1->getOpcode() == Opc) {
+    LoMul = &AddcOp1;
+    LowAdd = &AddcOp0;
+  }
+
+  if (LoMul == NULL)
+    return SDValue();
+
+  if (LoMul->getNode() != HiMul->getNode())
+    return SDValue();
+
+  // Create the merged node.
+  SelectionDAG &DAG = DCI.DAG;
+
+  // Build operand list.
+  SmallVector<SDValue, 8> Ops;
+  Ops.push_back(LoMul->getOperand(0));
+  Ops.push_back(LoMul->getOperand(1));
+  Ops.push_back(*LowAdd);
+  Ops.push_back(*HiAdd);
+
+  SDValue MLALNode =  DAG.getNode(FinalOpc, AddcNode->getDebugLoc(),
+                                 DAG.getVTList(MVT::i32, MVT::i32),
+                                 &Ops[0], Ops.size());
+
+  // Replace the ADD nodes' uses with the MLAL node's values.
+  SDValue HiMLALResult(MLALNode.getNode(), 1);
+  DAG.ReplaceAllUsesOfValueWith(SDValue(AddeNode, 0), HiMLALResult);
+
+  SDValue LoMLALResult(MLALNode.getNode(), 0);
+  DAG.ReplaceAllUsesOfValueWith(SDValue(AddcNode, 0), LoMLALResult);
+
+  // Return original node to notify the driver to stop replacing.
+  SDValue resNode(AddcNode, 0);
+  return resNode;
+}
+
+/// PerformADDCCombine - Target-specific dag combine transform from
+/// ISD::ADDC, ISD::ADDE, and ISD::MUL_LOHI to MLAL.
+static SDValue PerformADDCCombine(SDNode *N,
+                                 TargetLowering::DAGCombinerInfo &DCI,
+                                 const ARMSubtarget *Subtarget) {
+
+  return AddCombineTo64bitMLAL(N, DCI, Subtarget);
+
+}
+
 /// PerformADDCombineWithOperands - Try DAG combinations for an ADD with
 /// operands N0 and N1.  This is a helper for PerformADDCombine that is
 /// called with the default operands, and if that fails, with commuted
@@ -8735,6 +8890,7 @@
                                              DAGCombinerInfo &DCI) const {
   switch (N->getOpcode()) {
   default: break;
+  case ISD::ADDC:       return PerformADDCCombine(N, DCI, Subtarget);
   case ISD::ADD:        return PerformADDCombine(N, DCI, Subtarget);
   case ISD::SUB:        return PerformSUBCombine(N, DCI);
   case ISD::MUL:        return PerformMULCombine(N, DCI, Subtarget);
Index: lib/Target/ARM/ARMISelDAGToDAG.cpp
===================================================================
--- lib/Target/ARM/ARMISelDAGToDAG.cpp	(revision 161211)
+++ lib/Target/ARM/ARMISelDAGToDAG.cpp	(working copy)
@@ -2747,6 +2747,38 @@
                                     dl, MVT::i32, MVT::i32, Ops, 5);
     }
   }
+  case ARMISD::UMLAL: {
+    if (Subtarget->isThumb()) {
+      SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
+                        N->getOperand(3), getAL(CurDAG),
+                        CurDAG->getRegister(0, MVT::i32)};
+      return CurDAG->getMachineNode(ARM::t2UMLAL, dl, MVT::i32, MVT::i32, Ops, 6);
+    } else {
+      SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
+                        N->getOperand(3), getAL(CurDAG),
+                        CurDAG->getRegister(0, MVT::i32),
+                        CurDAG->getRegister(0, MVT::i32) };
+      return CurDAG->getMachineNode(Subtarget->hasV6Ops() ?
+                                      ARM::UMLAL : ARM::UMLALv5,
+                                      dl, MVT::i32, MVT::i32, Ops, 7);
+    }
+  }
+  case ARMISD::SMLAL: {
+    if (Subtarget->isThumb()) {
+      SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
+                        N->getOperand(3), getAL(CurDAG),
+                        CurDAG->getRegister(0, MVT::i32)};
+      return CurDAG->getMachineNode(ARM::t2SMLAL, dl, MVT::i32, MVT::i32, Ops, 6);
+    } else {
+      SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
+                        N->getOperand(3), getAL(CurDAG),
+                        CurDAG->getRegister(0, MVT::i32),
+                        CurDAG->getRegister(0, MVT::i32) };
+      return CurDAG->getMachineNode(Subtarget->hasV6Ops() ?
+                                      ARM::SMLAL : ARM::SMLALv5,
+                                      dl, MVT::i32, MVT::i32, Ops, 7);
+    }
+  }
   case ISD::LOAD: {
     SDNode *ResNode = 0;
     if (Subtarget->isThumb() && Subtarget->hasThumb2())
Index: lib/Target/ARM/ARMInstrThumb2.td
===================================================================
--- lib/Target/ARM/ARMInstrThumb2.td	(revision 161211)
+++ lib/Target/ARM/ARMInstrThumb2.td	(working copy)
@@ -523,8 +523,27 @@
   let Inst{7-4}   = opc7_4;
   let Inst{3-0}   = Rm;
 }
+class T2MlaLong<bits<3> opc22_20, bits<4> opc7_4,
+                dag oops, dag iops, InstrItinClass itin,
+                string opc, string asm, list<dag> pattern>
+  : T2I<oops, iops, itin, opc, asm, pattern> {
+  bits<4> RdLo;
+  bits<4> RdHi;
+  bits<4> Rn;
+  bits<4> Rm;
+  bits<4> RLo;
+  bits<4> RHi;
 
+  let Inst{31-23} = 0b111110111;
+  let Inst{22-20} = opc22_20;
+  let Inst{19-16} = Rn;
+  let Inst{15-12} = RdLo;
+  let Inst{11-8}  = RdHi;
+  let Inst{7-4}   = opc7_4;
+  let Inst{3-0}   = Rm;
+}
 
+
 /// T2I_bin_irs - Defines a set of (op reg, {so_imm|r|so_reg}) patterns for a
 /// binary operation that produces a value. These are predicable and can be
 /// changed to modify CPSR.
@@ -2428,15 +2447,17 @@
 } // isCommutable
 
 // Multiply + accumulate
-def t2SMLAL : T2MulLong<0b100, 0b0000,
+def t2SMLAL : T2MlaLong<0b100, 0b0000,
                   (outs rGPR:$RdLo, rGPR:$RdHi),
-                  (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC64,
+                  (ins rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), IIC_iMAC64,
-                  "smlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>;
+                  "smlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
+                  RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">;
 
-def t2UMLAL : T2MulLong<0b110, 0b0000,
+def t2UMLAL : T2MlaLong<0b110, 0b0000,
                   (outs rGPR:$RdLo, rGPR:$RdHi),
-                  (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC64,
+                  (ins rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), IIC_iMAC64,
-                  "umlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>;
+                  "umlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
+                  RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">;
 
 def t2UMAAL : T2MulLong<0b110, 0b0110,
                   (outs rGPR:$RdLo, rGPR:$RdHi),