[llvm-commits] Patch to implement UMLAL/SMLAL Instructions for the ARM Architecture
Arnold Schwaighofer
arnolds at codeaurora.org
Thu Aug 2 14:57:55 PDT 2012
This patch corrects the definitions of the UMLAL/SMLAL instructions and adds
support for matching them in the ARM DAG combiner.
This patch got lost on my end. Sorry.
Is it okay to commit?
Prior discussion:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120507/142351.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120416/141153.html
Bug:
http://llvm.org/bugs/show_bug.cgi?id=12213
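
For readers without the instruction semantics at hand: UMLAL computes
RdHi:RdLo += Rn * Rm, so it reads four registers, which is what the old
two-input definition missed. The sketch below is illustrative C++ only and is
not code from the patch; the function names are made up for this example. It
compares the plain 64-bit accumulate with the UMUL_LOHI + ADDC + ADDE sequence
that the combine is meant to recognize.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Reference semantics of UMLAL: {RdHi:RdLo} = {RdHi:RdLo} + Rn * Rm.
// Returns {lo, hi}. Hypothetical helper, not part of the patch.
std::pair<uint32_t, uint32_t> umlalReference(uint32_t rn, uint32_t rm,
                                             uint32_t lo, uint32_t hi) {
  uint64_t acc = (static_cast<uint64_t>(hi) << 32) | lo;
  acc += static_cast<uint64_t>(rn) * rm;
  return {static_cast<uint32_t>(acc), static_cast<uint32_t>(acc >> 32)};
}

// The same computation split into 32-bit steps, mirroring the node sequence
// UMUL_LOHI, ADDC (low halves), ADDE (high halves plus carry).
std::pair<uint32_t, uint32_t> umlalViaAddcAdde(uint32_t rn, uint32_t rm,
                                               uint32_t lo, uint32_t hi) {
  uint64_t prod = static_cast<uint64_t>(rn) * rm;  // UMUL_LOHI
  uint32_t mulLo = static_cast<uint32_t>(prod);
  uint32_t mulHi = static_cast<uint32_t>(prod >> 32);
  uint32_t sumLo = mulLo + lo;                     // ADDC, wraps mod 2^32
  uint32_t carry = sumLo < mulLo ? 1u : 0u;        // carry passed via glue
  uint32_t sumHi = mulHi + hi + carry;             // ADDE
  return {sumLo, sumHi};
}
```

A quick check such as
`assert(umlalReference(2, 3, 4, 5) == umlalViaAddcAdde(2, 3, 4, 5));`
should hold for any inputs, since both forms wrap modulo 2^64.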
> I understand Anton has not gotten back to you. That's fine, I don't think
> you need to wait for him. However, it looks like Arnold has some concerns
> about the patch? Please address his concerns first.
>
> Once you have addressed Arnold's concerns, he can commit it for you.
> Arnold has commit privilege.
>
> Evan
>
> On May 9, 2012, at 4:32 PM, Yin Ma wrote:
>
>> Hi Evan,
>>
>> Sorry to bother you. I would like to know whether this patch is likely
>> to be reviewed and merged into the main trunk eventually. It has been a
>> while since I submitted it to the list. We hope this change can be
>> merged into trunk so that my team can use it in the official LLVM
>> version, but we have waited quite a while. We would like to know whether
>> this is still possible. Please take a look and give us some advice.
>> Thank you.
>>
>> Sincerely,
>>
>> Yin
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Tuesday, April 17, 2012 10:36 PM
>> To: Yin Ma
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] LLVM patch to implement UMLAL/SMLAL
>> Instructions for ARM Architecture.
>>
>> Thanks. I think the patch is correct now. But I would very much
>> appreciate another pair of eyes on it.
>>
>> Evan
>>
>> On Apr 17, 2012, at 11:29 AM, Yin Ma <yinma at codeaurora.org> wrote:
>>
>> Hi Evan,
>>
>> I have updated the source based on your latest comments.
>> Thank you for reviewing.
>>
>> Yin
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Saturday, April 14, 2012 11:51 AM
>> To: Yin Ma
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] LLVM patch to implement UMLAL/SMLAL
>> Instructions for ARM Architecture.
>>
>> Sorry, I took a closer look and I find some issues.
>>
>> // Multiply + accumulate
>> -def SMLAL : AsMul1I64<0b0000111, (outs GPR:$RdLo, GPR:$RdHi),
>> - (ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
>> +def SMLAL : AsMla1I64<0b0000111, (outs GPR:$RdLo, GPR:$RdHi),
>> + (ins GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi), IIC_iMAC64,
>> "smlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
>> - Requires<[IsARM, HasV6]>;
>> -def UMLAL : AsMul1I64<0b0000101, (outs GPR:$RdLo, GPR:$RdHi),
>> - (ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
>> + RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">,
>> Requires<[IsARM, HasV6]>;
>> +def UMLAL : AsMla1I64<0b0000101, (outs GPR:$RdLo, GPR:$RdHi),
>> + (ins GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi), IIC_iMAC64,
>> "umlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
>> - Requires<[IsARM, HasV6]>;
>> + RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">,
>> Requires<[IsARM, HasV6]>;
>>
>> -let Constraints = "@earlyclobber $RdLo,@earlyclobber $RdHi" in {
>> +let Constraints = "$RLo = $RdLo,$RHi = $RdHi" in {
>> def SMLALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
>> - (ins GPR:$Rn, GPR:$Rm, pred:$p,
>> cc_out:$s),
>> + (ins GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi, pred:$p, cc_out:$s),
>> 4, IIC_iMAC64, [],
>> - (SMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, pred:$p,
>> cc_out:$s)>,
>> + (SMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi, pred:$p, cc_out:$s)>,
>> Requires<[IsARM, NoV6]>;
>> def UMLALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
>> - (ins GPR:$Rn, GPR:$Rm, pred:$p,
>> cc_out:$s),
>> - 4, IIC_iMAC64, [],
>> - (UMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, pred:$p,
>> cc_out:$s)>,
>> - Requires<[IsARM, NoV6]>;
>> + (ins GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi, pred:$p, cc_out:$s),
>> + 4, IIC_iMAC64, [],
>> + (UMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, GPR:$RLo,
>> GPR:$RHi, pred:$p, cc_out:$s)>,
>> + Requires<[IsARM, NoV6]>;
>> +}
>> +
>>
>> These are beyond 80 columns.
>>
>> + // follow the glue value to get the second add
>> + // don't know how much uses of the first add
>> + // use while check
>>
>> This comment doesn't make sense. Please correct.
>>
>> Evan
>>
>> On Apr 12, 2012, at 11:15 AM, Evan Cheng wrote:
>>
>>
>>
>> LGTM. Can someone commit this for you?
>>
>> Evan
>>
>> On Apr 2, 2012, at 10:22 AM, Yin Ma wrote:
>>
>>
>>
>> Hi,
>>
>> I have updated the code based on your comments. I have added more
>> comments to the code, especially for some conditional statements I
>> cannot remove. Please review again.
>>
>> llmlal.diff is the code diff for the updated version
>> report.simple.txt and result.txt are the new test results
>>
>> Thanks,
>>
>> Yin
>>
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Thursday, March 15, 2012 11:07 AM
>> To: Yin Ma
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] LLVM patch to implement UMLAL/SMLAL
>> Instructions for ARM Architecture.
>>
>>
>> On Mar 15, 2012, at 10:17 AM, Yin Ma wrote:
>>
>>
>>
>>
>> Hi Evan,
>>
>> Thank you for reviewing this patch. I will rework the code style based
>> on your advice. Then please review it again.
>>
>> For the question #8
>> 8.
>> There are a lot of failures in report.simple.txt. Why is that?
>>
>> Those failures are not caused by my change. They are due to our current
>> test machine setup. This patch did not increase the number of failures.
>> After reworking the style, I will look for a better machine to run the
>> unit tests. The failure count will change.
>>
>> Ok. Please try to get a clean run. Otherwise it's not particularly
>> useful to include as part of the patch review.
>>
>> Evan
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Yin
>>
>>
>>
>> From: Evan Cheng [mailto:evan.cheng at apple.com]
>> Sent: Thursday, March 15, 2012 12:01 AM
>> To: Yin Ma
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] LLVM patch to implement UMLAL/SMLAL
>> Instructions for ARM Architecture.
>>
>> Thanks. Preliminary reviews below.
>>
>> The first thing I noticed is some of your stylistic choices are
>> different from the rest of the code:
>>
>> if( nValue != 2 ) return SDValue();
>>
>> LLVM doesn't use camel case for variable names. Also, there should be a
>> space before '(', not after.
>>
>> + }else{
>> Should be } else {
>>
>> Now onto the rest of the patch.
>>
>> 1.
>> + case ARMISD::UMLAL:{
>> + if (Subtarget->isThumb1Only())
>> + break;
>>
>> The check should not be needed, right? ARMISD::UMLAL cannot be formed
>> for Thumb1.
>>
>> + // For UMLAL/SMLAL
>> + setTargetDAGCombine(ISD::ADDC);
>> + setTargetDAGCombine(ISD::ADDE);
>> +
>>
>> Should these be guarded with !Subtarget->isThumb1Only()?
>>
>> 2.
>> + case ISD::ADDE: return PerformADDECCombine(N, DCI, Subtarget);
>> + case ISD::ADDC: return PerformADDECCombine(N, DCI, Subtarget);
>>
>> That's silly. It should be:
>> + case ISD::ADDE:
>> + case ISD::ADDC: return PerformADDECCombine(N, DCI, Subtarget);
>>
>> 3. Can you add more comments to AddCombineTo64bitMLAL()? Please describe
>> what the routine is trying to match.
>>
>> 4.
>> + // The second use must be a glue to a add
>> + int nValue = N->getNumValues();
>> + if( nValue != 2 ) return SDValue();
>>
>> if (N->getNumValues() != 2) return SDValue();
>>
>> The check isn't necessary. ADDE always produces two values.
>>
>> 5.
>>
>> + EVT GVT = N->getValueType(1);
>> + if( VT != MVT::i32 || GVT != MVT::Glue ) return SDValue();
>>
>> VT must be MVT::i32 after legalization, right? Are there cases where GVT
>> might not be MVT::Glue?
>>
>> +
>> + // look for glue value
>> + SDNode::use_iterator UI = N->use_begin();
>> + SDNode::use_iterator UE = N->use_end();
>> + SDNode* HiAdd = NULL;
>> + while( UI != UE ){
>> + SDUse& Nuse = UI.getUse();
>> + if( Nuse.getResNo() == 1 ){
>> + HiAdd = Nuse.getUser();
>> + break;
>> + }
>> + UI++;
>> + }
>>
>> Is a loop necessary? A glue value can only have a single use.
>>
>> 6. The rest of the routine has more stylistic issues. Also it's hard to
>> understand without high level description.
>>
>> 7.
>> +/// PerformADDECCombine - Target-specific dag combine xforms for
>> ISD::ADDE & ISD::ADDC for UMLAL.
>> +///
>>
>> Please be aware of 80 col violation.
>>
>> + SDValue Result = AddCombineTo64bitMLAL(N, DCI, Subtarget);
>> + if (Result.getNode())
>> + return Result;
>> +
>> + // If that didn't work, try again with the operands commuted.
>> + return SDValue();
>> +}
>>
>> Isn't this just?
>> return AddCombineTo64bitMLAL(N, DCI, Subtarget);
>>
>> This routine seems unnecessary.
>>
>> The comment indicates it should try again with operands commuted. That
>> should be done in AddCombineTo64bitMLAL, no?
>>
>> 8.
>> There are a lot of failures in report.simple.txt. Why is that?
>>
>> Evan
>>
>> On Mar 8, 2012, at 4:15 PM, Yin Ma <yinma at codeaurora.org> wrote:
>>
>>
>>
>>
>>
>> The current definition of UMLAL/SMLAL in the LLVM ARM backend is unused
>> and incorrect: the instructions read four input values, but the .td file
>> defines only two.
>>
>> I have created a Bugzilla entry for this issue:
>> http://llvm.org/bugs/show_bug.cgi?id=12213
>>
>> I am proposing a patch that not only fixes the definition but also adds
>> the corresponding generation algorithm on the DAG. The algorithm operates
>> only in the ARM backend. It identifies conversion opportunities during
>> DAG processing.
>>
>> llmlal.diff is the code change
>> longMAC.ll is the test case for ARM
>> longMACt.ll is the test case for Thumb2
>> report.simple.txt is the result from test-suites
>> result.txt is the result from test
>>
>> Please give a review. Thanks,
>>
>> Yin
>>
>>
>> <llmlal.diff><longMAC.ll><longMACt.ll><report.simple.txt><result.txt>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>
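
As a rough picture of what the combine in the attached patch looks for, here
is a toy C++ model of the shape check. It is a sketch under heavy
simplification, not the patch code: the `Node` struct, the `Op` enum and
`matchLongMulAcc` are hypothetical stand-ins for SelectionDAG types, and the
model ignores which result (lo or hi) of the multiply each add consumes.

```cpp
#include <cassert>

// Toy stand-ins for SelectionDAG nodes; none of these types exist in LLVM.
enum class Op { Other, UMulLoHi, SMulLoHi, AddC, AddE };

struct Node {
  Op opcode = Op::Other;
  Node *lhs = nullptr;        // operand 0
  Node *rhs = nullptr;        // operand 1
  Node *gluedUser = nullptr;  // node consuming this node's glue result
};

static bool isMulLoHi(const Node *n) {
  return n && (n->opcode == Op::UMulLoHi || n->opcode == Op::SMulLoHi);
}

// Picks whichever operand of an add is a MUL_LOHI, or nullptr if neither is.
static const Node *mulOperand(const Node *add) {
  if (isMulLoHi(add->lhs)) return add->lhs;
  if (isMulLoHi(add->rhs)) return add->rhs;
  return nullptr;
}

// Returns the shared MUL_LOHI node if addc heads an ADDC/ADDE pair whose
// adds both consume the same multiply, otherwise nullptr.
const Node *matchLongMulAcc(const Node *addc) {
  if (!addc || addc->opcode != Op::AddC || addc->lhs == addc->rhs)
    return nullptr;
  const Node *loMul = mulOperand(addc);
  if (!loMul)
    return nullptr;
  // The ADDE is found by following the glue result of the ADDC.
  const Node *adde = addc->gluedUser;
  if (!adde || adde->opcode != Op::AddE || adde->lhs == adde->rhs)
    return nullptr;
  const Node *hiMul = mulOperand(adde);
  // Both adds must consume the same multiply node.
  return hiMul == loMul ? loMul : nullptr;
}
```

When the match succeeds, the remaining operand of each add is the accumulator
half (low for the ADDC, high for the ADDE) that would become the extra input
of the MLAL node.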
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
-------------- next part --------------
Index: lib/Target/ARM/ARMISelLowering.h
===================================================================
--- lib/Target/ARM/ARMISelLowering.h (revision 161211)
+++ lib/Target/ARM/ARMISelLowering.h (working copy)
@@ -176,6 +176,9 @@
VMULLs, // ...signed
VMULLu, // ...unsigned
+ UMLAL, // 64bit Unsigned Accumulate Multiply
+ SMLAL, // 64bit Signed Accumulate Multiply
+
// Operands of the standard BUILD_VECTOR node are not legalized, which
// is fine if BUILD_VECTORs are always lowered to shuffles or other
// operations, but for ARM some BUILD_VECTORs are legal as-is and their
Index: lib/Target/ARM/ARMInstrInfo.td
===================================================================
--- lib/Target/ARM/ARMInstrInfo.td (revision 161211)
+++ lib/Target/ARM/ARMInstrInfo.td (working copy)
@@ -83,6 +83,13 @@
SDTCisInt<0>,
SDTCisVT<1, i32>,
SDTCisVT<4, i32>]>;
+
+def SDT_ARM64bitmlal : SDTypeProfile<2,4, [ SDTCisVT<0, i32>, SDTCisVT<1, i32>,
+ SDTCisVT<2, i32>, SDTCisVT<3, i32>,
+ SDTCisVT<4, i32>, SDTCisVT<5, i32> ] >;
+def ARMUmlal : SDNode<"ARMISD::UMLAL", SDT_ARM64bitmlal>;
+def ARMSmlal : SDNode<"ARMISD::SMLAL", SDT_ARM64bitmlal>;
+
// Node definitions.
def ARMWrapper : SDNode<"ARMISD::Wrapper", SDTIntUnaryOp>;
def ARMWrapperDYN : SDNode<"ARMISD::WrapperDYN", SDTIntUnaryOp>;
@@ -3400,6 +3407,20 @@
let Inst{11-8} = Rm;
let Inst{3-0} = Rn;
}
+class AsMla1I64<bits<7> opcod, dag oops, dag iops, InstrItinClass itin,
+ string opc, string asm, list<dag> pattern>
+ : AsMul1I<opcod, oops, iops, itin, opc, asm, pattern> {
+ bits<4> RdLo;
+ bits<4> RdHi;
+ bits<4> Rm;
+ bits<4> Rn;
+ bits<4> RLo;
+ bits<4> RHi;
+ let Inst{19-16} = RdHi;
+ let Inst{15-12} = RdLo;
+ let Inst{11-8} = Rm;
+ let Inst{3-0} = Rn;
+}
// FIXME: The v5 pseudos are only necessary for the additional Constraint
// property. Remove them when it's possible to add those properties
@@ -3482,14 +3503,14 @@
}
// Multiply + accumulate
-def SMLAL : AsMul1I64<0b0000111, (outs GPR:$RdLo, GPR:$RdHi),
- (ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
+def SMLAL : AsMla1I64<0b0000111, (outs GPR:$RdLo, GPR:$RdHi),
+ (ins GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi), IIC_iMAC64,
"smlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
- Requires<[IsARM, HasV6]>;
-def UMLAL : AsMul1I64<0b0000101, (outs GPR:$RdLo, GPR:$RdHi),
- (ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
+ RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">, Requires<[IsARM, HasV6]>;
+def UMLAL : AsMla1I64<0b0000101, (outs GPR:$RdLo, GPR:$RdHi),
+ (ins GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi), IIC_iMAC64,
"umlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
- Requires<[IsARM, HasV6]>;
+ RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">, Requires<[IsARM, HasV6]>;
def UMAAL : AMul1I <0b0000010, (outs GPR:$RdLo, GPR:$RdHi),
(ins GPR:$Rn, GPR:$Rm), IIC_iMAC64,
@@ -3505,17 +3526,22 @@
let Inst{3-0} = Rn;
}
-let Constraints = "@earlyclobber $RdLo,@earlyclobber $RdHi" in {
+let Constraints = "$RLo = $RdLo,$RHi = $RdHi" in {
def SMLALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
- (ins GPR:$Rn, GPR:$Rm, pred:$p, cc_out:$s),
+ (ins GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi, pred:$p, cc_out:$s),
4, IIC_iMAC64, [],
- (SMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, pred:$p, cc_out:$s)>,
+ (SMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi,
+ pred:$p, cc_out:$s)>,
Requires<[IsARM, NoV6]>;
def UMLALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
- (ins GPR:$Rn, GPR:$Rm, pred:$p, cc_out:$s),
+ (ins GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi, pred:$p, cc_out:$s),
4, IIC_iMAC64, [],
- (UMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, pred:$p, cc_out:$s)>,
+ (UMLAL GPR:$RdLo, GPR:$RdHi, GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi,
+ pred:$p, cc_out:$s)>,
Requires<[IsARM, NoV6]>;
+}
+
+let Constraints = "@earlyclobber $RdLo,@earlyclobber $RdHi" in {
def UMAALv5 : ARMPseudoExpand<(outs GPR:$RdLo, GPR:$RdHi),
(ins GPR:$Rn, GPR:$Rm, pred:$p),
4, IIC_iMAC64, [],
Index: lib/Target/ARM/ARMISelLowering.cpp
===================================================================
--- lib/Target/ARM/ARMISelLowering.cpp (revision 161211)
+++ lib/Target/ARM/ARMISelLowering.cpp (working copy)
@@ -571,6 +571,11 @@
}
}
+ // For UMLAL/SMLAL
+ if (!Subtarget->isThumb1Only()) {
+ setTargetDAGCombine(ISD::ADDC);
+ }
+
computeRegisterProperties();
// ARM does not have f32 extending load.
@@ -989,6 +994,8 @@
case ARMISD::VTBL2: return "ARMISD::VTBL2";
case ARMISD::VMULLs: return "ARMISD::VMULLs";
case ARMISD::VMULLu: return "ARMISD::VMULLu";
+ case ARMISD::UMLAL: return "ARMISD::UMLAL";
+ case ARMISD::SMLAL: return "ARMISD::SMLAL";
case ARMISD::BUILD_VECTOR: return "ARMISD::BUILD_VECTOR";
case ARMISD::FMAX: return "ARMISD::FMAX";
case ARMISD::FMIN: return "ARMISD::FMIN";
@@ -7124,6 +7131,154 @@
return DAG.getNode(ISD::TRUNCATE, N->getDebugLoc(), VT, tmp);
}
+static SDValue findMUL_LOHI(SDValue V) {
+ if (V->getOpcode() == ISD::UMUL_LOHI ||
+ V->getOpcode() == ISD::SMUL_LOHI)
+ return V;
+ return SDValue();
+}
+
+static SDValue AddCombineTo64bitMLAL(SDNode *AddcNode,
+ TargetLowering::DAGCombinerInfo &DCI,
+ const ARMSubtarget *Subtarget) {
+
+ if (Subtarget->isThumb1Only()) return SDValue();
+
+ // Only perform the checks after legalize when the pattern is available.
+ if (DCI.isBeforeLegalize()) return SDValue();
+
+ // Look for multiply add opportunities.
+ // The pattern is an ISD::UMUL_LOHI followed by two add nodes, where
+ // each add node consumes a value from ISD::UMUL_LOHI and there is
+ // a glue link from the first add to the second add.
+ // If we find this pattern, we can replace the U/SMUL_LOHI, ADDC, and ADDE by
+ // a S/UMLAL instruction.
+ // loAdd UMUL_LOHI
+ // \ / :lo \ :hi
+ // \ / \ [no multiline comment]
+ // ADDC | hiAdd
+ // \ :glue / /
+ // \ / /
+ // ADDE
+ //
+ assert(AddcNode->getOpcode() == ISD::ADDC && "Expect an ADDC");
+ SDValue AddcOp0 = AddcNode->getOperand(0);
+ SDValue AddcOp1 = AddcNode->getOperand(1);
+
+ // Check if the two operands are from the same mul_lohi node.
+ if (AddcOp0.getNode() == AddcOp1.getNode())
+ return SDValue();
+
+ assert(AddcNode->getNumValues() == 2 &&
+ AddcNode->getValueType(0) == MVT::i32 &&
+ AddcNode->getValueType(1) == MVT::Glue &&
+ "Expect ADDC with two result values: i32, glue");
+
+ // Check that the ADDC adds the low result of the S/UMUL_LOHI.
+ if (AddcOp0->getOpcode() != ISD::UMUL_LOHI &&
+ AddcOp0->getOpcode() != ISD::SMUL_LOHI &&
+ AddcOp1->getOpcode() != ISD::UMUL_LOHI &&
+ AddcOp1->getOpcode() != ISD::SMUL_LOHI)
+ return SDValue();
+
+ // Look for the glued ADDE.
+ SDNode* AddeNode = AddcNode->getGluedUser();
+ if (AddeNode == NULL)
+ return SDValue();
+
+ // Make sure it is really an ADDE.
+ if (AddeNode->getOpcode() != ISD::ADDE)
+ return SDValue();
+
+ assert(AddeNode->getNumOperands() == 3 &&
+ AddeNode->getOperand(2).getValueType() == MVT::Glue &&
+ "ADDE node has the wrong inputs");
+
+ // Check for the triangle shape.
+ SDValue AddeOp0 = AddeNode->getOperand(0);
+ SDValue AddeOp1 = AddeNode->getOperand(1);
+
+ // Make sure that the ADDE operands are not coming from the same node.
+ if (AddeOp0.getNode() == AddeOp1.getNode())
+ return SDValue();
+
+ // Find the MUL_LOHI node walking up ADDE's operands.
+ bool IsLeftOperandMUL = false;
+ SDValue MULOp = findMUL_LOHI(AddeOp0);
+ if (MULOp == SDValue())
+ MULOp = findMUL_LOHI(AddeOp1);
+ else
+ IsLeftOperandMUL = true;
+ if (MULOp == SDValue())
+ return SDValue();
+
+ // Figure out the right opcode.
+ unsigned Opc = MULOp->getOpcode();
+ unsigned FinalOpc = (Opc == ISD::SMUL_LOHI) ? ARMISD::SMLAL : ARMISD::UMLAL;
+
+ // Figure out the high and low input values to the MLAL node.
+ SDValue* HiMul = &MULOp;
+ SDValue* HiAdd = NULL;
+ SDValue* LoMul = NULL;
+ SDValue* LowAdd = NULL;
+
+ if (IsLeftOperandMUL)
+ HiAdd = &AddeOp1;
+ else
+ HiAdd = &AddeOp0;
+
+
+ if (AddcOp0->getOpcode() == Opc) {
+ LoMul = &AddcOp0;
+ LowAdd = &AddcOp1;
+ }
+ if (AddcOp1->getOpcode() == Opc) {
+ LoMul = &AddcOp1;
+ LowAdd = &AddcOp0;
+ }
+
+ if (LoMul == NULL)
+ return SDValue();
+
+ if (LoMul->getNode() != HiMul->getNode())
+ return SDValue();
+
+ // Create the merged node.
+ SelectionDAG &DAG = DCI.DAG;
+
+ // Build operand list.
+ SmallVector<SDValue, 8> Ops;
+ Ops.push_back(LoMul->getOperand(0));
+ Ops.push_back(LoMul->getOperand(1));
+ Ops.push_back(*LowAdd);
+ Ops.push_back(*HiAdd);
+
+ SDValue MLALNode = DAG.getNode(FinalOpc, AddcNode->getDebugLoc(),
+ DAG.getVTList(MVT::i32, MVT::i32),
+ &Ops[0], Ops.size());
+
+ // Replace the ADD nodes' uses with the MLAL node's values.
+ SDValue HiMLALResult(MLALNode.getNode(), 1);
+ DAG.ReplaceAllUsesOfValueWith(SDValue(AddeNode, 0), HiMLALResult);
+
+ SDValue LoMLALResult(MLALNode.getNode(), 0);
+ DAG.ReplaceAllUsesOfValueWith(SDValue(AddcNode, 0), LoMLALResult);
+
+ // Return original node to notify the driver to stop replacing.
+ SDValue resNode(AddcNode, 0);
+ return resNode;
+}
+
+/// PerformADDCCombine - Target-specific dag combine transform from
+/// ISD::ADDC, ISD::ADDE, and ISD::MUL_LOHI to MLAL.
+static SDValue PerformADDCCombine(SDNode *N,
+ TargetLowering::DAGCombinerInfo &DCI,
+ const ARMSubtarget *Subtarget) {
+
+ return AddCombineTo64bitMLAL(N, DCI, Subtarget);
+
+}
+
/// PerformADDCombineWithOperands - Try DAG combinations for an ADD with
/// operands N0 and N1. This is a helper for PerformADDCombine that is
/// called with the default operands, and if that fails, with commuted
@@ -8735,6 +8890,7 @@
DAGCombinerInfo &DCI) const {
switch (N->getOpcode()) {
default: break;
+ case ISD::ADDC: return PerformADDCCombine(N, DCI, Subtarget);
case ISD::ADD: return PerformADDCombine(N, DCI, Subtarget);
case ISD::SUB: return PerformSUBCombine(N, DCI);
case ISD::MUL: return PerformMULCombine(N, DCI, Subtarget);
Index: lib/Target/ARM/ARMISelDAGToDAG.cpp
===================================================================
--- lib/Target/ARM/ARMISelDAGToDAG.cpp (revision 161211)
+++ lib/Target/ARM/ARMISelDAGToDAG.cpp (working copy)
@@ -2747,6 +2747,38 @@
dl, MVT::i32, MVT::i32, Ops, 5);
}
}
+ case ARMISD::UMLAL: {
+ if (Subtarget->isThumb()) {
+ SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
+ N->getOperand(3), getAL(CurDAG),
+ CurDAG->getRegister(0, MVT::i32)};
+ return CurDAG->getMachineNode(ARM::t2UMLAL, dl, MVT::i32, MVT::i32, Ops, 6);
+ } else {
+ SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
+ N->getOperand(3), getAL(CurDAG),
+ CurDAG->getRegister(0, MVT::i32),
+ CurDAG->getRegister(0, MVT::i32) };
+ return CurDAG->getMachineNode(Subtarget->hasV6Ops() ?
+ ARM::UMLAL : ARM::UMLALv5,
+ dl, MVT::i32, MVT::i32, Ops, 7);
+ }
+ }
+ case ARMISD::SMLAL: {
+ if (Subtarget->isThumb()) {
+ SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
+ N->getOperand(3), getAL(CurDAG),
+ CurDAG->getRegister(0, MVT::i32)};
+ return CurDAG->getMachineNode(ARM::t2SMLAL, dl, MVT::i32, MVT::i32, Ops, 6);
+ } else {
+ SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
+ N->getOperand(3), getAL(CurDAG),
+ CurDAG->getRegister(0, MVT::i32),
+ CurDAG->getRegister(0, MVT::i32) };
+ return CurDAG->getMachineNode(Subtarget->hasV6Ops() ?
+ ARM::SMLAL : ARM::SMLALv5,
+ dl, MVT::i32, MVT::i32, Ops, 7);
+ }
+ }
case ISD::LOAD: {
SDNode *ResNode = 0;
if (Subtarget->isThumb() && Subtarget->hasThumb2())
Index: lib/Target/ARM/ARMInstrThumb2.td
===================================================================
--- lib/Target/ARM/ARMInstrThumb2.td (revision 161211)
+++ lib/Target/ARM/ARMInstrThumb2.td (working copy)
@@ -523,8 +523,27 @@
let Inst{7-4} = opc7_4;
let Inst{3-0} = Rm;
}
+class T2MlaLong<bits<3> opc22_20, bits<4> opc7_4,
+ dag oops, dag iops, InstrItinClass itin,
+ string opc, string asm, list<dag> pattern>
+ : T2I<oops, iops, itin, opc, asm, pattern> {
+ bits<4> RdLo;
+ bits<4> RdHi;
+ bits<4> Rn;
+ bits<4> Rm;
+ bits<4> RLo;
+ bits<4> RHi;
+ let Inst{31-23} = 0b111110111;
+ let Inst{22-20} = opc22_20;
+ let Inst{19-16} = Rn;
+ let Inst{15-12} = RdLo;
+ let Inst{11-8} = RdHi;
+ let Inst{7-4} = opc7_4;
+ let Inst{3-0} = Rm;
+}
+
/// T2I_bin_irs - Defines a set of (op reg, {so_imm|r|so_reg}) patterns for a
/// binary operation that produces a value. These are predicable and can be
/// changed to modify CPSR.
@@ -2428,15 +2447,17 @@
} // isCommutable
// Multiply + accumulate
-def t2SMLAL : T2MulLong<0b100, 0b0000,
+def t2SMLAL : T2MlaLong<0b100, 0b0000,
(outs rGPR:$RdLo, rGPR:$RdHi),
- (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC64,
- "smlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>;
+ (ins rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), IIC_iMAC64,
+ "smlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
+ RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">;
-def t2UMLAL : T2MulLong<0b110, 0b0000,
+def t2UMLAL : T2MlaLong<0b110, 0b0000,
(outs rGPR:$RdLo, rGPR:$RdHi),
- (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC64,
- "umlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>;
+ (ins rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), IIC_iMAC64,
+ "umlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>,
+ RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">;
def t2UMAAL : T2MulLong<0b110, 0b0110,
(outs rGPR:$RdLo, rGPR:$RdHi),