[llvm] r319543 - [X86] Improvement in CodeGen instruction selection for LEAs.
Matt Morehouse via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 1 14:22:07 PST 2017
Reverted in r319591.
On Fri, Dec 1, 2017 at 12:50 PM, Matt Morehouse <mascasa at google.com> wrote:
> If you don't have a fix yet, could you please revert? The bot has been
> broken for several hours now.
>
> On Fri, Dec 1, 2017 at 9:24 AM, Jatin Bhateja <jatin.bhateja at gmail.com>
> wrote:
>
>> I noticed this and am looking into a resolution.
>>
>> Thanks,
>> Jatin
>>
>> On Fri, Dec 1, 2017 at 10:25 PM, Matt Morehouse <mascasa at google.com>
>> wrote:
>>
>>> Hi Jatin,
>>>
>>> The gep.ll test is failing under ASan and breaking the x86_64-linux-fast
>>> bot
>>> <http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/10802>.
>>> Please take a look.
>>>
>>> ******************** TEST 'LLVM :: CodeGen/X86/GlobalISel/gep.ll' FAILED ********************
>>> Script:
>>> --
>>> /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/llc -mtriple=x86_64-linux-gnu -global-isel -verify-machineinstrs < /b/sanitizer-x86_64-linux-fast/build/llvm/test/CodeGen/X86/GlobalISel/gep.ll -o - | /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm/test/CodeGen/X86/GlobalISel/gep.ll --check-prefix=ALL --check-prefix=X64_GISEL
>>> /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/llc -mtriple=x86_64-linux-gnu -verify-machineinstrs < /b/sanitizer-x86_64-linux-fast/build/llvm/test/CodeGen/X86/GlobalISel/gep.ll -o - | /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm/test/CodeGen/X86/GlobalISel/gep.ll --check-prefix=ALL --check-prefix=X64
>>> --
>>> Exit Code: 2
>>>
>>> Command Output (stderr):
>>> --
>>> =================================================================
>>> ==64410==ERROR: AddressSanitizer: use-after-poison on address 0x621000024e98 at pc 0x0000036f9139 bp 0x7fffb1b5d940 sp 0x7fffb1b5d938
>>> READ of size 8 at 0x621000024e98 thread T0
>>> #0 0x36f9138 in getParent /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/CodeGen/MachineInstr.h:140:43
>>> #1 0x36f9138 in llvm::MachineInstr::getRegInfo() /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachineInstr.cpp:128
>>> #2 0x2bc42de in llvm::DenseMapInfo<(anonymous namespace)::MemOpKey>::getHashValue((anonymous namespace)::MemOpKey const&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/Target/X86/X86OptimizeLEAs.cpp:184:36
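>>>
>>> For anyone picking this up: a minimal, self-contained reduction of the
>>> hazard the trace points at (the types below are hypothetical, not
>>> LLVM's). MemOpKey stores raw pointers into an instruction's operand
>>> list, and getHashValue() walks back through the operand to the parent
>>> MachineInstr; if that instruction is erased while its key is still
>>> live in the DenseMap, any rehash touches freed memory:
>>>
>>>   #include <cstdio>
>>>   #include <memory>
>>>   #include <unordered_map>
>>>
>>>   struct Instr;
>>>   struct Operand { Instr *Parent = nullptr; int Reg = 0; };
>>>   struct Instr { Operand Ops[2]; };
>>>
>>>   // Like MemOpKey: a raw pointer into an instruction's operand list.
>>>   struct Key { const Operand *Base = nullptr; };
>>>
>>>   struct KeyHash {
>>>     size_t operator()(const Key &K) const {
>>>       // Mirrors getHashValue() chasing Operand -> parent instruction;
>>>       // only safe while the owning instruction is still alive.
>>>       return std::hash<const Instr *>()(K.Base->Parent);
>>>     }
>>>   };
>>>   struct KeyEq {
>>>     bool operator()(const Key &A, const Key &B) const {
>>>       return A.Base == B.Base;
>>>     }
>>>   };
>>>
>>>   int main() {
>>>     auto MI = std::make_unique<Instr>();
>>>     MI->Ops[0].Parent = MI.get();
>>>
>>>     std::unordered_map<Key, int, KeyHash, KeyEq> LEAs;
>>>     LEAs[Key{&MI->Ops[0]}] = 1;
>>>
>>>     MI.reset(); // "instruction" erased; its key is still in the map.
>>>     // Any rehash from here on would call KeyHash on a dangling
>>>     // pointer -- the use-after-poison pattern ASan reports above.
>>>     printf("map still holds %zu stale key(s)\n", LEAs.size());
>>>   }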
>>>
>>>
>>> On Fri, Dec 1, 2017 at 6:07 AM, Jatin Bhateja via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>> Author: jbhateja
>>>> Date: Fri Dec 1 06:07:38 2017
>>>> New Revision: 319543
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=319543&view=rev
>>>> Log:
>>>> [X86] Improvement in CodeGen instruction selection for LEAs.
>>>>
>>>> Summary:
>>>> 1/ Operand folding during complex pattern matching for LEAs has been
>>>> extended so that it promotes the Scale to accommodate an operand that
>>>> appears more than once in the DAG, e.g.
>>>>   T1 = A + B
>>>>   T2 = T1 + 10
>>>>   T3 = T2 + A
>>>> For the above DAG rooted at T3 (i.e. T3 = B + 2*A + 10), X86AddressMode
>>>> will now be
>>>>   Base = B, Index = A, Scale = 2, Disp = 10
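>>>>
>>>> A minimal sketch of that promotion step (simplified, hypothetical types
>>>> rather than the real X86ISelAddressMode): when the node being matched
>>>> already occupies Base or Index, it is folded into Index with a bumped
>>>> Scale instead of failing the match.
>>>>
>>>>   #include <cstdio>
>>>>
>>>>   struct Node { const char *Name; };
>>>>
>>>>   struct AddrMode {
>>>>     Node *Base = nullptr;
>>>>     Node *Index = nullptr;
>>>>     unsigned Scale = 1;
>>>>     int Disp = 0;
>>>>   };
>>>>
>>>>   // Returns false on success, mirroring the matcher's convention.
>>>>   bool foldRepeatedOperand(AddrMode &AM, Node *N) {
>>>>     if (AM.Base == N) {  // N already in Base: move it into Index ...
>>>>       AM.Base = AM.Index;
>>>>       AM.Index = N;
>>>>       ++AM.Scale;        // ... and promote the scale.
>>>>       return false;
>>>>     }
>>>>     if (AM.Index == N) { // N already in Index: just promote the scale.
>>>>       ++AM.Scale;
>>>>       return false;
>>>>     }
>>>>     return true;         // No repeated operand to fold.
>>>>   }
>>>>
>>>>   int main() {
>>>>     Node A{"A"}, B{"B"};
>>>>     AddrMode AM;
>>>>     AM.Base = &A;        // State after matching T2 = A + B + 10:
>>>>     AM.Index = &B;       // Base = A, Index = B, Scale = 1, Disp = 10.
>>>>     AM.Disp = 10;
>>>>     foldRepeatedOperand(AM, &A); // The second A from T3 = T2 + A.
>>>>     printf("Base=%s Index=%s Scale=%u Disp=%d\n",
>>>>            AM.Base->Name, AM.Index->Name, AM.Scale, AM.Disp);
>>>>     // Prints: Base=B Index=A Scale=2 Disp=10
>>>>   }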
>>>>
>>>> 2/ OptimizeLEAPass, further down the pipeline, now performs
>>>> factorization over LEAs so that, where an opportunity exists, complex
>>>> LEAs (those with 3 operands) can be factored out, e.g.
>>>>   leal 1(%rax,%rcx,1), %rdx
>>>>   leal 1(%rax,%rcx,2), %rcx
>>>> will be factored as follows (the second LEA equals the first plus
>>>> %rcx once more):
>>>>   leal 1(%rax,%rcx,1), %rdx
>>>>   leal (%rdx,%rcx), %edx
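>>>>
>>>> A quick numeric check of the factorization arithmetic: two LEAs that
>>>> share Base, Index and Disp differ only by (Scale1 - Scale2) * Index,
>>>> so the larger-scale LEA can be rebuilt from the smaller-scale one
>>>> whenever that difference is itself a legal scale (1, 2, 4 or 8).
>>>>
>>>>   #include <cassert>
>>>>
>>>>   // lea(Base, Index, Scale, Disp) = Base + Scale * Index + Disp.
>>>>   static long lea(long Base, long Index, long Scale, long Disp) {
>>>>     return Base + Scale * Index + Disp;
>>>>   }
>>>>
>>>>   int main() {
>>>>     long rax = 100, rcx = 7;
>>>>     long L1 = lea(rax, rcx, 1, 1);    // leal 1(%rax,%rcx,1)
>>>>     long L2 = lea(rax, rcx, 2, 1);    // leal 1(%rax,%rcx,2)
>>>>     // Factored form: reuse L1 and add the index once more.
>>>>     assert(L2 == lea(L1, rcx, 1, 0)); // leal (%L1,%rcx)
>>>>     return 0;
>>>>   }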
>>>>
>>>> 3/ Aggressive operand folding for AM-based selection of LEAs is
>>>> loop-sensitive, so no complex LEAs are created within a loop.
>>>>
>>>> 4/ LEA simplification converts lea (BASE,1,INDEX,0) --> add (BASE,
>>>> INDEX), which offers better throughput.
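>>>>
>>>> A one-line sanity check of that identity (the throughput claim itself
>>>> is target-dependent): an LEA with Scale = 1 and Disp = 0 computes
>>>> exactly Base + Index, so nothing is lost by emitting the ADD.
>>>>
>>>>   #include <cassert>
>>>>
>>>>   int main() {
>>>>     long Base = 42, Index = 13, Scale = 1, Disp = 0;
>>>>     assert(Base + Scale * Index + Disp == Base + Index); // lea -> add
>>>>     return 0;
>>>>   }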
>>>>
>>>> PR32755 will be addressed by this patch.
>>>>
>>>> Previous patch revisions : r313343 , r314886
>>>>
>>>> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja
>>>>
>>>> Reviewed By: lsaba, RKSimon, jbhateja
>>>>
>>>> Subscribers: jmolloy, spatel, igorb, llvm-commits
>>>>
>>>> Differential Revision: https://reviews.llvm.org/D35014
>>>>
>>>> Modified:
>>>> llvm/trunk/include/llvm/CodeGen/MachineInstr.h
>>>> llvm/trunk/include/llvm/CodeGen/SelectionDAG.h
>>>> llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
>>>> llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp
>>>> llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp
>>>> llvm/trunk/test/CodeGen/X86/GlobalISel/callingconv.ll
>>>> llvm/trunk/test/CodeGen/X86/GlobalISel/gep.ll
>>>> llvm/trunk/test/CodeGen/X86/GlobalISel/memop-scalar.ll
>>>> llvm/trunk/test/CodeGen/X86/lea-opt-cse1.ll
>>>> llvm/trunk/test/CodeGen/X86/lea-opt-cse2.ll
>>>> llvm/trunk/test/CodeGen/X86/lea-opt-cse3.ll
>>>> llvm/trunk/test/CodeGen/X86/lea-opt-cse4.ll
>>>> llvm/trunk/test/CodeGen/X86/mul-constant-i16.ll
>>>> llvm/trunk/test/CodeGen/X86/mul-constant-i32.ll
>>>> llvm/trunk/test/CodeGen/X86/mul-constant-i64.ll
>>>> llvm/trunk/test/CodeGen/X86/mul-constant-result.ll
>>>> llvm/trunk/test/CodeGen/X86/umul-with-overflow.ll
>>>> llvm/trunk/test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll
>>>>
>>>> Modified: llvm/trunk/include/llvm/CodeGen/MachineInstr.h
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/MachineInstr.h?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/include/llvm/CodeGen/MachineInstr.h (original)
>>>> +++ llvm/trunk/include/llvm/CodeGen/MachineInstr.h Fri Dec 1 06:07:38 2017
>>>> @@ -1320,12 +1320,13 @@ public:
>>>> /// Add all implicit def and use operands to this instruction.
>>>> void addImplicitDefUseOperands(MachineFunction &MF);
>>>>
>>>> -private:
>>>> /// If this instruction is embedded into a MachineFunction, return the
>>>> /// MachineRegisterInfo object for the current function, otherwise
>>>> /// return null.
>>>> MachineRegisterInfo *getRegInfo();
>>>>
>>>> +private:
>>>> +
>>>> /// Unlink all of the register operands in this instruction from their
>>>> /// respective use lists. This requires that the operands already be on their
>>>> /// use lists.
>>>>
>>>> Modified: llvm/trunk/include/llvm/CodeGen/SelectionDAG.h
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/SelectionDAG.h?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/include/llvm/CodeGen/SelectionDAG.h (original)
>>>> +++ llvm/trunk/include/llvm/CodeGen/SelectionDAG.h Fri Dec 1 06:07:38 2017
>>>> @@ -300,6 +300,9 @@ public:
>>>> /// type legalization.
>>>> bool NewNodesMustHaveLegalTypes = false;
>>>>
>>>> + /// Set to true for DAG of BasicBlock contained inside a loop.
>>>> + bool IsDAGPartOfLoop = false;
>>>> +
>>>> private:
>>>> /// DAGUpdateListener is a friend so it can manipulate the listener stack.
>>>> friend struct DAGUpdateListener;
>>>>
>>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp (original)
>>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Fri Dec 1 06:07:38 2017
>>>> @@ -27,6 +27,7 @@
>>>> #include "llvm/Analysis/AliasAnalysis.h"
>>>> #include "llvm/Analysis/BranchProbabilityInfo.h"
>>>> #include "llvm/Analysis/CFG.h"
>>>> +#include "llvm/Analysis/LoopInfo.h"
>>>> #include "llvm/Analysis/OptimizationRemarkEmitter.h"
>>>> #include "llvm/Analysis/TargetLibraryInfo.h"
>>>> #include "llvm/CodeGen/FastISel.h"
>>>> @@ -325,6 +326,8 @@ void SelectionDAGISel::getAnalysisUsage(
>>>> if (OptLevel != CodeGenOpt::None)
>>>> AU.addRequired<AAResultsWrapperPass>();
>>>> AU.addRequired<GCModuleInfo>();
>>>> + if (OptLevel != CodeGenOpt::None)
>>>> + AU.addRequired<LoopInfoWrapperPass>();
>>>> AU.addRequired<StackProtector>();
>>>> AU.addPreserved<StackProtector>();
>>>> AU.addPreserved<GCModuleInfo>();
>>>> @@ -1415,6 +1418,7 @@ void SelectionDAGISel::SelectAllBasicBlo
>>>>
>>>> // Iterate over all basic blocks in the function.
>>>> for (const BasicBlock *LLVMBB : RPOT) {
>>>> + CurDAG->IsDAGPartOfLoop = false;
>>>> if (OptLevel != CodeGenOpt::None) {
>>>> bool AllPredsVisited = true;
>>>>       for (const_pred_iterator PI = pred_begin(LLVMBB), PE = pred_end(LLVMBB);
>>>> @@ -1592,6 +1596,13 @@ void SelectionDAGISel::SelectAllBasicBlo
>>>> FunctionBasedInstrumentation);
>>>> }
>>>>
>>>> + if (OptLevel != CodeGenOpt::None) {
>>>> + auto &LIWP = getAnalysis<LoopInfoWrapperPass>();
>>>> + LoopInfo &LI = LIWP.getLoopInfo();
>>>> + if (LI.getLoopFor(LLVMBB))
>>>> + CurDAG->IsDAGPartOfLoop = true;
>>>> + }
>>>> +
>>>> if (Begin != BI)
>>>> ++NumDAGBlocks;
>>>> else
>>>>
>>>> Modified: llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp (original)
>>>> +++ llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp Fri Dec 1 06:07:38 2017
>>>> @@ -88,6 +88,11 @@ namespace {
>>>>       IndexReg.getNode() != nullptr || Base_Reg.getNode() != nullptr;
>>>> }
>>>>
>>>> + bool hasComplexAddressingMode() const {
>>>> +      return Disp && IndexReg.getNode() != nullptr &&
>>>> +             Base_Reg.getNode() != nullptr;
>>>> + }
>>>> +
>>>> /// Return true if this addressing mode is already RIP-relative.
>>>> bool isRIPRelative() const {
>>>> if (BaseType != RegBase) return false;
>>>> @@ -97,6 +102,10 @@ namespace {
>>>> return false;
>>>> }
>>>>
>>>> + bool isLegalScale() const {
>>>> + return (Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8);
>>>> + }
>>>> +
>>>> void setBaseReg(SDValue Reg) {
>>>> BaseType = RegBase;
>>>> Base_Reg = Reg;
>>>> @@ -162,10 +171,13 @@ namespace {
>>>> /// If true, selector should try to optimize for minimum code size.
>>>> bool OptForMinSize;
>>>>
>>>> +  /// If true, selector should try to aggressively fold operands into AM.
>>>> +  bool OptForAggressingFolding;
>>>> +
>>>> public:
>>>>     explicit X86DAGToDAGISel(X86TargetMachine &tm, CodeGenOpt::Level OptLevel)
>>>> : SelectionDAGISel(tm, OptLevel), OptForSize(false),
>>>> - OptForMinSize(false) {}
>>>> + OptForMinSize(false), OptForAggressingFolding(false) {}
>>>>
>>>> StringRef getPassName() const override {
>>>> return "X86 DAG->DAG Instruction Selection";
>>>> @@ -184,6 +196,12 @@ namespace {
>>>>
>>>> void PreprocessISelDAG() override;
>>>>
>>>> + void setAggressiveOperandFolding(bool val) {
>>>> + OptForAggressingFolding = val;
>>>> + }
>>>> +
>>>> +    bool getAggressiveOperandFolding() { return OptForAggressingFolding; }
>>>> +
>>>> // Include the pieces autogenerated from the target description.
>>>> #include "X86GenDAGISel.inc"
>>>>
>>>> @@ -198,6 +216,7 @@ namespace {
>>>> bool matchAdd(SDValue N, X86ISelAddressMode &AM, unsigned Depth);
>>>> bool matchAddressRecursively(SDValue N, X86ISelAddressMode &AM,
>>>> unsigned Depth);
>>>> + bool matchAddressLEA(SDValue N, X86ISelAddressMode &AM);
>>>> bool matchAddressBase(SDValue N, X86ISelAddressMode &AM);
>>>> bool selectAddr(SDNode *Parent, SDValue N, SDValue &Base,
>>>> SDValue &Scale, SDValue &Index, SDValue &Disp,
>>>> @@ -447,6 +466,20 @@ namespace {
>>>>
>>>> bool isMaskZeroExtended(SDNode *N) const;
>>>> };
>>>> +
>>>> + class X86AggressiveOperandFolding {
>>>> + public:
>>>> +    explicit X86AggressiveOperandFolding(X86DAGToDAGISel &ISel, bool val)
>>>> + : Selector(&ISel) {
>>>> + Selector->setAggressiveOperandFolding(val);
>>>> + }
>>>> + ~X86AggressiveOperandFolding() {
>>>> + Selector->setAggressiveOperandFolding(false);
>>>> + }
>>>> +
>>>> + private:
>>>> + X86DAGToDAGISel *Selector;
>>>> + };
>>>> }
>>>>
>>>>
>>>> @@ -1194,7 +1227,7 @@ static bool foldMaskAndShiftToScale(Sele
>>>> AM.IndexReg = NewSRL;
>>>> return false;
>>>> }
>>>> -
>>>> +
>>>> bool X86DAGToDAGISel::matchAddressRecursively(SDValue N,
>>>> X86ISelAddressMode &AM,
>>>> unsigned Depth) {
>>>> SDLoc dl(N);
>>>> @@ -1202,8 +1235,14 @@ bool X86DAGToDAGISel::matchAddressRecurs
>>>> dbgs() << "MatchAddress: ";
>>>> AM.dump();
>>>> });
>>>> - // Limit recursion.
>>>> - if (Depth > 5)
>>>> +
>>>> +  // Limit recursion. For aggressive operand folding, recurse up to
>>>> +  // depth 8, which is the maximum legal scale value.
>>>> + auto getMaxOperandFoldingDepth = [&] () -> unsigned int {
>>>> + return getAggressiveOperandFolding() ? 8 : 5;
>>>> + };
>>>> +
>>>> + if (Depth > getMaxOperandFoldingDepth())
>>>> return matchAddressBase(N, AM);
>>>>
>>>>   // If this is already a %rip relative address, we can only merge immediates
>>>> @@ -1494,6 +1533,20 @@ bool X86DAGToDAGISel::matchAddressBase(S
>>>> return false;
>>>> }
>>>>
>>>> +  if (OptLevel != CodeGenOpt::None && getAggressiveOperandFolding() &&
>>>> +      AM.BaseType == X86ISelAddressMode::RegBase) {
>>>> + if (AM.Base_Reg == N) {
>>>> + SDValue Base_Reg = AM.Base_Reg;
>>>> + AM.Base_Reg = AM.IndexReg;
>>>> + AM.IndexReg = Base_Reg;
>>>> + AM.Scale++;
>>>> + return false;
>>>> + } else if (AM.IndexReg == N) {
>>>> + AM.Scale++;
>>>> + return false;
>>>> + }
>>>> + }
>>>> +
>>>> // Otherwise, we cannot select it.
>>>> return true;
>>>> }
>>>> @@ -1729,7 +1782,7 @@ bool X86DAGToDAGISel::selectLEA64_32Addr
>>>>                                           SDValue &Disp, SDValue &Segment) {
>>>>   // Save the debug loc before calling selectLEAAddr, in case it invalidates N.
>>>> SDLoc DL(N);
>>>> -
>>>> +
>>>> if (!selectLEAAddr(N, Base, Scale, Index, Disp, Segment))
>>>> return false;
>>>>
>>>> @@ -1764,6 +1817,29 @@ bool X86DAGToDAGISel::selectLEA64_32Addr
>>>> return true;
>>>> }
>>>>
>>>> +bool X86DAGToDAGISel::matchAddressLEA(SDValue N, X86ISelAddressMode &AM) {
>>>> +  // Avoid enabling aggressive operand folding when node N is part of a loop.
>>>> +  X86AggressiveOperandFolding Enable(*this, !CurDAG->IsDAGPartOfLoop);
>>>> +
>>>> + bool matchRes = matchAddress(N, AM);
>>>> +
>>>> +  // Check for legality of scale when recursion unwinds back to the top.
>>>> + if (!matchRes) {
>>>> + if (!AM.isLegalScale())
>>>> + return true;
>>>> +
>>>> +    // Avoid creating costly complex LEAs with scale <= 2 within a loop.
>>>> +    if (CurDAG->IsDAGPartOfLoop && Subtarget->slow3OpsLEA() &&
>>>> +        AM.Scale <= 2 && AM.hasComplexAddressingMode() &&
>>>> +        (!AM.hasSymbolicDisplacement() && N.getOpcode() < ISD::BUILTIN_OP_END))
>>>> +      return true;
>>>> + }
>>>> +
>>>> + return matchRes;
>>>> +}
>>>> +
>>>> +
>>>> /// Calls SelectAddr and determines if the maximal addressing
>>>> /// mode it matches can be cost effectively emitted as an LEA instruction.
>>>> bool X86DAGToDAGISel::selectLEAAddr(SDValue N,
>>>> @@ -1781,7 +1857,7 @@ bool X86DAGToDAGISel::selectLEAAddr(SDVa
>>>> SDValue Copy = AM.Segment;
>>>> SDValue T = CurDAG->getRegister(0, MVT::i32);
>>>> AM.Segment = T;
>>>> - if (matchAddress(N, AM))
>>>> + if (matchAddressLEA(N, AM))
>>>> return false;
>>>> assert (T == AM.Segment);
>>>> AM.Segment = Copy;
>>>>
>>>> Modified: llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp (original)
>>>> +++ llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp Fri Dec 1 06:07:38 2017
>>>> @@ -27,6 +27,7 @@
>>>> #include "llvm/ADT/SmallVector.h"
>>>> #include "llvm/ADT/Statistic.h"
>>>> #include "llvm/CodeGen/MachineBasicBlock.h"
>>>> +#include "llvm/CodeGen/MachineDominators.h"
>>>> #include "llvm/CodeGen/MachineFunction.h"
>>>> #include "llvm/CodeGen/MachineFunctionPass.h"
>>>> #include "llvm/CodeGen/MachineInstr.h"
>>>> @@ -58,6 +59,7 @@ static cl::opt<bool>
>>>> cl::init(false));
>>>>
>>>> STATISTIC(NumSubstLEAs, "Number of LEA instruction substitutions");
>>>> +STATISTIC(NumFactoredLEAs, "Number of LEAs factorized");
>>>> STATISTIC(NumRedundantLEAs, "Number of redundant LEA instructions removed");
>>>>
>>>> /// \brief Returns true if two machine operands are identical and they are not
>>>> @@ -65,6 +67,10 @@ STATISTIC(NumRedundantLEAs, "Number of r
>>>> static inline bool isIdenticalOp(const MachineOperand &MO1,
>>>> const MachineOperand &MO2);
>>>>
>>>> +/// \brief Returns true if two machine instructions have identical operands.
>>>> +static bool isIdenticalMI(MachineRegisterInfo *MRI, const MachineOperand &MO1,
>>>> +                          const MachineOperand &MO2);
>>>> +
>>>> /// \brief Returns true if two address displacement operands are of the same
>>>> /// type and use the same symbol/index/address regardless of the offset.
>>>> static bool isSimilarDispOp(const MachineOperand &MO1,
>>>> @@ -73,6 +79,9 @@ static bool isSimilarDispOp(const Machin
>>>> /// \brief Returns true if the instruction is LEA.
>>>> static inline bool isLEA(const MachineInstr &MI);
>>>>
>>>> +/// \brief Returns true if the definition of Operand is a copy-like instruction.
>>>> +static bool isDefCopyLike(MachineRegisterInfo *MRI, const MachineOperand &Opr);
>>>> +
>>>> namespace {
>>>>
>>>> /// A key based on instruction's memory operands.
>>>> @@ -80,15 +89,35 @@ class MemOpKey {
>>>> public:
>>>> MemOpKey(const MachineOperand *Base, const MachineOperand *Scale,
>>>> const MachineOperand *Index, const MachineOperand *Segment,
>>>> - const MachineOperand *Disp)
>>>> - : Disp(Disp) {
>>>> + const MachineOperand *Disp, bool DispCheck = false)
>>>> + : Disp(Disp), DeepCheck(DispCheck) {
>>>> Operands[0] = Base;
>>>> Operands[1] = Scale;
>>>> Operands[2] = Index;
>>>> Operands[3] = Segment;
>>>> }
>>>>
>>>> +  /// Checks whether the operands of the MemOpKey are identical; if the
>>>> +  /// Base or Index operand definitions are of kind SUBREG_TO_REG then
>>>> +  /// the operands of the defining MIs are compared instead.
>>>> + bool performDeepCheck(const MemOpKey &Other) const {
>>>> +    MachineInstr *MI = const_cast<MachineInstr *>(Operands[0]->getParent());
>>>> + MachineRegisterInfo *MRI = MI->getRegInfo();
>>>> +
>>>> + for (int i = 0; i < 4; i++) {
>>>> + bool CopyLike = isDefCopyLike(MRI, *Operands[i]);
>>>> +      if (CopyLike && !isIdenticalMI(MRI, *Operands[i], *Other.Operands[i]))
>>>> +        return false;
>>>> +      else if (!CopyLike && !isIdenticalOp(*Operands[i], *Other.Operands[i]))
>>>> +        return false;
>>>> + }
>>>> + return isIdenticalOp(*Disp, *Other.Disp);
>>>> + }
>>>> +
>>>> bool operator==(const MemOpKey &Other) const {
>>>> + if (DeepCheck)
>>>> + return performDeepCheck(Other);
>>>> +
>>>>     // Addresses' bases, scales, indices and segments must be identical.
>>>> for (int i = 0; i < 4; ++i)
>>>> if (!isIdenticalOp(*Operands[i], *Other.Operands[i]))
>>>> @@ -106,6 +135,12 @@ public:
>>>>
>>>> // Address' displacement operand.
>>>> const MachineOperand *Disp;
>>>> +
>>>> +  // If true, checks that the address' base, index, segment and
>>>> +  // displacement are identical; in addition, if base/index are defined
>>>> +  // by a copy-like instruction, further compares the operands of the
>>>> +  // defining instruction.
>>>> + bool DeepCheck;
>>>> };
>>>>
>>>> } // end anonymous namespace
>>>> @@ -131,12 +166,34 @@ template <> struct DenseMapInfo<MemOpKey
>>>> static unsigned getHashValue(const MemOpKey &Val) {
>>>>     // Checking any field of MemOpKey is enough to determine if the key is
>>>>     // empty or tombstone.
>>>> + hash_code Hash(0);
>>>>     assert(Val.Disp != PtrInfo::getEmptyKey() && "Cannot hash the empty key");
>>>> assert(Val.Disp != PtrInfo::getTombstoneKey() &&
>>>> "Cannot hash the tombstone key");
>>>>
>>>> - hash_code Hash = hash_combine(*Val.Operands[0], *Val.Operands[1],
>>>> - *Val.Operands[2], *Val.Operands[3]);
>>>> + auto getMIHash = [](MachineInstr *MI) -> hash_code {
>>>> + hash_code h(0);
>>>> + for (unsigned i = 1, e = MI->getNumOperands(); i < e; i++)
>>>> + h = hash_combine(h, MI->getOperand(i));
>>>> + return h;
>>>> + };
>>>> +
>>>> + const MachineOperand &Base = *Val.Operands[0];
>>>> + const MachineOperand &Index = *Val.Operands[2];
>>>> + MachineInstr *MI = const_cast<MachineInstr *>(Base.getParent());
>>>> + MachineRegisterInfo *MRI = MI->getRegInfo();
>>>> +
>>>> + if (isDefCopyLike(MRI, Base))
>>>> + Hash = getMIHash(MRI->getVRegDef(Base.getReg()));
>>>> + else
>>>> + Hash = hash_combine(Hash, Base);
>>>> +
>>>> + if (isDefCopyLike(MRI, Index))
>>>> + Hash = getMIHash(MRI->getVRegDef(Index.getReg()));
>>>> + else
>>>> + Hash = hash_combine(Hash, Index);
>>>> +
>>>> + Hash = hash_combine(Hash, *Val.Operands[1], *Val.Operands[3]);
>>>>
>>>>     // If the address displacement is an immediate, it should not affect the
>>>>     // hash so that memory operands which differ only by immediate displacement
>>>> @@ -196,6 +253,16 @@ static inline MemOpKey getMemOpKey(const
>>>> &MI.getOperand(N + X86::AddrDisp));
>>>> }
>>>>
>>>> +static inline MemOpKey getMemOpCSEKey(const MachineInstr &MI, unsigned N) {
>>>> + static MachineOperand DummyScale = MachineOperand::CreateImm(1);
>>>> + assert((isLEA(MI) || MI.mayLoadOrStore()) &&
>>>> + "The instruction must be a LEA, a load or a store");
>>>> + return MemOpKey(&MI.getOperand(N + X86::AddrBaseReg), &DummyScale,
>>>> + &MI.getOperand(N + X86::AddrIndexReg),
>>>> + &MI.getOperand(N + X86::AddrSegmentReg),
>>>> + &MI.getOperand(N + X86::AddrDisp), true);
>>>> +}
>>>> +
>>>> static inline bool isIdenticalOp(const MachineOperand &MO1,
>>>> const MachineOperand &MO2) {
>>>> return MO1.isIdenticalTo(MO2) &&
>>>> @@ -203,6 +270,27 @@ static inline bool isIdenticalOp(const M
>>>> !TargetRegisterInfo::isPhysicalRegister(MO1.getReg()));
>>>> }
>>>>
>>>> +static bool isIdenticalMI(MachineRegisterInfo *MRI, const MachineOperand &MO1,
>>>> +                          const MachineOperand &MO2) {
>>>> + MachineInstr *MI1 = nullptr;
>>>> + MachineInstr *MI2 = nullptr;
>>>> + if (!MO1.isReg() || !MO2.isReg())
>>>> + return false;
>>>> +
>>>> + MI1 = MRI->getVRegDef(MO1.getReg());
>>>> + MI2 = MRI->getVRegDef(MO2.getReg());
>>>> + if (!MI1 || !MI2)
>>>> + return false;
>>>> + if (MI1->getOpcode() != MI2->getOpcode())
>>>> + return false;
>>>> + if (MI1->getNumOperands() != MI2->getNumOperands())
>>>> + return false;
>>>> + for (unsigned i = 1, e = MI1->getNumOperands(); i < e; ++i)
>>>> + if (!isIdenticalOp(MI1->getOperand(i), MI2->getOperand(i)))
>>>> + return false;
>>>> + return true;
>>>> +}
>>>> +
>>>> #ifndef NDEBUG
>>>> static bool isValidDispOp(const MachineOperand &MO) {
>>>> return MO.isImm() || MO.isCPI() || MO.isJTI() || MO.isSymbol() ||
>>>> @@ -234,8 +322,150 @@ static inline bool isLEA(const MachineIn
>>>> Opcode == X86::LEA64r || Opcode == X86::LEA64_32r;
>>>> }
>>>>
>>>> +static bool isDefCopyLike(MachineRegisterInfo *MRI, const MachineOperand &Opr) {
>>>> + bool isInstrErased = !(Opr.isReg() && Opr.getParent()->getParent());
>>>> + if (!Opr.isReg() || isInstrErased ||
>>>> + TargetRegisterInfo::isPhysicalRegister(Opr.getReg()))
>>>> + return false;
>>>> + MachineInstr *MI = MRI->getVRegDef(Opr.getReg());
>>>> + return MI && MI->isCopyLike();
>>>> +}
>>>> +
>>>> namespace {
>>>>
>>>> +/// This class captures the functions and attributes needed to
>>>> +/// factorize LEAs within and across basic blocks. LEA instructions
>>>> +/// with the same BASE, OFFSET and INDEX are candidates for
>>>> +/// factorization.
>>>> +class FactorizeLEAOpt {
>>>> +public:
>>>> + using LEAListT = std::list<MachineInstr *>;
>>>> + using LEAMapT = DenseMap<MemOpKey, LEAListT>;
>>>> + using ValueT = DenseMap<MemOpKey, unsigned>;
>>>> + using ScopeEntryT = std::pair<MachineBasicBlock *, ValueT>;
>>>> + using ScopeStackT = std::vector<ScopeEntryT>;
>>>> +
>>>> + FactorizeLEAOpt() = default;
>>>> + FactorizeLEAOpt(const FactorizeLEAOpt &) = delete;
>>>> + FactorizeLEAOpt &operator=(const FactorizeLEAOpt &) = delete;
>>>> +
>>>> + void performCleanup() {
>>>> + for (auto LEA : removedLEAs)
>>>> + LEA->eraseFromParent();
>>>> + LEAs.clear();
>>>> + Stack.clear();
>>>> + removedLEAs.clear();
>>>> + }
>>>> +
>>>> + LEAMapT &getLEAMap() { return LEAs; }
>>>> + ScopeEntryT *getTopScope() { return &Stack.back(); }
>>>> +
>>>> +  void addForLazyRemoval(MachineInstr *Instr) { removedLEAs.insert(Instr); }
>>>> +
>>>> + bool checkIfScheduledForRemoval(MachineInstr *Instr) {
>>>> + return removedLEAs.find(Instr) != removedLEAs.end();
>>>> + }
>>>> +
>>>> +  /// Pushes the ScopeEntry for the BasicBlock onto the Stack. Also
>>>> +  /// traverses the list of instructions and updates the LEAs map and
>>>> +  /// the ScopeEntry for each LEA instruction found, using insertLEA().
>>>> + void collectDataForBasicBlock(MachineBasicBlock *MBB);
>>>> +
>>>> +  /// Stores the size of the MachineInstr list corresponding to key K
>>>> +  /// from the LEAs map into the ScopeEntry of the basic block, then
>>>> +  /// inserts the LEA at the beginning of the list.
>>>> + void insertLEA(MachineInstr *MI);
>>>> +
>>>> +  /// Pops the ScopeEntry of the topmost BasicBlock off the stack and
>>>> +  /// removes the LEA instructions contained in that scope from the
>>>> +  /// LEAs map.
>>>> + void removeDataForBasicBlock();
>>>> +
>>>> +  /// If an LEA contains physical registers then it is not a candidate
>>>> +  /// for factorization, since physical registers may violate the SSA
>>>> +  /// semantics of MI.
>>>> + bool containsPhyReg(MachineInstr *MI, unsigned RecLevel);
>>>> +
>>>> +private:
>>>> + ScopeStackT Stack;
>>>> + LEAMapT LEAs;
>>>> + std::set<MachineInstr *> removedLEAs;
>>>> +};
>>>> +
>>>> +void FactorizeLEAOpt::collectDataForBasicBlock(MachineBasicBlock *MBB) {
>>>> + ValueT EmptyMap;
>>>> + ScopeEntryT SE = std::make_pair(MBB, EmptyMap);
>>>> + Stack.push_back(SE);
>>>> + for (auto &MI : *MBB) {
>>>> + if (isLEA(MI))
>>>> + insertLEA(&MI);
>>>> + }
>>>> +}
>>>> +
>>>> +void FactorizeLEAOpt::removeDataForBasicBlock() {
>>>> + ScopeEntryT &SE = Stack.back();
>>>> + for (auto MapEntry : SE.second) {
>>>> + LEAMapT::iterator Itr = LEAs.find(MapEntry.first);
>>>> + assert((Itr != LEAs.end()) &&
>>>> + "LEAs map must have a node corresponding to ScopeEntry's
>>>> Key.");
>>>> +
>>>> + while (((*Itr).second.size() > MapEntry.second))
>>>> + (*Itr).second.pop_front();
>>>> + // If list goes empty remove entry from LEAs Map.
>>>> + if ((*Itr).second.empty())
>>>> + LEAs.erase(Itr);
>>>> + }
>>>> + Stack.pop_back();
>>>> +}
>>>> +
>>>> +bool FactorizeLEAOpt::containsPhyReg(MachineInstr *MI, unsigned RecLevel) {
>>>> + if (!MI || !RecLevel)
>>>> + return false;
>>>> +
>>>> + MachineRegisterInfo *MRI = MI->getRegInfo();
>>>> + for (auto Operand : MI->operands()) {
>>>> + if (!Operand.isReg())
>>>> + continue;
>>>> + if (TargetRegisterInfo::isPhysicalRegister(Operand.getReg()))
>>>> + return true;
>>>> + MachineInstr *OperDefMI = MRI->getVRegDef(Operand.getReg());
>>>> + if (OperDefMI && (MI != OperDefMI) && OperDefMI->isCopyLike() &&
>>>> + containsPhyReg(OperDefMI, RecLevel - 1))
>>>> + return true;
>>>> + }
>>>> + return false;
>>>> +}
>>>> +
>>>> +void FactorizeLEAOpt::insertLEA(MachineInstr *MI) {
>>>> + unsigned lsize;
>>>> + if (containsPhyReg(MI, 2))
>>>> + return;
>>>> +
>>>> + // Factorization is beneficial only for complex LEAs.
>>>> + MachineOperand &Base = MI->getOperand(1);
>>>> + MachineOperand &Index = MI->getOperand(3);
>>>> + MachineOperand &Offset = MI->getOperand(4);
>>>> + if ((Offset.isImm() && !Offset.getImm()) ||
>>>> +      (!Base.isReg() || !Base.getReg()) || (!Index.isReg() || !Index.getReg()))
>>>> + return;
>>>> +
>>>> + MemOpKey Key = getMemOpCSEKey(*MI, 1);
>>>> + ScopeEntryT *TopScope = getTopScope();
>>>> +
>>>> + LEAMapT::iterator Itr = LEAs.find(Key);
>>>> + if (Itr == LEAs.end()) {
>>>> + lsize = 0;
>>>> + LEAs[Key].push_front(MI);
>>>> + } else {
>>>> + lsize = (*Itr).second.size();
>>>> + (*Itr).second.push_front(MI);
>>>> + }
>>>> + if (TopScope->second.find(Key) == TopScope->second.end())
>>>> + TopScope->second[Key] = lsize;
>>>> +}
>>>> +
>>>> class OptimizeLEAPass : public MachineFunctionPass {
>>>> public:
>>>> OptimizeLEAPass() : MachineFunctionPass(ID) {}
>>>> @@ -247,6 +477,12 @@ public:
>>>> /// been calculated by LEA. Also, remove redundant LEAs.
>>>> bool runOnMachineFunction(MachineFunction &MF) override;
>>>>
>>>> + void getAnalysisUsage(AnalysisUsage &AU) const override {
>>>> + AU.setPreservesCFG();
>>>> + MachineFunctionPass::getAnalysisUsage(AU);
>>>> + AU.addRequired<MachineDominatorTree>();
>>>> + }
>>>> +
>>>> private:
>>>> using MemOpMap = DenseMap<MemOpKey, SmallVector<MachineInstr *, 16>>;
>>>>
>>>> @@ -292,8 +528,24 @@ private:
>>>> /// \brief Removes LEAs which calculate similar addresses.
>>>> bool removeRedundantLEAs(MemOpMap &LEAs);
>>>>
>>>> +  /// \brief Visits basic blocks, collects LEAs in a scoped hash map
>>>> +  /// (FactorizeLEAOpt::LEAs) and tries to factor them out.
>>>> + bool FactorizeLEAsAllBasicBlocks(MachineFunction &MF);
>>>> +
>>>> + bool FactorizeLEAsBasicBlock(MachineDomTreeNode *DN);
>>>> +
>>>> +  /// \brief Factor out LEAs which share Base, Index, Offset and Segment.
>>>> + bool processBasicBlock(const MachineBasicBlock &MBB);
>>>> +
>>>> +  /// \brief Try to replace an LEA with a lower-strength instruction
>>>> +  /// to improve latency and throughput.
>>>> +  bool strengthReduceLEAs(MemOpMap &LEAs, const MachineBasicBlock &MBB);
>>>> +
>>>> DenseMap<const MachineInstr *, unsigned> InstrPos;
>>>>
>>>> + FactorizeLEAOpt FactorOpt;
>>>> +
>>>> + MachineDominatorTree *DT;
>>>> MachineRegisterInfo *MRI;
>>>> const X86InstrInfo *TII;
>>>> const X86RegisterInfo *TRI;
>>>> @@ -489,7 +741,9 @@ void OptimizeLEAPass::findLEAs(const Mac
>>>> bool OptimizeLEAPass::removeRedundantAddrCalc(MemOpMap &LEAs) {
>>>> bool Changed = false;
>>>>
>>>> - assert(!LEAs.empty());
>>>> + if (LEAs.empty())
>>>> + return Changed;
>>>> +
>>>>   MachineBasicBlock *MBB = (*LEAs.begin()->second.begin())->getParent();
>>>>
>>>> // Process all instructions in basic block.
>>>> @@ -659,6 +913,10 @@ bool OptimizeLEAPass::removeRedundantLEA
>>>> // Erase removed LEA from the list.
>>>> I2 = List.erase(I2);
>>>>
>>>> + // If List becomes empty remove it from LEAs map.
>>>> + if (List.empty())
>>>> + LEAs.erase(E.first);
>>>> +
>>>> Changed = true;
>>>> }
>>>> ++I1;
>>>> @@ -668,6 +926,217 @@ bool OptimizeLEAPass::removeRedundantLEA
>>>> return Changed;
>>>> }
>>>>
>>>> +static inline int getADDrrFromLEA(int LEAOpcode) {
>>>> + switch (LEAOpcode) {
>>>> + default:
>>>> + llvm_unreachable("Unexpected LEA instruction");
>>>> + case X86::LEA16r:
>>>> + return X86::ADD16rr;
>>>> + case X86::LEA32r:
>>>> + return X86::ADD32rr;
>>>> + case X86::LEA64_32r:
>>>> + case X86::LEA64r:
>>>> + return X86::ADD64rr;
>>>> + }
>>>> +}
>>>> +
>>>> +bool OptimizeLEAPass::strengthReduceLEAs(MemOpMap &LEAs,
>>>> + const MachineBasicBlock &BB) {
>>>> + bool Changed = false;
>>>> +
>>>> + // Loop over all entries in the table.
>>>> + for (auto &E : LEAs) {
>>>> + auto &List = E.second;
>>>> +
>>>> + // Loop over all LEA pairs.
>>>> + auto I1 = List.begin();
>>>> + while (I1 != List.end()) {
>>>> + MachineInstrBuilder NewMI;
>>>> + MachineInstr &First = **I1;
>>>> + MachineOperand &Res = First.getOperand(0);
>>>> + MachineOperand &Base = First.getOperand(1);
>>>> + MachineOperand &Scale = First.getOperand(2);
>>>> + MachineOperand &Index = First.getOperand(3);
>>>> + MachineOperand &Offset = First.getOperand(4);
>>>> +
>>>> +      const MCInstrDesc &ADDrr = TII->get(getADDrrFromLEA(First.getOpcode()));
>>>> + const DebugLoc DL = First.getDebugLoc();
>>>> +
>>>> + if (!Base.isReg() || !Index.isReg() || !Index.getReg()) {
>>>> + I1++;
>>>> + continue;
>>>> + }
>>>> +
>>>> + if (TargetRegisterInfo::isPhysicalRegister(Res.getReg()) ||
>>>> + TargetRegisterInfo::isPhysicalRegister(Base.getReg()) ||
>>>> + TargetRegisterInfo::isPhysicalRegister(Index.getReg())) {
>>>> + I1++;
>>>> + continue;
>>>> + }
>>>> +
>>>> + // Check for register class compatibility between Result and
>>>> + // Index operands.
>>>> +      const TargetRegisterClass *ResRC = MRI->getRegClass(Res.getReg());
>>>> +      const TargetRegisterClass *IndexRC = MRI->getRegClass(Index.getReg());
>>>> +      if (TRI->getRegSizeInBits(*ResRC) != TRI->getRegSizeInBits(*IndexRC)) {
>>>> + I1++;
>>>> + continue;
>>>> + }
>>>> +
>>>> + MachineBasicBlock &MBB = *(const_cast<MachineBasicBlock *>(&BB));
>>>> + // R = B + I
>>>> + if (Scale.isImm() && Scale.getImm() == 1 && Offset.isImm() &&
>>>> + !Offset.getImm()) {
>>>> + NewMI = BuildMI(MBB, &First, DL, ADDrr)
>>>> + .addDef(Res.getReg())
>>>> + .addUse(Base.getReg())
>>>> + .addUse(Index.getReg());
>>>> + Changed = NewMI.getInstr() != nullptr;
>>>> + First.eraseFromParent();
>>>> + I1 = List.erase(I1);
>>>> +
>>>> + // If List becomes empty remove it from LEAs map.
>>>> + if (List.empty())
>>>> + LEAs.erase(E.first);
>>>> + } else
>>>> + I1++;
>>>> + }
>>>> + }
>>>> + return Changed;
>>>> +}
>>>> +
>>>> +bool OptimizeLEAPass::processBasicBlock(const MachineBasicBlock &MBB) {
>>>> + bool cseDone = false;
>>>> +
>>>> +  // Legal scale values are 1, 2, 4 and 8.
>>>> + auto LegalScale = [](int scale) {
>>>> + return scale == 1 || scale == 2 || scale == 4 || scale == 8;
>>>> + };
>>>> +
>>>> + auto CompareFn = [](const MachineInstr *Arg1,
>>>> + const MachineInstr *Arg2) -> bool {
>>>> +    return Arg1->getOperand(2).getImm() >= Arg2->getOperand(2).getImm();
>>>> + };
>>>> +
>>>> + // Loop over all entries in the table.
>>>> + for (auto &E : FactorOpt.getLEAMap()) {
>>>> + auto &List = E.second;
>>>> + if (List.size() > 1)
>>>> + List.sort(CompareFn);
>>>> +
>>>> + // Loop over all LEA pairs.
>>>> + for (auto Iter1 = List.begin(); Iter1 != List.end(); Iter1++) {
>>>> +      for (auto Iter2 = std::next(Iter1); Iter2 != List.end(); Iter2++) {
>>>> + MachineInstr &LI1 = **Iter1;
>>>> + MachineInstr &LI2 = **Iter2;
>>>> +
>>>> + if (!DT->dominates(&LI2, &LI1))
>>>> + continue;
>>>> +
>>>> + int Scale1 = LI1.getOperand(2).getImm();
>>>> + int Scale2 = LI2.getOperand(2).getImm();
>>>> + assert(LI2.getOperand(0).isReg() && "Result is a VirtualReg");
>>>> + DebugLoc DL = LI1.getDebugLoc();
>>>> +
>>>> + // Continue if instruction has already been factorized.
>>>> + if (FactorOpt.checkIfScheduledForRemoval(&LI1))
>>>> + continue;
>>>> +
>>>> + int Factor = Scale1 - Scale2;
>>>> + if (Factor > 0 && LegalScale(Factor)) {
>>>> + MachineOperand NewBase = LI2.getOperand(0);
>>>> + MachineOperand NewIndex = LI1.getOperand(3);
>>>> +
>>>> + const TargetRegisterClass *LI2ResRC =
>>>> + MRI->getRegClass(LI2.getOperand(0).getReg());
>>>> + const TargetRegisterClass *LI1BaseRC =
>>>> + MRI->getRegClass(LI1.getOperand(1).getReg());
>>>> +
>>>> + if (TRI->getRegSizeInBits(*LI1BaseRC) >
>>>> + TRI->getRegSizeInBits(*LI2ResRC)) {
>>>> + MachineInstr *LI1IndexDef =
>>>> + MRI->getVRegDef(LI1.getOperand(3).getReg());
>>>> +            if (LI1IndexDef->getOpcode() != TargetOpcode::SUBREG_TO_REG)
>>>> +              continue;
>>>> + MachineOperand &SubReg = LI1IndexDef->getOperand(2);
>>>> + const TargetRegisterClass *SubRegRC =
>>>> + MRI->getRegClass(SubReg.getReg());
>>>> + if (TRI->getRegSizeInBits(*SubRegRC) !=
>>>> + TRI->getRegSizeInBits(*LI2ResRC))
>>>> + continue;
>>>> + NewIndex = SubReg;
>>>> + }
>>>> +
>>>> + DEBUG(dbgs() << "CSE LEAs: Candidate to replace: ";
>>>> LI1.dump(););
>>>> +          MachineInstrBuilder NewMI =
>>>> +              BuildMI(*(const_cast<MachineBasicBlock *>(&MBB)), &LI1, DL,
>>>> +                      TII->get(LI1.getOpcode()))
>>>> +                  .addDef(LI1.getOperand(0).getReg())  // Dst = Dst of LI1.
>>>> +                  .addUse(NewBase.getReg())            // Base = Dst of LI2.
>>>> +                  .addImm(Factor)                      // Scale = Diff b/w scales.
>>>> +                  .addUse(NewIndex.getReg())           // Index = Index of LI1.
>>>> +                  .addImm(0)                           // Disp = 0.
>>>> +                  .addUse(LI1.getOperand(5).getReg()); // Segment = Segment of LI1.
>>>> +
>>>> + cseDone = NewMI.getInstr() != nullptr;
>>>> +
>>>> +          /// To preserve the SSA nature we need to remove the Def flag
>>>> +          /// from the old result.
>>>> +          LI1.getOperand(0).setIsDef(false);
>>>> +
>>>> +          /// Lazy removal ensures that the replaced LEA remains until we
>>>> +          /// finish processing all the basic blocks. This provides an
>>>> +          /// opportunity for further factorization based on the replaced
>>>> +          /// LEA, which is legal since it has the same destination as the
>>>> +          /// newly formed LEA.
>>>> + FactorOpt.addForLazyRemoval(&LI1);
>>>> +
>>>> + NumFactoredLEAs++;
>>>> + DEBUG(dbgs() << "CSE LEAs: Replaced by: "; NewMI->dump(););
>>>> + }
>>>> + }
>>>> + }
>>>> + }
>>>> + return cseDone;
>>>> +}
>>>> +
>>>> +bool OptimizeLEAPass::FactorizeLEAsBasicBlock(MachineDomTreeNode *DN) {
>>>> + bool Changed = false;
>>>> + using StackT = SmallVector<MachineDomTreeNode *, 16>;
>>>> + using VisitedNodeMapT = SmallSet<MachineDomTreeNode *, 16>;
>>>> +
>>>> + StackT WorkList;
>>>> + VisitedNodeMapT DomNodeMap;
>>>> +
>>>> + WorkList.push_back(DN);
>>>> + while (!WorkList.empty()) {
>>>> + MachineDomTreeNode *MDN = WorkList.back();
>>>> + FactorOpt.collectDataForBasicBlock(MDN->getBlock());
>>>> + Changed |= processBasicBlock(*MDN->getBlock());
>>>> +
>>>> + if (DomNodeMap.find(MDN) == DomNodeMap.end()) {
>>>> + DomNodeMap.insert(MDN);
>>>> + for (auto Child : MDN->getChildren())
>>>> + WorkList.push_back(Child);
>>>> + }
>>>> +
>>>> + MachineDomTreeNode *TDM = WorkList.back();
>>>> + if (MDN->getLevel() == TDM->getLevel()) {
>>>> + FactorOpt.removeDataForBasicBlock();
>>>> + DomNodeMap.erase(MDN);
>>>> + WorkList.pop_back();
>>>> + }
>>>> + }
>>>> + return Changed;
>>>> +}
>>>> +
>>>> +bool OptimizeLEAPass::FactorizeLEAsAllBasicBlocks(MachineFunction &MF) {
>>>> + bool Changed = FactorizeLEAsBasicBlock(DT->getRootNode());
>>>> + FactorOpt.performCleanup();
>>>> + return Changed;
>>>> +}
>>>> +
>>>> bool OptimizeLEAPass::runOnMachineFunction(MachineFunction &MF) {
>>>> bool Changed = false;
>>>>
>>>> @@ -677,6 +1146,10 @@ bool OptimizeLEAPass::runOnMachineFuncti
>>>> MRI = &MF.getRegInfo();
>>>> TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();
>>>> TRI = MF.getSubtarget<X86Subtarget>().getRegisterInfo();
>>>> + DT = &getAnalysis<MachineDominatorTree>();
>>>> +
>>>> + // Attempt factorizing LEAs.
>>>> + Changed |= FactorizeLEAsAllBasicBlocks(MF);
>>>>
>>>> // Process all basic blocks.
>>>> for (auto &MBB : MF) {
>>>> @@ -693,6 +1166,9 @@ bool OptimizeLEAPass::runOnMachineFuncti
>>>> // Remove redundant LEA instructions.
>>>> Changed |= removeRedundantLEAs(LEAs);
>>>>
>>>> + // Strength reduce LEA instructions.
>>>> + Changed |= strengthReduceLEAs(LEAs, MBB);
>>>> +
>>>>   // Remove redundant address calculations. Do it only for -Os/-Oz since only
>>>> // a code size gain is expected from this part of the pass.
>>>> if (MF.getFunction()->optForSize())
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/GlobalISel/callingconv.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/GlobalISel/callingconv.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/GlobalISel/callingconv.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/GlobalISel/callingconv.ll Fri Dec 1 06:07:38 2017
>>>> @@ -388,7 +388,7 @@ define void @test_variadic_call_2(i8** %
>>>> ; X32-NEXT: movl 4(%ecx), %ecx
>>>> ; X32-NEXT: movl %eax, (%esp)
>>>> ; X32-NEXT: movl $4, %eax
>>>> -; X32-NEXT: leal (%esp,%eax), %eax
>>>> +; X32-NEXT: addl %esp, %eax
>>>> ; X32-NEXT: movl %edx, 4(%esp)
>>>> ; X32-NEXT: movl %ecx, 4(%eax)
>>>> ; X32-NEXT: calll variadic_callee
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/GlobalISel/gep.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/GlobalISel/gep.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/GlobalISel/gep.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/GlobalISel/gep.ll Fri Dec 1 06:07:38 2017
>>>> @@ -5,10 +5,10 @@
>>>> define i32* @test_gep_i8(i32 *%arr, i8 %ind) {
>>>> ; X64_GISEL-LABEL: test_gep_i8:
>>>> ; X64_GISEL: # BB#0:
>>>> -; X64_GISEL-NEXT: movq $4, %rax
>>>> -; X64_GISEL-NEXT: movsbq %sil, %rcx
>>>> -; X64_GISEL-NEXT: imulq %rax, %rcx
>>>> -; X64_GISEL-NEXT: leaq (%rdi,%rcx), %rax
>>>> +; X64_GISEL-NEXT: movq $4, %rcx
>>>> +; X64_GISEL-NEXT: movsbq %sil, %rax
>>>> +; X64_GISEL-NEXT: imulq %rcx, %rax
>>>> +; X64_GISEL-NEXT: addq %rdi, %rax
>>>> ; X64_GISEL-NEXT: retq
>>>> ;
>>>> ; X64-LABEL: test_gep_i8:
>>>> @@ -25,7 +25,7 @@ define i32* @test_gep_i8_const(i32 *%arr
>>>> ; X64_GISEL-LABEL: test_gep_i8_const:
>>>> ; X64_GISEL: # BB#0:
>>>> ; X64_GISEL-NEXT: movq $80, %rax
>>>> -; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax
>>>> +; X64_GISEL-NEXT: addq %rdi, %rax
>>>> ; X64_GISEL-NEXT: retq
>>>> ;
>>>> ; X64-LABEL: test_gep_i8_const:
>>>> @@ -39,10 +39,10 @@ define i32* @test_gep_i8_const(i32 *%arr
>>>> define i32* @test_gep_i16(i32 *%arr, i16 %ind) {
>>>> ; X64_GISEL-LABEL: test_gep_i16:
>>>> ; X64_GISEL: # BB#0:
>>>> -; X64_GISEL-NEXT: movq $4, %rax
>>>> -; X64_GISEL-NEXT: movswq %si, %rcx
>>>> -; X64_GISEL-NEXT: imulq %rax, %rcx
>>>> -; X64_GISEL-NEXT: leaq (%rdi,%rcx), %rax
>>>> +; X64_GISEL-NEXT: movq $4, %rcx
>>>> +; X64_GISEL-NEXT: movswq %si, %rax
>>>> +; X64_GISEL-NEXT: imulq %rcx, %rax
>>>> +; X64_GISEL-NEXT: addq %rdi, %rax
>>>> ; X64_GISEL-NEXT: retq
>>>> ;
>>>> ; X64-LABEL: test_gep_i16:
>>>> @@ -59,7 +59,7 @@ define i32* @test_gep_i16_const(i32 *%ar
>>>> ; X64_GISEL-LABEL: test_gep_i16_const:
>>>> ; X64_GISEL: # BB#0:
>>>> ; X64_GISEL-NEXT: movq $80, %rax
>>>> -; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax
>>>> +; X64_GISEL-NEXT: addq %rdi, %rax
>>>> ; X64_GISEL-NEXT: retq
>>>> ;
>>>> ; X64-LABEL: test_gep_i16_const:
>>>> @@ -73,10 +73,10 @@ define i32* @test_gep_i16_const(i32 *%ar
>>>> define i32* @test_gep_i32(i32 *%arr, i32 %ind) {
>>>> ; X64_GISEL-LABEL: test_gep_i32:
>>>> ; X64_GISEL: # BB#0:
>>>> -; X64_GISEL-NEXT: movq $4, %rax
>>>> -; X64_GISEL-NEXT: movslq %esi, %rcx
>>>> -; X64_GISEL-NEXT: imulq %rax, %rcx
>>>> -; X64_GISEL-NEXT: leaq (%rdi,%rcx), %rax
>>>> +; X64_GISEL-NEXT: movq $4, %rcx
>>>> +; X64_GISEL-NEXT: movslq %esi, %rax
>>>> +; X64_GISEL-NEXT: imulq %rcx, %rax
>>>> +; X64_GISEL-NEXT: addq %rdi, %rax
>>>> ; X64_GISEL-NEXT: retq
>>>> ;
>>>> ; X64-LABEL: test_gep_i32:
>>>> @@ -92,7 +92,7 @@ define i32* @test_gep_i32_const(i32 *%ar
>>>> ; X64_GISEL-LABEL: test_gep_i32_const:
>>>> ; X64_GISEL: # BB#0:
>>>> ; X64_GISEL-NEXT: movq $20, %rax
>>>> -; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax
>>>> +; X64_GISEL-NEXT: addq %rdi, %rax
>>>> ; X64_GISEL-NEXT: retq
>>>> ;
>>>> ; X64-LABEL: test_gep_i32_const:
>>>> @@ -108,7 +108,7 @@ define i32* @test_gep_i64(i32 *%arr, i64
>>>> ; X64_GISEL: # BB#0:
>>>> ; X64_GISEL-NEXT: movq $4, %rax
>>>> ; X64_GISEL-NEXT: imulq %rsi, %rax
>>>> -; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax
>>>> +; X64_GISEL-NEXT: addq %rdi, %rax
>>>> ; X64_GISEL-NEXT: retq
>>>> ;
>>>> ; X64-LABEL: test_gep_i64:
>>>> @@ -123,7 +123,7 @@ define i32* @test_gep_i64_const(i32 *%ar
>>>> ; X64_GISEL-LABEL: test_gep_i64_const:
>>>> ; X64_GISEL: # BB#0:
>>>> ; X64_GISEL-NEXT: movq $20, %rax
>>>> -; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax
>>>> +; X64_GISEL-NEXT: addq %rdi, %rax
>>>> ; X64_GISEL-NEXT: retq
>>>> ;
>>>> ; X64-LABEL: test_gep_i64_const:
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/GlobalISel/memop-scalar.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/GlobalISel/memop-scalar.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/GlobalISel/memop-scalar.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/GlobalISel/memop-scalar.ll Fri Dec 1 06:07:38 2017
>>>> @@ -181,7 +181,7 @@ define i32 @test_gep_folding_largeGepInd
>>>> ; ALL-LABEL: test_gep_folding_largeGepIndex:
>>>> ; ALL: # BB#0:
>>>> ; ALL-NEXT: movabsq $228719476720, %rax # imm = 0x3540BE3FF0
>>>> -; ALL-NEXT: leaq (%rdi,%rax), %rax
>>>> +; ALL-NEXT: addq %rdi, %rax
>>>> ; ALL-NEXT: movl %esi, (%rax)
>>>> ; ALL-NEXT: movl (%rax), %eax
>>>> ; ALL-NEXT: retq
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/lea-opt-cse1.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/lea-opt-cse1.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/lea-opt-cse1.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/lea-opt-cse1.ll Fri Dec 1 06:07:38 2017
>>>> @@ -9,27 +9,21 @@ define void @test_func(%struct.SA* nocap
>>>> ; X64: # BB#0: # %entry
>>>> ; X64-NEXT: movl (%rdi), %eax
>>>> ; X64-NEXT: movl 16(%rdi), %ecx
>>>> -; X64-NEXT: leal (%rax,%rcx), %edx
>>>> ; X64-NEXT: leal 1(%rax,%rcx), %eax
>>>> ; X64-NEXT: movl %eax, 12(%rdi)
>>>> -; X64-NEXT: leal 1(%rcx,%rdx), %eax
>>>> +; X64-NEXT: addq %ecx, %eax
>>>> ; X64-NEXT: movl %eax, 16(%rdi)
>>>> ; X64-NEXT: retq
>>>> ;
>>>> ; X86-LABEL: test_func:
>>>> ; X86: # BB#0: # %entry
>>>> -; X86-NEXT: pushl %esi
>>>> -; X86-NEXT: .cfi_def_cfa_offset 8
>>>> -; X86-NEXT: .cfi_offset %esi, -8
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
>>>> ; X86-NEXT: movl (%eax), %ecx
>>>> ; X86-NEXT: movl 16(%eax), %edx
>>>> -; X86-NEXT: leal 1(%ecx,%edx), %esi
>>>> +; X86-NEXT: leal 1(%ecx,%edx), %ecx
>>>> +; X86-NEXT: movl %ecx, 12(%eax)
>>>> ; X86-NEXT: addl %edx, %ecx
>>>> -; X86-NEXT: movl %esi, 12(%eax)
>>>> -; X86-NEXT: leal 1(%edx,%ecx), %ecx
>>>> ; X86-NEXT: movl %ecx, 16(%eax)
>>>> -; X86-NEXT: popl %esi
>>>> ; X86-NEXT: retl
>>>> entry:
>>>>   %h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/lea-opt-cse2.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/lea-opt-cse2.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/lea-opt-cse2.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/lea-opt-cse2.ll Fri Dec 1 06:07:38 2017
>>>> @@ -1,6 +1,6 @@
>>>> ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
>>>> -; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64
>>>> -; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86
>>>> +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X64
>>>> +; RUN: llc < %s -mtriple=i686-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X86
>>>>
>>>> %struct.SA = type { i32 , i32 , i32 , i32 , i32};
>>>>
>>>> @@ -10,43 +10,39 @@ define void @foo(%struct.SA* nocapture %
>>>> ; X64-NEXT: .p2align 4, 0x90
>>>> ; X64-NEXT: .LBB0_1: # %loop
>>>> ; X64-NEXT: # =>This Inner Loop Header: Depth=1
>>>> -; X64-NEXT: movl (%rdi), %eax
>>>> -; X64-NEXT: movl 16(%rdi), %ecx
>>>> -; X64-NEXT: leal 1(%rax,%rcx), %edx
>>>> -; X64-NEXT: movl %edx, 12(%rdi)
>>>> +; X64-NEXT: movl 16(%rdi), %eax
>>>> +; X64-NEXT: movl (%rdi), %ecx
>>>> +; X64-NEXT: addl %eax, %ecx
>>>> +; X64-NEXT: incl %ecx
>>>> +; X64-NEXT: movl %ecx, 12(%rdi)
>>>> ; X64-NEXT: decl %esi
>>>> ; X64-NEXT: jne .LBB0_1
>>>> ; X64-NEXT: # BB#2: # %exit
>>>> -; X64-NEXT: addl %ecx, %eax
>>>> -; X64-NEXT: leal 1(%rcx,%rax), %eax
>>>> -; X64-NEXT: movl %eax, 16(%rdi)
>>>> +; X64-NEXT: addl %eax, %ecx
>>>> +; X64-NEXT: movl %ecx, 16(%rdi)
>>>> ; X64-NEXT: retq
>>>> ;
>>>> ; X86-LABEL: foo:
>>>> ; X86: # BB#0: # %entry
>>>> -; X86-NEXT: pushl %edi
>>>> -; X86-NEXT: .cfi_def_cfa_offset 8
>>>> ; X86-NEXT: pushl %esi
>>>> -; X86-NEXT: .cfi_def_cfa_offset 12
>>>> -; X86-NEXT: .cfi_offset %esi, -12
>>>> -; X86-NEXT: .cfi_offset %edi, -8
>>>> +; X86-NEXT: .cfi_def_cfa_offset 8
>>>> +; X86-NEXT: .cfi_offset %esi, -8
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
>>>> ; X86-NEXT: .p2align 4, 0x90
>>>> ; X86-NEXT: .LBB0_1: # %loop
>>>> ; X86-NEXT: # =>This Inner Loop Header: Depth=1
>>>> -; X86-NEXT: movl (%eax), %edx
>>>> -; X86-NEXT: movl 16(%eax), %esi
>>>> -; X86-NEXT: leal 1(%edx,%esi), %edi
>>>> -; X86-NEXT: movl %edi, 12(%eax)
>>>> +; X86-NEXT: movl 16(%eax), %edx
>>>> +; X86-NEXT: movl (%eax), %esi
>>>> +; X86-NEXT: addl %edx, %esi
>>>> +; X86-NEXT: incl %esi
>>>> +; X86-NEXT: movl %esi, 12(%eax)
>>>> ; X86-NEXT: decl %ecx
>>>> ; X86-NEXT: jne .LBB0_1
>>>> ; X86-NEXT: # BB#2: # %exit
>>>> -; X86-NEXT: addl %esi, %edx
>>>> -; X86-NEXT: leal 1(%esi,%edx), %ecx
>>>> -; X86-NEXT: movl %ecx, 16(%eax)
>>>> +; X86-NEXT: addl %edx, %esi
>>>> +; X86-NEXT: movl %esi, 16(%eax)
>>>> ; X86-NEXT: popl %esi
>>>> -; X86-NEXT: popl %edi
>>>> ; X86-NEXT: retl
>>>> entry:
>>>> br label %loop
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/lea-opt-cse3.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/lea-opt-cse3.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/lea-opt-cse3.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/lea-opt-cse3.ll Fri Dec 1 06:07:38 2017
>>>> @@ -8,7 +8,7 @@ define i32 @foo(i32 %a, i32 %b) local_un
>>>> ; X64-NEXT: # kill: %esi<def> %esi<kill> %rsi<def>
>>>> ; X64-NEXT: # kill: %edi<def> %edi<kill> %rdi<def>
>>>> ; X64-NEXT: leal 4(%rdi,%rsi,2), %ecx
>>>> -; X64-NEXT: leal 4(%rdi,%rsi,4), %eax
>>>> +; X64-NEXT: leal (%ecx,%esi,2), %eax
>>>> ; X64-NEXT: imull %ecx, %eax
>>>> ; X64-NEXT: retq
>>>> ;
>>>> @@ -16,9 +16,9 @@ define i32 @foo(i32 %a, i32 %b) local_un
>>>> ; X86: # BB#0: # %entry
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
>>>> -; X86-NEXT: leal 4(%ecx,%eax,2), %edx
>>>> -; X86-NEXT: leal 4(%ecx,%eax,4), %eax
>>>> -; X86-NEXT: imull %edx, %eax
>>>> +; X86-NEXT: leal 4(%ecx,%eax,2), %ecx
>>>> +; X86-NEXT: leal (%ecx,%eax,2), %eax
>>>> +; X86-NEXT: imull %ecx, %eax
>>>> ; X86-NEXT: retl
>>>> entry:
>>>> %mul = shl i32 %b, 1
>>>> @@ -36,7 +36,7 @@ define i32 @foo1(i32 %a, i32 %b) local_u
>>>> ; X64-NEXT: # kill: %esi<def> %esi<kill> %rsi<def>
>>>> ; X64-NEXT: # kill: %edi<def> %edi<kill> %rdi<def>
>>>> ; X64-NEXT: leal 4(%rdi,%rsi,4), %ecx
>>>> -; X64-NEXT: leal 4(%rdi,%rsi,8), %eax
>>>> +; X64-NEXT: leal (%ecx,%esi,4), %eax
>>>> ; X64-NEXT: imull %ecx, %eax
>>>> ; X64-NEXT: retq
>>>> ;
>>>> @@ -44,9 +44,9 @@ define i32 @foo1(i32 %a, i32 %b) local_u
>>>> ; X86: # BB#0: # %entry
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
>>>> -; X86-NEXT: leal 4(%ecx,%eax,4), %edx
>>>> -; X86-NEXT: leal 4(%ecx,%eax,8), %eax
>>>> -; X86-NEXT: imull %edx, %eax
>>>> +; X86-NEXT: leal 4(%ecx,%eax,4), %ecx
>>>> +; X86-NEXT: leal (%ecx,%eax,4), %eax
>>>> +; X86-NEXT: imull %ecx, %eax
>>>> ; X86-NEXT: retl
>>>> entry:
>>>> %mul = shl i32 %b, 2
>>>> @@ -68,29 +68,23 @@ define i32 @foo1_mult_basic_blocks(i32 %
>>>> ; X64-NEXT: cmpl $10, %ecx
>>>> ; X64-NEXT: je .LBB2_2
>>>> ; X64-NEXT: # BB#1: # %mid
>>>> -; X64-NEXT: leal 4(%rdi,%rsi,8), %eax
>>>> -; X64-NEXT: imull %eax, %ecx
>>>> -; X64-NEXT: movl %ecx, %eax
>>>> +; X64-NEXT: leal (%ecx,%esi,4), %eax
>>>> +; X64-NEXT: imull %ecx, %eax
>>>> ; X64-NEXT: .LBB2_2: # %exit
>>>> ; X64-NEXT: retq
>>>> ;
>>>> ; X86-LABEL: foo1_mult_basic_blocks:
>>>> ; X86: # BB#0: # %entry
>>>> -; X86-NEXT: pushl %esi
>>>> -; X86-NEXT: .cfi_def_cfa_offset 8
>>>> -; X86-NEXT: .cfi_offset %esi, -8
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
>>>> -; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
>>>> -; X86-NEXT: leal 4(%esi,%edx,4), %ecx
>>>> +; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
>>>> +; X86-NEXT: leal 4(%eax,%edx,4), %ecx
>>>> ; X86-NEXT: xorl %eax, %eax
>>>> ; X86-NEXT: cmpl $10, %ecx
>>>> ; X86-NEXT: je .LBB2_2
>>>> ; X86-NEXT: # BB#1: # %mid
>>>> -; X86-NEXT: leal 4(%esi,%edx,8), %eax
>>>> -; X86-NEXT: imull %eax, %ecx
>>>> -; X86-NEXT: movl %ecx, %eax
>>>> +; X86-NEXT: leal (%ecx,%edx,4), %eax
>>>> +; X86-NEXT: imull %ecx, %eax
>>>> ; X86-NEXT: .LBB2_2: # %exit
>>>> -; X86-NEXT: popl %esi
>>>> ; X86-NEXT: retl
>>>> entry:
>>>> %mul = shl i32 %b, 2
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/lea-opt-cse4.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/lea-opt-cse4.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/lea-opt-cse4.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/lea-opt-cse4.ll Fri Dec 1 06:07:38 2017
>>>> @@ -1,41 +1,31 @@
>>>> ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
>>>> -; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64
>>>> -; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86
>>>> +; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X64
>>>> +; RUN: llc < %s -mtriple=i686-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X86
>>>>
>>>> %struct.SA = type { i32 , i32 , i32 , i32 , i32};
>>>>
>>>> define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {
>>>> ; X64-LABEL: foo:
>>>> ; X64: # BB#0: # %entry
>>>> -; X64-NEXT: movl 16(%rdi), %eax
>>>> -; X64-NEXT: movl (%rdi), %ecx
>>>> -; X64-NEXT: addl %eax, %ecx
>>>> -; X64-NEXT: addl %eax, %ecx
>>>> -; X64-NEXT: addl %eax, %ecx
>>>> -; X64-NEXT: leal (%rcx,%rax), %edx
>>>> -; X64-NEXT: leal 1(%rax,%rcx), %ecx
>>>> -; X64-NEXT: movl %ecx, 12(%rdi)
>>>> -; X64-NEXT: leal 1(%rax,%rdx), %eax
>>>> +; X64-NEXT: movl (%rdi), %eax
>>>> +; X64-NEXT: movl 16(%rdi), %ecx
>>>> +; X64-NEXT: leal (%rax,%rcx,4), %eax
>>>> +; X64-NEXT: addl $1, %eax
>>>> +; X64-NEXT: movl %eax, 12(%rdi)
>>>> +; X64-NEXT: addl %ecx, %eax
>>>> ; X64-NEXT: movl %eax, 16(%rdi)
>>>> ; X64-NEXT: retq
>>>> ;
>>>> ; X86-LABEL: foo:
>>>> ; X86: # BB#0: # %entry
>>>> -; X86-NEXT: pushl %esi
>>>> -; X86-NEXT: .cfi_def_cfa_offset 8
>>>> -; X86-NEXT: .cfi_offset %esi, -8
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
>>>> -; X86-NEXT: movl 16(%eax), %ecx
>>>> -; X86-NEXT: movl (%eax), %edx
>>>> -; X86-NEXT: addl %ecx, %edx
>>>> -; X86-NEXT: addl %ecx, %edx
>>>> -; X86-NEXT: addl %ecx, %edx
>>>> -; X86-NEXT: leal 1(%ecx,%edx), %esi
>>>> -; X86-NEXT: addl %ecx, %edx
>>>> -; X86-NEXT: movl %esi, 12(%eax)
>>>> -; X86-NEXT: leal 1(%ecx,%edx), %ecx
>>>> +; X86-NEXT: movl (%eax), %ecx
>>>> +; X86-NEXT: movl 16(%eax), %edx
>>>> +; X86-NEXT: leal (%ecx,%edx,4), %ecx
>>>> +; X86-NEXT: addl $1, %ecx
>>>> +; X86-NEXT: movl %ecx, 12(%eax)
>>>> +; X86-NEXT: addl %edx, %ecx
>>>> ; X86-NEXT: movl %ecx, 16(%eax)
>>>> -; X86-NEXT: popl %esi
>>>> ; X86-NEXT: retl
>>>> entry:
>>>>   %h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
>>>> @@ -62,15 +52,15 @@ define void @foo_loop(%struct.SA* nocapt
>>>> ; X64-NEXT: .p2align 4, 0x90
>>>> ; X64-NEXT: .LBB1_1: # %loop
>>>> ; X64-NEXT: # =>This Inner Loop Header: Depth=1
>>>> -; X64-NEXT: movl (%rdi), %ecx
>>>> ; X64-NEXT: movl 16(%rdi), %eax
>>>> -; X64-NEXT: leal 1(%rcx,%rax), %edx
>>>> -; X64-NEXT: movl %edx, 12(%rdi)
>>>> +; X64-NEXT: movl (%rdi), %ecx
>>>> +; X64-NEXT: addl %eax, %ecx
>>>> +; X64-NEXT: incl %ecx
>>>> +; X64-NEXT: movl %ecx, 12(%rdi)
>>>> ; X64-NEXT: decl %esi
>>>> ; X64-NEXT: jne .LBB1_1
>>>> ; X64-NEXT: # BB#2: # %exit
>>>> ; X64-NEXT: addl %eax, %ecx
>>>> -; X64-NEXT: leal 1(%rax,%rcx), %ecx
>>>> ; X64-NEXT: addl %eax, %ecx
>>>> ; X64-NEXT: addl %eax, %ecx
>>>> ; X64-NEXT: addl %eax, %ecx
>>>> @@ -82,26 +72,23 @@ define void @foo_loop(%struct.SA* nocapt
>>>> ;
>>>> ; X86-LABEL: foo_loop:
>>>> ; X86: # BB#0: # %entry
>>>> -; X86-NEXT: pushl %edi
>>>> -; X86-NEXT: .cfi_def_cfa_offset 8
>>>> ; X86-NEXT: pushl %esi
>>>> -; X86-NEXT: .cfi_def_cfa_offset 12
>>>> -; X86-NEXT: .cfi_offset %esi, -12
>>>> -; X86-NEXT: .cfi_offset %edi, -8
>>>> -; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
>>>> +; X86-NEXT: .cfi_def_cfa_offset 8
>>>> +; X86-NEXT: .cfi_offset %esi, -8
>>>> +; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
>>>> ; X86-NEXT: .p2align 4, 0x90
>>>> ; X86-NEXT: .LBB1_1: # %loop
>>>> ; X86-NEXT: # =>This Inner Loop Header: Depth=1
>>>> -; X86-NEXT: movl (%eax), %esi
>>>> ; X86-NEXT: movl 16(%eax), %ecx
>>>> -; X86-NEXT: leal 1(%esi,%ecx), %edi
>>>> -; X86-NEXT: movl %edi, 12(%eax)
>>>> -; X86-NEXT: decl %edx
>>>> +; X86-NEXT: movl (%eax), %edx
>>>> +; X86-NEXT: addl %ecx, %edx
>>>> +; X86-NEXT: incl %edx
>>>> +; X86-NEXT: movl %edx, 12(%eax)
>>>> +; X86-NEXT: decl %esi
>>>> ; X86-NEXT: jne .LBB1_1
>>>> ; X86-NEXT: # BB#2: # %exit
>>>> -; X86-NEXT: addl %ecx, %esi
>>>> -; X86-NEXT: leal 1(%ecx,%esi), %edx
>>>> +; X86-NEXT: addl %ecx, %edx
>>>> ; X86-NEXT: addl %ecx, %edx
>>>> ; X86-NEXT: addl %ecx, %edx
>>>> ; X86-NEXT: addl %ecx, %edx
>>>> @@ -110,7 +97,6 @@ define void @foo_loop(%struct.SA* nocapt
>>>> ; X86-NEXT: addl %ecx, %edx
>>>> ; X86-NEXT: movl %edx, 16(%eax)
>>>> ; X86-NEXT: popl %esi
>>>> -; X86-NEXT: popl %edi
>>>> ; X86-NEXT: retl
>>>> entry:
>>>> br label %loop
>>>>
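>>>> A note on the test above, which may help when skimming the CHECK churn: the
>>>> RUN lines now pin -mattr=+slow-3ops-lea, and the two functions show the two
>>>> halves of the change. In @foo, the old chain of adds and 3-operand LEAs
>>>> collapses into one scaled LEA; reading the new X64 sequence, with B = (%rdi)
>>>> and A = 16(%rdi):
>>>>
>>>>   movl (%rdi), %eax          # eax = B
>>>>   movl 16(%rdi), %ecx        # ecx = A
>>>>   leal (%rax,%rcx,4), %eax   # eax = B + 4*A (the repeated operand becomes Scale)
>>>>   addl $1, %eax              # eax = B + 4*A + 1
>>>>
>>>> In @foo_loop, the in-loop 3-operand LEA is instead split into addl/incl,
>>>> which is the expected tuning under slow-3ops-lea.
>>>>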
>>>> Modified: llvm/trunk/test/CodeGen/X86/mul-constant-i16.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/mul-constant-i16.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/mul-constant-i16.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/mul-constant-i16.ll Fri Dec 1 06:07:38 2017
>>>> @@ -558,11 +558,10 @@ define i16 @test_mul_by_28(i16 %x) {
>>>> define i16 @test_mul_by_29(i16 %x) {
>>>> ; X86-LABEL: test_mul_by_29:
>>>> ; X86: # BB#0:
>>>> -; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
>>>> -; X86-NEXT: leal (%ecx,%ecx,8), %eax
>>>> -; X86-NEXT: leal (%eax,%eax,2), %eax
>>>> -; X86-NEXT: addl %ecx, %eax
>>>> -; X86-NEXT: addl %ecx, %eax
>>>> +; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
>>>> +; X86-NEXT: leal (%eax,%eax,8), %ecx
>>>> +; X86-NEXT: leal (%ecx,%ecx,2), %ecx
>>>> +; X86-NEXT: leal (%ecx,%eax,2), %eax
>>>> ; X86-NEXT: # kill: %ax<def> %ax<kill> %eax<kill>
>>>> ; X86-NEXT: retl
>>>> ;
>>>> @@ -571,8 +570,7 @@ define i16 @test_mul_by_29(i16 %x) {
>>>> ; X64-NEXT: # kill: %edi<def> %edi<kill> %rdi<def>
>>>> ; X64-NEXT: leal (%rdi,%rdi,8), %eax
>>>> ; X64-NEXT: leal (%rax,%rax,2), %eax
>>>> -; X64-NEXT: addl %edi, %eax
>>>> -; X64-NEXT: addl %edi, %eax
>>>> +; X64-NEXT: leal (%rax,%rdi,2), %eax
>>>> ; X64-NEXT: # kill: %ax<def> %ax<kill> %eax<kill>
>>>> ; X64-NEXT: retq
>>>> %mul = mul nsw i16 %x, 29
>>>>
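>>>> The mul-by-29 improvement is easiest to see as arithmetic: 29*x is built as
>>>> 9*x, then 27*x, and the remaining 2*x is now folded into the address
>>>> computation of a third LEA instead of two dependent adds, as in the X64
>>>> CHECK lines above:
>>>>
>>>>   leal (%rdi,%rdi,8), %eax   # eax = 9*x
>>>>   leal (%rax,%rax,2), %eax   # eax = 27*x
>>>>   leal (%rax,%rdi,2), %eax   # eax = 27*x + 2*x = 29*x
>>>>
>>>> The same two-adds-to-one-LEA rewrite appears in the i32 and i64 variants of
>>>> this test below.
>>>>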
>>>> Modified: llvm/trunk/test/CodeGen/X86/mul-constant-i32.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/mul-constant-i32.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/mul-constant-i32.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/mul-constant-i32.ll Fri Dec 1 06:07:38 2017
>>>> @@ -1457,11 +1457,10 @@ define i32 @test_mul_by_28(i32 %x) {
>>>> define i32 @test_mul_by_29(i32 %x) {
>>>> ; X86-LABEL: test_mul_by_29:
>>>> ; X86: # BB#0:
>>>> -; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
>>>> -; X86-NEXT: leal (%ecx,%ecx,8), %eax
>>>> -; X86-NEXT: leal (%eax,%eax,2), %eax
>>>> -; X86-NEXT: addl %ecx, %eax
>>>> -; X86-NEXT: addl %ecx, %eax
>>>> +; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
>>>> +; X86-NEXT: leal (%eax,%eax,8), %ecx
>>>> +; X86-NEXT: leal (%ecx,%ecx,2), %ecx
>>>> +; X86-NEXT: leal (%ecx,%eax,2), %eax
>>>> ; X86-NEXT: retl
>>>> ;
>>>> ; X64-HSW-LABEL: test_mul_by_29:
>>>> @@ -1469,8 +1468,7 @@ define i32 @test_mul_by_29(i32 %x) {
>>>> ; X64-HSW-NEXT: # kill: %edi<def> %edi<kill> %rdi<def>
>>>> ; X64-HSW-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]
>>>> ; X64-HSW-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]
>>>> -; X64-HSW-NEXT: addl %edi, %eax # sched: [1:0.25]
>>>> -; X64-HSW-NEXT: addl %edi, %eax # sched: [1:0.25]
>>>> +; X64-HSW-NEXT: leal (%rax,%rdi,2), %eax # sched: [1:0.50]
>>>> ; X64-HSW-NEXT: retq # sched: [2:1.00]
>>>> ;
>>>> ; X64-JAG-LABEL: test_mul_by_29:
>>>> @@ -1478,8 +1476,7 @@ define i32 @test_mul_by_29(i32 %x) {
>>>> ; X64-JAG-NEXT: # kill: %edi<def> %edi<kill> %rdi<def>
>>>> ; X64-JAG-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]
>>>> ; X64-JAG-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]
>>>> -; X64-JAG-NEXT: addl %edi, %eax # sched: [1:0.50]
>>>> -; X64-JAG-NEXT: addl %edi, %eax # sched: [1:0.50]
>>>> +; X64-JAG-NEXT: leal (%rax,%rdi,2), %eax # sched: [1:0.50]
>>>> ; X64-JAG-NEXT: retq # sched: [4:1.00]
>>>> ;
>>>> ; X86-NOOPT-LABEL: test_mul_by_29:
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/mul-constant-i64.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/mul-constant-i64.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/mul-constant-i64.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/mul-constant-i64.ll Fri Dec 1 06:07:38 2017
>>>> @@ -1523,8 +1523,7 @@ define i64 @test_mul_by_29(i64 %x) {
>>>> ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
>>>> ; X86-NEXT: leal (%eax,%eax,8), %ecx
>>>> ; X86-NEXT: leal (%ecx,%ecx,2), %ecx
>>>> -; X86-NEXT: addl %eax, %ecx
>>>> -; X86-NEXT: addl %eax, %ecx
>>>> +; X86-NEXT: leal (%ecx,%eax,2), %ecx
>>>> ; X86-NEXT: movl $29, %eax
>>>> ; X86-NEXT: mull {{[0-9]+}}(%esp)
>>>> ; X86-NEXT: addl %ecx, %edx
>>>> @@ -1534,16 +1533,14 @@ define i64 @test_mul_by_29(i64 %x) {
>>>> ; X64-HSW: # BB#0:
>>>> ; X64-HSW-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]
>>>> ; X64-HSW-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]
>>>> -; X64-HSW-NEXT: addq %rdi, %rax # sched: [1:0.25]
>>>> -; X64-HSW-NEXT: addq %rdi, %rax # sched: [1:0.25]
>>>> +; X64-HSW-NEXT: leaq (%rax,%rdi,2), %rax # sched: [1:0.50]
>>>> ; X64-HSW-NEXT: retq # sched: [2:1.00]
>>>> ;
>>>> ; X64-JAG-LABEL: test_mul_by_29:
>>>> ; X64-JAG: # BB#0:
>>>> ; X64-JAG-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]
>>>> ; X64-JAG-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]
>>>> -; X64-JAG-NEXT: addq %rdi, %rax # sched: [1:0.50]
>>>> -; X64-JAG-NEXT: addq %rdi, %rax # sched: [1:0.50]
>>>> +; X64-JAG-NEXT: leaq (%rax,%rdi,2), %rax # sched: [1:0.50]
>>>> ; X64-JAG-NEXT: retq # sched: [4:1.00]
>>>> ;
>>>> ; X86-NOOPT-LABEL: test_mul_by_29:
>>>>
>>>> Modified: llvm/trunk/test/CodeGen/X86/mul-constant-result.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/mul-constant-result.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/mul-constant-result.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/mul-constant-result.ll Fri Dec 1 06:07:38 2017
>>>> @@ -164,8 +164,7 @@ define i32 @mult(i32, i32) local_unnamed
>>>> ; X86-NEXT: .LBB0_35:
>>>> ; X86-NEXT: leal (%eax,%eax,8), %ecx
>>>> ; X86-NEXT: leal (%ecx,%ecx,2), %ecx
>>>> -; X86-NEXT: addl %eax, %ecx
>>>> -; X86-NEXT: addl %ecx, %eax
>>>> +; X86-NEXT: leal (%ecx,%eax,2), %eax
>>>> ; X86-NEXT: popl %esi
>>>> ; X86-NEXT: retl
>>>> ; X86-NEXT: .LBB0_36:
>>>> @@ -323,16 +322,17 @@ define i32 @mult(i32, i32) local_unnamed
>>>> ; X64-HSW-NEXT: .LBB0_31:
>>>> ; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx
>>>> ; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx
>>>> -; X64-HSW-NEXT: jmp .LBB0_17
>>>> -; X64-HSW-NEXT: .LBB0_32:
>>>> -; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx
>>>> -; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx
>>>> -; X64-HSW-NEXT: addl %eax, %ecx
>>>> ; X64-HSW-NEXT: .LBB0_17:
>>>> ; X64-HSW-NEXT: addl %eax, %ecx
>>>> ; X64-HSW-NEXT: movl %ecx, %eax
>>>> ; X64-HSW-NEXT: # kill: %eax<def> %eax<kill> %rax<kill>
>>>> ; X64-HSW-NEXT: retq
>>>> +; X64-HSW-NEXT: .LBB0_32:
>>>> +; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx
>>>> +; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx
>>>> +; X64-HSW-NEXT: leal (%rcx,%rax,2), %eax
>>>> +; X64-HSW-NEXT: # kill: %eax<def> %eax<kill> %rax<kill>
>>>> +; X64-HSW-NEXT: retq
>>>> ; X64-HSW-NEXT: .LBB0_33:
>>>> ; X64-HSW-NEXT: movl %eax, %ecx
>>>> ; X64-HSW-NEXT: shll $5, %ecx
>>>>
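>>>> In mul-constant-result.ll the same rewrite also changes block layout: the
>>>> .LBB0_32 (mul-by-29) case used to fall through into the shared
>>>> add-and-return tail at .LBB0_17. Since its final two adds are now a single
>>>> leal (%rcx,%rax,2), it gets its own kill/retq tail after the shared one,
>>>> and .LBB0_31 falls through to .LBB0_17 without the old jmp.
>>>>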
>>>> Modified: llvm/trunk/test/CodeGen/X86/umul-with-overflow.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/umul-with-overflow.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/umul-with-overflow.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/umul-with-overflow.ll Fri Dec 1 06:07:38 2017
>>>> @@ -40,10 +40,10 @@ define i32 @test2(i32 %a, i32 %b) nounwi
>>>> ; X64-NEXT: leal (%rdi,%rdi), %eax
>>>> ; X64-NEXT: retq
>>>> entry:
>>>> - %tmp0 = add i32 %b, %a
>>>> - %tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 2)
>>>> - %tmp2 = extractvalue { i32, i1 } %tmp1, 0
>>>> - ret i32 %tmp2
>>>> + %tmp0 = add i32 %b, %a
>>>> + %tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 2)
>>>> + %tmp2 = extractvalue { i32, i1 } %tmp1, 0
>>>> + ret i32 %tmp2
>>>> }
>>>>
>>>> define i32 @test3(i32 %a, i32 %b) nounwind readnone {
>>>> @@ -64,8 +64,8 @@ define i32 @test3(i32 %a, i32 %b) nounwi
>>>> ; X64-NEXT: mull %ecx
>>>> ; X64-NEXT: retq
>>>> entry:
>>>> - %tmp0 = add i32 %b, %a
>>>> - %tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 4)
>>>> - %tmp2 = extractvalue { i32, i1 } %tmp1, 0
>>>> - ret i32 %tmp2
>>>> + %tmp0 = add i32 %b, %a
>>>> + %tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 4)
>>>> + %tmp2 = extractvalue { i32, i1 } %tmp1, 0
>>>> + ret i32 %tmp2
>>>> }
>>>>
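>>>> The umul-with-overflow.ll hunks are indentation-only cleanups of the IR; the
>>>> generated code is untouched. For instance, @test2 still lowers its
>>>> multiply-by-2 of %a + %b to the single two-operand LEA checked above,
>>>> leal (%rdi,%rdi), %eax.
>>>>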
>>>> Modified: llvm/trunk/test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll?rev=319543&r1=319542&r2=319543&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll (original)
>>>> +++ llvm/trunk/test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll Fri Dec 1 06:07:38 2017
>>>> @@ -13,14 +13,14 @@
>>>> ; X64-NEXT: .p2align
>>>> ; X64: %loop
>>>> ; no complex address modes
>>>> -; X64-NOT: (%{{[^)]+}},%{{[^)]+}},
>>>> +; X64-NOT: [1-9]+(%{{[^)]+}},%{{[^)]+}},
>>>> ;
>>>> ; X32: @simple
>>>> ; no expensive address computation in the preheader
>>>> ; X32-NOT: imul
>>>> ; X32: %loop
>>>> ; no complex address modes
>>>> -; X32-NOT: (%{{[^)]+}},%{{[^)]+}},
>>>> +; X32-NOT: [1-9]+(%{{[^)]+}},%{{[^)]+}},
>>>> define i32 @simple(i32* %a, i32* %b, i32 %x) nounwind {
>>>> entry:
>>>> br label %loop
>>>> @@ -103,7 +103,7 @@ exit:
>>>> ; X32-NOT: mov{{.*}}(%esp){{$}}
>>>> ; X32: %for.body{{$}}
>>>> ; no complex address modes
>>>> -; X32-NOT: (%{{[^)]+}},%{{[^)]+}},
>>>> +; X32-NOT: [1-9]+(%{{[^)]+}},%{{[^)]+}},
>>>> ; no reloads
>>>> ; X32-NOT: (%esp)
>>>> define void @extrastride(i8* nocapture %main, i32 %main_stride, i32* nocapture %res, i32 %x, i32 %y, i32 %z) nounwind {
>>>>
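>>>> On the ivchain-X86.ll update: the CHECK-NOT patterns are loosened because
>>>> the patch can now legitimately form displacement-free scaled address modes
>>>> in these loops. The apparent intent is that a bare (%base,%index,scale)
>>>> operand is tolerated while a displaced one is still rejected; illustrative
>>>> (hypothetical) operands, not taken from the test:
>>>>
>>>>   movl (%rax,%rcx,4), %edx    # allowed under the new pattern: no displacement
>>>>   movl 8(%rax,%rcx,4), %edx   # meant to stay rejected: displaced complex mode
>>>>
>>>> One caveat worth flagging: FileCheck only treats text inside {{ }} as a
>>>> regex, so the bare [1-9]+ prefix is matched as a literal string; the
>>>> displacement digit presumably wants to be written {{[1-9]+}} for the check
>>>> to fire as intended.
>>>>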
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>
>>>
>>>
>>
>