[llvm] r257589 - LEA code size optimization pass (Part 2): Remove redundant LEA instructions.

Andrey Turetskiy via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 30 04:51:40 PDT 2016


Hi Philip,

I have no plans to do this in the near future. Your idea sounds
interesting; however, please consider the following:

The reason this particular part of the pass is enabled only for -Oz is
that it caused a small performance drop on SPEC 2000 at -Os on Atom, while
the code size improvement was even smaller. It's not that I have any proof
that the transformation is generally bad for performance (actually I
believe otherwise - executing complex LEAs like
'lea 0x12345678(%rax, %rbx, 4), %rcx' is usually costly, so reducing their
number is a potential gain); I just didn't want to take the risk for an
insignificant code size gain, and since my primary concern was code size I
didn't do any performance analysis.
So just note that restricting redundant LEA removal to -Oz is not a final
decision; it may change in the future. Or it may not :)
And if you have cases where removing redundant LEAs really helps code size
without hurting performance, I think it would make sense to enable it at
-Os as well.
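
To make the pattern concrete, here is a rough before/after sketch (register
names and offsets are made up for illustration, they are not taken from the
patch) of the redundancy this part of the pass removes - two LEAs in the
same basic block whose addresses differ only in the displacement, where the
later LEA's users can absorb the difference into their own displacements:

  Before:
    leaq 16(%rdi,%rsi,4), %rax
    leaq 20(%rdi,%rsi,4), %rcx     # same base/index/scale, displacement +4
    movl (%rax), %edx
    addl (%rcx), %edx

  After (the later LEA is removed and its use is re-based on %rax, with the
  displacement difference folded into the memory operand):
    leaq 16(%rdi,%rsi,4), %rax
    movl (%rax), %edx
    addl 4(%rax), %edx

The new test4 case below checks essentially this: the accesses to the
neighbouring struct fields end up sharing a single LEA (arr1+4 in the
CHECK lines) with small displacements.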


On Sat, Mar 26, 2016 at 12:24 AM, Philip Reames <listmail at philipreames.com>
wrote:

> I notice that this is structured as a per-basic-block action, but is only
> enabled if the entire function is marked Oz.  Are there any plans to use
> block profiling to enable this in cold blocks on a non-Oz function?  I'd be
> very interested in seeing that happen.
>
> p.s. Sorry to revive a zombie thread; I came across this change due to the
> presentation at EuroLLVM which mentioned it.
>
> Philip
>
>
> On 01/13/2016 03:30 AM, Andrey Turetskiy via llvm-commits wrote:
>
>> Author: aturetsk
>> Date: Wed Jan 13 05:30:44 2016
>> New Revision: 257589
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=257589&view=rev
>> Log:
>> LEA code size optimization pass (Part 2): Remove redundant LEA
>> instructions.
>>
>> Make the x86 OptimizeLEAs pass remove an LEA instruction if there is
>> another LEA (in the same basic block) which calculates an address differing
>> only by a displacement. Works only for -Oz.
>>
>> Differential Revision: http://reviews.llvm.org/D13295
>>
>>
>> Modified:
>>      llvm/trunk/lib/Target/X86/X86.h
>>      llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp
>>      llvm/trunk/test/CodeGen/X86/lea-opt.ll
>>
>> Modified: llvm/trunk/lib/Target/X86/X86.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86.h?rev=257589&r1=257588&r2=257589&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86.h (original)
>> +++ llvm/trunk/lib/Target/X86/X86.h Wed Jan 13 05:30:44 2016
>> @@ -54,7 +54,8 @@ FunctionPass *createX86PadShortFunctions
>>   /// instructions, in order to eliminate execution delays in some processors.
>>   FunctionPass *createX86FixupLEAs();
>>   -/// Return a pass that removes redundant address recalculations.
>> +/// Return a pass that removes redundant LEA instructions and redundant address
>> +/// recalculations.
>>   FunctionPass *createX86OptimizeLEAs();
>>     /// Return a pass that optimizes the code-size of x86 call sequences. This is
>>
>> Modified: llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp?rev=257589&r1=257588&r2=257589&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp Wed Jan 13 05:30:44 2016
>> @@ -9,8 +9,10 @@
>>   //
>>   // This file defines the pass that performs some optimizations with LEA
>>   // instructions in order to improve code size.
>> -// Currently, it does one thing:
>> -// 1) Address calculations in load and store instructions are replaced by
>> +// Currently, it does two things:
>> +// 1) If there are two LEA instructions calculating addresses which only differ
>> +//    by displacement inside a basic block, one of them is removed.
>> +// 2) Address calculations in load and store instructions are replaced by
>>   //    existing LEA def registers where possible.
>>   //
>>
>> //===----------------------------------------------------------------------===//
>> @@ -38,6 +40,7 @@ static cl::opt<bool> EnableX86LEAOpt("en
>>                                        cl::init(false));
>>     STATISTIC(NumSubstLEAs, "Number of LEA instruction substitutions");
>> +STATISTIC(NumRedundantLEAs, "Number of redundant LEA instructions removed");
>>     namespace {
>>   class OptimizeLEAPass : public MachineFunctionPass {
>> @@ -71,6 +74,13 @@ private:
>>     /// \brief Returns true if the instruction is LEA.
>>     bool isLEA(const MachineInstr &MI);
>> +  /// \brief Returns true if the \p Last LEA instruction can be replaced by the
>> +  /// \p First. The difference between displacements of the addresses calculated
>> +  /// by these LEAs is returned in \p AddrDispShift. It'll be used for proper
>> +  /// replacement of the \p Last LEA's uses with the \p First's def register.
>> +  bool isReplaceable(const MachineInstr &First, const MachineInstr &Last,
>> +                     int64_t &AddrDispShift);
>> +
>>     /// \brief Returns true if two instructions have memory operands that only
>>     /// differ by displacement. The numbers of the first memory operands for both
>>     /// instructions are specified through \p N1 and \p N2. The address
>> @@ -88,6 +98,9 @@ private:
>>     /// \brief Removes redundant address calculations.
>>     bool removeRedundantAddrCalc(const SmallVectorImpl<MachineInstr *> &List);
>>   +  /// \brief Removes LEAs which calculate similar addresses.
>> +  bool removeRedundantLEAs(SmallVectorImpl<MachineInstr *> &List);
>> +
>>     DenseMap<const MachineInstr *, unsigned> InstrPos;
>>       MachineRegisterInfo *MRI;
>> @@ -194,6 +207,69 @@ bool OptimizeLEAPass::isLEA(const Machin
>>            Opcode == X86::LEA64r || Opcode == X86::LEA64_32r;
>>   }
>>   +// Check that the Last LEA can be replaced by the First LEA. To be so,
>> +// these requirements must be met:
>> +// 1) Addresses calculated by LEAs differ only by displacement.
>> +// 2) Def registers of LEAs belong to the same class.
>> +// 3) All uses of the Last LEA def register are replaceable, thus the
>> +//    register is used only as address base.
>> +bool OptimizeLEAPass::isReplaceable(const MachineInstr &First,
>> +                                    const MachineInstr &Last,
>> +                                    int64_t &AddrDispShift) {
>> +  assert(isLEA(First) && isLEA(Last) &&
>> +         "The function works only with LEA instructions");
>> +
>> +  // Compare instructions' memory operands.
>> +  if (!isSimilarMemOp(Last, 1, First, 1, AddrDispShift))
>> +    return false;
>> +
>> +  // Make sure that LEA def registers belong to the same class. There may be
>> +  // instructions (like MOV8mr_NOREX) which allow a limited set of registers to
>> +  // be used as their operands, so we must be sure that replacing one LEA
>> +  // with another won't lead to putting a wrong register in the instruction.
>> +  if (MRI->getRegClass(First.getOperand(0).getReg()) !=
>> +      MRI->getRegClass(Last.getOperand(0).getReg()))
>> +    return false;
>> +
>> +  // Loop over all uses of the Last LEA to check that its def register is
>> +  // used only as address base for memory accesses. If so, it can be
>> +  // replaced, otherwise - no.
>> +  for (auto &MO : MRI->use_operands(Last.getOperand(0).getReg())) {
>> +    MachineInstr &MI = *MO.getParent();
>> +
>> +    // Get the number of the first memory operand.
>> +    const MCInstrDesc &Desc = MI.getDesc();
>> +    int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags, MI.getOpcode());
>> +
>> +    // If the use instruction has no memory operand - the LEA is not
>> +    // replaceable.
>> +    if (MemOpNo < 0)
>> +      return false;
>> +
>> +    MemOpNo += X86II::getOperandBias(Desc);
>> +
>> +    // If the address base of the use instruction is not the LEA def register -
>> +    // the LEA is not replaceable.
>> +    if (!isIdenticalOp(MI.getOperand(MemOpNo + X86::AddrBaseReg), MO))
>> +      return false;
>> +
>> +    // If the LEA def register is used as any other operand of the use
>> +    // instruction - the LEA is not replaceable.
>> +    for (unsigned i = 0; i < MI.getNumOperands(); i++)
>> +      if (i != (unsigned)(MemOpNo + X86::AddrBaseReg) &&
>> +          isIdenticalOp(MI.getOperand(i), MO))
>> +        return false;
>> +
>> +    // Check that the new address displacement will fit 4 bytes.
>> +    if (MI.getOperand(MemOpNo + X86::AddrDisp).isImm() &&
>> +        !isInt<32>(MI.getOperand(MemOpNo + X86::AddrDisp).getImm() +
>> +                   AddrDispShift))
>> +      return false;
>> +  }
>> +
>> +  return true;
>> +}
>> +
>>   // Check if MI1 and MI2 have memory operands which represent addresses that
>>   // differ only by displacement.
>>   bool OptimizeLEAPass::isSimilarMemOp(const MachineInstr &MI1, unsigned N1,
>> @@ -316,6 +392,81 @@ bool OptimizeLEAPass::removeRedundantAdd
>>     return Changed;
>>   }
>>   +// Try to find similar LEAs in the list and replace one with another.
>> +bool
>> +OptimizeLEAPass::removeRedundantLEAs(SmallVectorImpl<MachineInstr *> &List) {
>> +  bool Changed = false;
>> +
>> +  // Loop over all LEA pairs.
>> +  auto I1 = List.begin();
>> +  while (I1 != List.end()) {
>> +    MachineInstr &First = **I1;
>> +    auto I2 = std::next(I1);
>> +    while (I2 != List.end()) {
>> +      MachineInstr &Last = **I2;
>> +      int64_t AddrDispShift;
>> +
>> +      // LEAs should be in occurrence order in the list, so we can freely
>> +      // replace later LEAs with earlier ones.
>> +      assert(calcInstrDist(First, Last) > 0 &&
>> +             "LEAs must be in occurrence order in the list");
>> +
>> +      // Check that the Last LEA instruction can be replaced by the First.
>> +      if (!isReplaceable(First, Last, AddrDispShift)) {
>> +        ++I2;
>> +        continue;
>> +      }
>> +
>> +      // Loop over all uses of the Last LEA and update their operands. Note that
>> +      // the correctness of this has already been checked in the isReplaceable
>> +      // function.
>> +      for (auto UI = MRI->use_begin(Last.getOperand(0).getReg()),
>> +                UE = MRI->use_end();
>> +           UI != UE;) {
>> +        MachineOperand &MO = *UI++;
>> +        MachineInstr &MI = *MO.getParent();
>> +
>> +        // Get the number of the first memory operand.
>> +        const MCInstrDesc &Desc = MI.getDesc();
>> +        int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags, MI.getOpcode()) +
>> +                      X86II::getOperandBias(Desc);
>> +
>> +        // Update address base.
>> +        MO.setReg(First.getOperand(0).getReg());
>> +
>> +        // Update address disp.
>> +        MachineOperand *Op = &MI.getOperand(MemOpNo + X86::AddrDisp);
>> +        if (Op->isImm())
>> +          Op->setImm(Op->getImm() + AddrDispShift);
>> +        else if (Op->isGlobal())
>> +          Op->setOffset(Op->getOffset() + AddrDispShift);
>> +        else
>> +          llvm_unreachable("Invalid address displacement operand");
>> +      }
>> +
>> +      // Since we can possibly extend register lifetime, clear kill flags.
>> +      MRI->clearKillFlags(First.getOperand(0).getReg());
>> +
>> +      ++NumRedundantLEAs;
>> +      DEBUG(dbgs() << "OptimizeLEAs: Remove redundant LEA: "; Last.dump(););
>> +
>> +      // By this moment, all of the Last LEA's uses must be replaced. So we can
>> +      // freely remove it.
>> +      assert(MRI->use_empty(Last.getOperand(0).getReg()) &&
>> +             "The LEA's def register must have no uses");
>> +      Last.eraseFromParent();
>> +
>> +      // Erase removed LEA from the list.
>> +      I2 = List.erase(I2);
>> +
>> +      Changed = true;
>> +    }
>> +    ++I1;
>> +  }
>> +
>> +  return Changed;
>> +}
>> +
>>   bool OptimizeLEAPass::runOnMachineFunction(MachineFunction &MF) {
>>     bool Changed = false;
>>   @@ -339,6 +490,11 @@ bool OptimizeLEAPass::runOnMachineFuncti
>>       if (LEAs.empty())
>>         continue;
>> +    // Remove redundant LEA instructions. The optimization may have a negative
>> +    // effect on performance, so do it only for -Oz.
>> +    if (MF.getFunction()->optForMinSize())
>> +      Changed |= removeRedundantLEAs(LEAs);
>> +
>>       // Remove redundant address calculations.
>>       Changed |= removeRedundantAddrCalc(LEAs);
>>     }
>>
>> Modified: llvm/trunk/test/CodeGen/X86/lea-opt.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/lea-opt.ll?rev=257589&r1=257588&r2=257589&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/lea-opt.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/lea-opt.ll Wed Jan 13 05:30:44 2016
>> @@ -129,3 +129,41 @@ sw.epilog:
>>   ; CHECK:      movl ${{[1-4]+}}, ([[REG2]])
>>   ; CHECK:      movl ${{[1-4]+}}, ([[REG3]])
>>   }
>> +
>> +define void @test4(i64 %x) nounwind minsize {
>> +entry:
>> +  %a = getelementptr inbounds [65 x %struct.anon1], [65 x %struct.anon1]* @arr1, i64 0, i64 %x, i32 0
>> +  %tmp = load i32, i32* %a, align 4
>> +  %b = getelementptr inbounds [65 x %struct.anon1], [65 x %struct.anon1]* @arr1, i64 0, i64 %x, i32 1
>> +  %tmp1 = load i32, i32* %b, align 4
>> +  %sub = sub i32 %tmp, %tmp1
>> +  %c = getelementptr inbounds [65 x %struct.anon1], [65 x %struct.anon1]* @arr1, i64 0, i64 %x, i32 2
>> +  %tmp2 = load i32, i32* %c, align 4
>> +  %add = add nsw i32 %sub, %tmp2
>> +  switch i32 %add, label %sw.epilog [
>> +    i32 1, label %sw.bb.1
>> +    i32 2, label %sw.bb.2
>> +  ]
>> +
>> +sw.bb.1:                                          ; preds = %entry
>> +  store i32 111, i32* %b, align 4
>> +  store i32 222, i32* %c, align 4
>> +  br label %sw.epilog
>> +
>> +sw.bb.2:                                          ; preds = %entry
>> +  store i32 333, i32* %b, align 4
>> +  store i32 444, i32* %c, align 4
>> +  br label %sw.epilog
>> +
>> +sw.epilog:                                        ; preds = %sw.bb.2, %sw.bb.1, %entry
>> +  ret void
>> +; CHECK-LABEL: test4:
>> +; CHECK:       leaq arr1+4({{.*}}), [[REG2:%[a-z]+]]
>> +; CHECK:       movl -4([[REG2]]), {{.*}}
>> +; CHECK:       subl ([[REG2]]), {{.*}}
>> +; CHECK:       addl 4([[REG2]]), {{.*}}
>> +; CHECK:       movl ${{[1-4]+}}, ([[REG2]])
>> +; CHECK:       movl ${{[1-4]+}}, 4([[REG2]])
>> +; CHECK:       movl ${{[1-4]+}}, ([[REG2]])
>> +; CHECK:       movl ${{[1-4]+}}, 4([[REG2]])
>> +}
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>
>


-- 
Best regards,
Andrey Turetskiy