[llvm] r257589 - LEA code size optimization pass (Part 2): Remove redundant LEA instructions.
Andrey Turetskiy via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 30 04:51:40 PDT 2016
Hi Philip,
I have no plans to do this in the near future. Your idea sounds
interesting, but please consider the following:
The reason this particular part of the pass was enabled only for -Oz is
that I saw a small performance drop on SPEC 2000 at -Os on Atom because of it,
while the code size improvement was even smaller. It's not that I have any
proof that the transformation is mostly bad for performance (actually I
believe otherwise: executing complex LEAs like 'lea 0x12345678(%rax,
%rbx, 4), %rcx' is usually costly, so reducing their number is a potential
gain). I just didn't want to risk performance for an insignificant code size
gain, and since my primary concern was code size I didn't do any performance
analysis.
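
As a rough illustration of what "redundant" means in this thread: two LEAs are candidates when their addresses share base, index and scale and differ only in displacement, so each use of the later LEA's register can be rebased onto the earlier one by shifting the use's displacement. The block below is a minimal standalone C++17 sketch of that arithmetic, not the pass itself. `Address`, `dispShift`, `fitsInt32` and `rewriteUseDisp` are hypothetical names, and registers are modelled as plain integers rather than LLVM machine operands.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical model of an x86 address: base + index * scale + disp.
// Register numbers are plain ints; this is not LLVM's MachineOperand.
struct Address {
  int base;
  int index;
  int scale;
  int64_t disp;
};

// If `last` differs from `first` only by displacement, return the shift
// (last.disp - first.disp); otherwise return an empty optional.
std::optional<int64_t> dispShift(const Address &first, const Address &last) {
  if (first.base != last.base || first.index != last.index ||
      first.scale != last.scale)
    return std::nullopt;
  return last.disp - first.disp;
}

// True if `v` fits a signed 32-bit displacement field.
bool fitsInt32(int64_t v) {
  return v >= std::numeric_limits<int32_t>::min() &&
         v <= std::numeric_limits<int32_t>::max();
}

// A memory use `useDisp(lastReg)` may be rewritten to
// `(useDisp + shift)(firstReg)` only if the new displacement still fits.
// Assumes useDisp + shift does not overflow int64_t.
std::optional<int64_t> rewriteUseDisp(int64_t useDisp, int64_t shift) {
  int64_t newDisp = useDisp + shift;
  if (!fitsInt32(newDisp))
    return std::nullopt;
  return newDisp;
}
```

With a first LEA at displacement 4 and a later one at displacement 8, `dispShift` yields 4, so a use `0(last)` becomes `4(first)`; compare the -4/0/4 displacements in the `test4` CHECK lines of the quoted commit.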
So just note that restricting the removal of redundant LEAs to -Oz is not a
settled decision, and it may change in the future. Or it may not :)
And if you have cases where removing redundant LEAs really helps code size
without hurting performance, I think it would make sense to enable it at -Os
as well.
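
The per-block gating asked about in the quoted message below could look roughly like the following. This is only a sketch under stated assumptions: the committed pass checks nothing but the function-level minsize attribute, and `BlockInfo`, `FunctionInfo`, `shouldRemoveRedundantLEAs` and the `coldRatio` threshold are all hypothetical names that do not come from LLVM. Where the frequency estimates come from is left open.

```cpp
#include <cstdint>

// Hypothetical per-block profile data.
struct BlockInfo {
  uint64_t frequency; // estimated execution frequency of the block
};

// Hypothetical per-function data.
struct FunctionInfo {
  bool optForMinSize;      // function is compiled for minimum size (-Oz)
  uint64_t entryFrequency; // estimated frequency of the entry block
};

// Decide whether redundant-LEA removal should run on a block:
// always under -Oz, otherwise only when the block is cold, meaning it
// runs less than 1/coldRatio as often as the function entry.
bool shouldRemoveRedundantLEAs(const FunctionInfo &fn, const BlockInfo &bb,
                               uint64_t coldRatio) {
  if (fn.optForMinSize)
    return true;
  if (coldRatio == 0) // no threshold configured: keep today's behaviour
    return false;
  return bb.frequency < fn.entryFrequency / coldRatio;
}
```

Passing a `coldRatio` of 0 reproduces the function-level behaviour of the commit, which keeps the sketch easy to compare against it.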
On Sat, Mar 26, 2016 at 12:24 AM, Philip Reames <listmail at philipreames.com>
wrote:
> I notice that this is structured as a per-basic block action, but is only
> enabled if the entire function is marked Oz. Are there any plans to use
> block profiling to enable this in cold blocks on a non-Oz function? I'd be
> very interested in seeing that happen.
>
> p.s. Sorry to revive a zombie thread; I came across this change due to the
> presentation at EuroLLVM which mentioned it.
>
> Philip
>
>
> On 01/13/2016 03:30 AM, Andrey Turetskiy via llvm-commits wrote:
>
>> Author: aturetsk
>> Date: Wed Jan 13 05:30:44 2016
>> New Revision: 257589
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=257589&view=rev
>> Log:
>> LEA code size optimization pass (Part 2): Remove redundant LEA
>> instructions.
>>
>> Make x86 OptimizeLEAs pass remove LEA instruction if there is another LEA
>> (in the same basic block) which calculates address differing only by a
>> displacement. Works only for -Oz.
>>
>> Differential Revision: http://reviews.llvm.org/D13295
>>
>>
>> Modified:
>> llvm/trunk/lib/Target/X86/X86.h
>> llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp
>> llvm/trunk/test/CodeGen/X86/lea-opt.ll
>>
>> Modified: llvm/trunk/lib/Target/X86/X86.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86.h?rev=257589&r1=257588&r2=257589&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86.h (original)
>> +++ llvm/trunk/lib/Target/X86/X86.h Wed Jan 13 05:30:44 2016
>> @@ -54,7 +54,8 @@ FunctionPass *createX86PadShortFunctions
>> /// instructions, in order to eliminate execution delays in some processors.
>> FunctionPass *createX86FixupLEAs();
>> -/// Return a pass that removes redundant address recalculations.
>> +/// Return a pass that removes redundant LEA instructions and redundant address
>> +/// recalculations.
>> FunctionPass *createX86OptimizeLEAs();
>> /// Return a pass that optimizes the code-size of x86 call sequences. This is
>>
>> Modified: llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp?rev=257589&r1=257588&r2=257589&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86OptimizeLEAs.cpp Wed Jan 13 05:30:44 2016
>> @@ -9,8 +9,10 @@
>> //
>> // This file defines the pass that performs some optimizations with LEA
>> // instructions in order to improve code size.
>> -// Currently, it does one thing:
>> -// 1) Address calculations in load and store instructions are replaced by
>> +// Currently, it does two things:
>> +// 1) If there are two LEA instructions calculating addresses which only differ
>> +// by displacement inside a basic block, one of them is removed.
>> +// 2) Address calculations in load and store instructions are replaced by
>> // existing LEA def registers where possible.
>> //
>>
>> //===----------------------------------------------------------------------===//
>> @@ -38,6 +40,7 @@ static cl::opt<bool> EnableX86LEAOpt("en
>> cl::init(false));
>> STATISTIC(NumSubstLEAs, "Number of LEA instruction substitutions");
>> +STATISTIC(NumRedundantLEAs, "Number of redundant LEA instructions removed");
>> namespace {
>> class OptimizeLEAPass : public MachineFunctionPass {
>> @@ -71,6 +74,13 @@ private:
>> /// \brief Returns true if the instruction is LEA.
>> bool isLEA(const MachineInstr &MI);
>> + /// \brief Returns true if the \p Last LEA instruction can be replaced by the
>> + /// \p First. The difference between displacements of the addresses calculated
>> + /// by these LEAs is returned in \p AddrDispShift. It'll be used for proper
>> + /// replacement of the \p Last LEA's uses with the \p First's def register.
>> + bool isReplaceable(const MachineInstr &First, const MachineInstr &Last,
>> + int64_t &AddrDispShift);
>> +
>> /// \brief Returns true if two instructions have memory operands that only
>> /// differ by displacement. The numbers of the first memory operands for both
>> /// instructions are specified through \p N1 and \p N2. The address
>> @@ -88,6 +98,9 @@ private:
>> /// \brief Removes redundant address calculations.
>> bool removeRedundantAddrCalc(const SmallVectorImpl<MachineInstr *> &List);
>> + /// \brief Removes LEAs which calculate similar addresses.
>> + bool removeRedundantLEAs(SmallVectorImpl<MachineInstr *> &List);
>> +
>> DenseMap<const MachineInstr *, unsigned> InstrPos;
>> MachineRegisterInfo *MRI;
>> @@ -194,6 +207,69 @@ bool OptimizeLEAPass::isLEA(const Machin
>> Opcode == X86::LEA64r || Opcode == X86::LEA64_32r;
>> }
>> +// Check that the Last LEA can be replaced by the First LEA. To be so,
>> +// these requirements must be met:
>> +// 1) Addresses calculated by LEAs differ only by displacement.
>> +// 2) Def registers of LEAs belong to the same class.
>> +// 3) All uses of the Last LEA def register are replaceable, thus the
>> +// register is used only as address base.
>> +bool OptimizeLEAPass::isReplaceable(const MachineInstr &First,
>> + const MachineInstr &Last,
>> + int64_t &AddrDispShift) {
>> + assert(isLEA(First) && isLEA(Last) &&
>> + "The function works only with LEA instructions");
>> +
>> + // Compare instructions' memory operands.
>> + if (!isSimilarMemOp(Last, 1, First, 1, AddrDispShift))
>> + return false;
>> +
>> + // Make sure that LEA def registers belong to the same class. There may be
>> + // instructions (like MOV8mr_NOREX) which allow a limited set of registers to
>> + // be used as their operands, so we must be sure that replacing one LEA
>> + // with another won't lead to putting a wrong register in the instruction.
>> + if (MRI->getRegClass(First.getOperand(0).getReg()) !=
>> + MRI->getRegClass(Last.getOperand(0).getReg()))
>> + return false;
>> +
>> + // Loop over all uses of the Last LEA to check that its def register is
>> + // used only as address base for memory accesses. If so, it can be
>> + // replaced, otherwise - no.
>> + for (auto &MO : MRI->use_operands(Last.getOperand(0).getReg())) {
>> + MachineInstr &MI = *MO.getParent();
>> +
>> + // Get the number of the first memory operand.
>> + const MCInstrDesc &Desc = MI.getDesc();
>> + int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags, MI.getOpcode());
>> +
>> + // If the use instruction has no memory operand - the LEA is not
>> + // replaceable.
>> + if (MemOpNo < 0)
>> + return false;
>> +
>> + MemOpNo += X86II::getOperandBias(Desc);
>> +
>> + // If the address base of the use instruction is not the LEA def register -
>> + // the LEA is not replaceable.
>> + if (!isIdenticalOp(MI.getOperand(MemOpNo + X86::AddrBaseReg), MO))
>> + return false;
>> +
>> + // If the LEA def register is used as any other operand of the use
>> + // instruction - the LEA is not replaceable.
>> + for (unsigned i = 0; i < MI.getNumOperands(); i++)
>> + if (i != (unsigned)(MemOpNo + X86::AddrBaseReg) &&
>> + isIdenticalOp(MI.getOperand(i), MO))
>> + return false;
>> +
>> + // Check that the new address displacement will fit 4 bytes.
>> + if (MI.getOperand(MemOpNo + X86::AddrDisp).isImm() &&
>> + !isInt<32>(MI.getOperand(MemOpNo + X86::AddrDisp).getImm() +
>> + AddrDispShift))
>> + return false;
>> + }
>> +
>> + return true;
>> +}
>> +
>> // Check if MI1 and MI2 have memory operands which represent addresses that
>> // differ only by displacement.
>> bool OptimizeLEAPass::isSimilarMemOp(const MachineInstr &MI1, unsigned N1,
>> @@ -316,6 +392,81 @@ bool OptimizeLEAPass::removeRedundantAdd
>> return Changed;
>> }
>> +// Try to find similar LEAs in the list and replace one with another.
>> +bool
>> +OptimizeLEAPass::removeRedundantLEAs(SmallVectorImpl<MachineInstr *> &List) {
>> + bool Changed = false;
>> +
>> + // Loop over all LEA pairs.
>> + auto I1 = List.begin();
>> + while (I1 != List.end()) {
>> + MachineInstr &First = **I1;
>> + auto I2 = std::next(I1);
>> + while (I2 != List.end()) {
>> + MachineInstr &Last = **I2;
>> + int64_t AddrDispShift;
>> +
>> + // LEAs should be in occurrence order in the list, so we can freely
>> + // replace later LEAs with earlier ones.
>> + assert(calcInstrDist(First, Last) > 0 &&
>> + "LEAs must be in occurrence order in the list");
>> +
>> + // Check that the Last LEA instruction can be replaced by the First.
>> + if (!isReplaceable(First, Last, AddrDispShift)) {
>> + ++I2;
>> + continue;
>> + }
>> +
>> + // Loop over all uses of the Last LEA and update their operands. Note that
>> + // the correctness of this has already been checked in the isReplaceable
>> + // function.
>> + for (auto UI = MRI->use_begin(Last.getOperand(0).getReg()),
>> + UE = MRI->use_end();
>> + UI != UE;) {
>> + MachineOperand &MO = *UI++;
>> + MachineInstr &MI = *MO.getParent();
>> +
>> + // Get the number of the first memory operand.
>> + const MCInstrDesc &Desc = MI.getDesc();
>> + int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags, MI.getOpcode()) +
>> + X86II::getOperandBias(Desc);
>> +
>> + // Update address base.
>> + MO.setReg(First.getOperand(0).getReg());
>> +
>> + // Update address disp.
>> + MachineOperand *Op = &MI.getOperand(MemOpNo + X86::AddrDisp);
>> + if (Op->isImm())
>> + Op->setImm(Op->getImm() + AddrDispShift);
>> + else if (Op->isGlobal())
>> + Op->setOffset(Op->getOffset() + AddrDispShift);
>> + else
>> + llvm_unreachable("Invalid address displacement operand");
>> + }
>> +
>> + // Since we can possibly extend register lifetime, clear kill flags.
>> + MRI->clearKillFlags(First.getOperand(0).getReg());
>> +
>> + ++NumRedundantLEAs;
>> + DEBUG(dbgs() << "OptimizeLEAs: Remove redundant LEA: "; Last.dump(););
>> +
>> + // By this moment, all of the Last LEA's uses must be replaced. So we can
>> + // freely remove it.
>> + assert(MRI->use_empty(Last.getOperand(0).getReg()) &&
>> + "The LEA's def register must have no uses");
>> + Last.eraseFromParent();
>> +
>> + // Erase removed LEA from the list.
>> + I2 = List.erase(I2);
>> +
>> + Changed = true;
>> + }
>> + ++I1;
>> + }
>> +
>> + return Changed;
>> +}
>> +
>> bool OptimizeLEAPass::runOnMachineFunction(MachineFunction &MF) {
>> bool Changed = false;
>> @@ -339,6 +490,11 @@ bool OptimizeLEAPass::runOnMachineFuncti
>> if (LEAs.empty())
>> continue;
>> + // Remove redundant LEA instructions. The optimization may have a negative
>> + // effect on performance, so do it only for -Oz.
>> + if (MF.getFunction()->optForMinSize())
>> + Changed |= removeRedundantLEAs(LEAs);
>> +
>> // Remove redundant address calculations.
>> Changed |= removeRedundantAddrCalc(LEAs);
>> }
>>
>> Modified: llvm/trunk/test/CodeGen/X86/lea-opt.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/lea-opt.ll?rev=257589&r1=257588&r2=257589&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/lea-opt.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/lea-opt.ll Wed Jan 13 05:30:44 2016
>> @@ -129,3 +129,41 @@ sw.epilog:
>> ; CHECK: movl ${{[1-4]+}}, ([[REG2]])
>> ; CHECK: movl ${{[1-4]+}}, ([[REG3]])
>> }
>> +
>> +define void @test4(i64 %x) nounwind minsize {
>> +entry:
>> + %a = getelementptr inbounds [65 x %struct.anon1], [65 x %struct.anon1]* @arr1, i64 0, i64 %x, i32 0
>> + %tmp = load i32, i32* %a, align 4
>> + %b = getelementptr inbounds [65 x %struct.anon1], [65 x %struct.anon1]* @arr1, i64 0, i64 %x, i32 1
>> + %tmp1 = load i32, i32* %b, align 4
>> + %sub = sub i32 %tmp, %tmp1
>> + %c = getelementptr inbounds [65 x %struct.anon1], [65 x %struct.anon1]* @arr1, i64 0, i64 %x, i32 2
>> + %tmp2 = load i32, i32* %c, align 4
>> + %add = add nsw i32 %sub, %tmp2
>> + switch i32 %add, label %sw.epilog [
>> + i32 1, label %sw.bb.1
>> + i32 2, label %sw.bb.2
>> + ]
>> +
>> +sw.bb.1: ; preds = %entry
>> + store i32 111, i32* %b, align 4
>> + store i32 222, i32* %c, align 4
>> + br label %sw.epilog
>> +
>> +sw.bb.2: ; preds = %entry
>> + store i32 333, i32* %b, align 4
>> + store i32 444, i32* %c, align 4
>> + br label %sw.epilog
>> +
>> +sw.epilog: ; preds = %sw.bb.2, %sw.bb.1, %entry
>> + ret void
>> +; CHECK-LABEL: test4:
>> +; CHECK: leaq arr1+4({{.*}}), [[REG2:%[a-z]+]]
>> +; CHECK: movl -4([[REG2]]), {{.*}}
>> +; CHECK: subl ([[REG2]]), {{.*}}
>> +; CHECK: addl 4([[REG2]]), {{.*}}
>> +; CHECK: movl ${{[1-4]+}}, ([[REG2]])
>> +; CHECK: movl ${{[1-4]+}}, 4([[REG2]])
>> +; CHECK: movl ${{[1-4]+}}, ([[REG2]])
>> +; CHECK: movl ${{[1-4]+}}, 4([[REG2]])
>> +}
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>
>
--
Best regards,
Andrey Turetskiy