[PATCH] D13294: LEA code size optimization pass (Part 1): Remove redundant address recalculations

Andrey Turetskiy via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 23 09:23:47 PST 2015


aturetsk added a comment.

Hi Quentin!

Thanks for the review.

My measurements show no significant compile-time impact of the pass in -Os mode.

There are plenty of things to improve in the patch. Here are my thoughts about it:

1. I believe the most needed thing in the pass is a proper heuristic to decide which address calculations should be replaced and which should not. The current code size impact of the pass is -0.2% in -Os mode and -0.3% in -Oz mode (including part 2 of the patch) on Spec 2000. That's not much, I was hoping for more when I started the development. However I think a good heuristic can improve the results. Moreover, a heuristic can probably discover some performance opportunities of the pass. Last time I measured, I got about 0% geomean change in performance on Spec 2000, however the measurements showed that some tests have significant improvement and some have significant degradation from the pass. So if a heuristic could help to cut off some of bad cases, we would get a performance gain as well.

2. MachineFunction scope. I implemented this as one of my experiments with the pass, but the changes didn't give any improvements (or even had a negative effect, I can't remember). I believe extended over-basic-block liveranges of LEA defs damage code. So to extend the scope we need to have some heuristic to understand when we want to use LEAs over basic blocks or not. I wasn't able to find one, Unfortunately I didn't save the patch, but extending the scope is an easy thing to implement - search for LEAs in basic blocks from DominatorTree in findLEAs and tweak calcInstrDist and chooseBestLEA to handle instructions from different basic blocks.

3. Some time ago I discovered that at the moment when the LEA pass works some instructions calculating addresses are not yet LEAs (they will be converted later after RA, you can see convertToThreeAddress for details). I've created an experimental patch to handle these not-yet-LEAs as well, but it didn't give any improvements, so I dropped it. So this change requires some heuristic as well, otherwise we can't be sure whether we improve code or make it worse.

4. The biggest part of code size improvement of the pass is from replacing %esp with other register in moves to/from stack (example: lea 256(%esp), %eax;  mov $111, 260(%esp)). That's profitable because if a new displacement fits 1 byte (and the old one does not), we save few bytes from different encoding of the instruction. However at the moment when the LEA pass works we don't have the actual displacement, only frame index and some displacement relative to it, so we can't really judge whether we should replace %esp with another register or not. So another possible improvement is to keep %esp cases unchanged until after RA, and than make a decision to replace %esp according to actual displacements.

That's my vision of what could be done to the pass in the future. I'm most interested in having a deeper analysis to understand when the replacement of address calculation is profitable, but I have no ideas how to do that.
About tests, I will add some soon to increase the coverage.


================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:43
@@ +42,3 @@
+
+  const char *getPassName() const override { return "X86 LEA Optimize"; }
+
----------------
Fixed.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:56
@@ +55,3 @@
+  /// \brief Choose the best LEA instruction from the list to replace address
+  /// calculation in MI instruction. Return the address displacement and the
+  /// distance between MI and the choosen LEA in AddrDispShift and Dist.
----------------
Fixed here and in the other places.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:60
@@ +59,3 @@
+                     const MachineInstr &MI, MachineInstr *&LEA,
+                     int64_t &AddrDispShift, int &Dist);
+
----------------
Fixed.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:61
@@ +60,3 @@
+                     int64_t &AddrDispShift, int &Dist);
+
+  /// \brief Returns true if two machine operand are identical and they are not
----------------
Extended the comment.
Put 'const' here and in other places where possible.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:69
@@ +68,3 @@
+
+  /// \brief Returns true if two instructions have memory operands that only
+  /// differ by displacement. The numbers of the first memory operands for both
----------------
Extended the comment.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:98
@@ +97,3 @@
+  const MachineBasicBlock *MBB = First.getParent();
+
+  // Both instructions must be in the same basic block.
----------------
Should I put "\p" before every argument name mentioned in every comment?

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:99
@@ +98,3 @@
+
+  // Both instructions must be in the same basic block.
+  assert(Last.getParent() == MBB &&
----------------
Fixed.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:101
@@ +100,3 @@
+  assert(Last.getParent() == MBB &&
+         "Instructions are in different basic blocks");
+
----------------
Fixed.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:105
@@ +104,3 @@
+         std::distance(MBB->begin(), MachineBasicBlock::const_iterator(&First));
+}
+
----------------
> // 3) Displacement of the new memory operand should fit in 1 byte if possible.
> // 4) The LEA should be as close to MI as possible, and prior to it if
> //    possible.

These two conditions define the best LEA. Choosing 1 byte displacement over the bigger one leads to saving few bytes through different instruction encoding. And we try to use the closest LEA to avoid significant increasing of LEA's def liverange.
And it just seems more right to me to give priority to the first LEA before MI than to the first LEA after MI regardless of the distance between LEAs and MI to avoid instruction moving.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:133
@@ +132,3 @@
+      continue;
+
+    // Make sure address displacement fits 4 bytes.
----------------
The case where this is needed is extremely rare. I used to have this in the initial version of the patch, but I wasn't able to write the test for it. So I just decided to drop it.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:173
@@ +172,3 @@
+  return MO1.isIdenticalTo(MO2) &&
+         (!MO1.isReg() ||
+          !TargetRegisterInfo::isPhysicalRegister(MO1.getReg()));
----------------
Fixed.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:176
@@ +175,3 @@
+}
+
+bool OptimizeLEAPass::isLEA(const MachineInstr &MI) {
----------------
To avoid large 'if' expressions I created a separate function to perform the check: isIdenticalOp.

================
Comment at: lib/Target/X86/X86OptimizeLEAs.cpp:221
@@ +220,3 @@
+// calculated by some LEA and replace their memory operands with its def
+// register.
+bool OptimizeLEAPass::removeRedundantAddrCalc(
----------------
Fixed.


http://reviews.llvm.org/D13294





More information about the llvm-commits mailing list