[PATCH] Add peephole optimization to use LEA instructions of Intel Atom

Gurd, Preston preston.gurd at intel.com
Mon Apr 15 18:21:35 PDT 2013


Hi Nadav,

For the current Atom, LEAs are not always better. Compared to an ADD, the LEA requires its inputs to be available early, so that its output can be available early. Ideally, the code generator should use LEA only for operations whose result will be used in address calculations. It is not clear to me how one would do this.

The current X86 code generator tries to use LEA for some kinds of 3 address code, in order to eliminate the possibility of an unwanted move instruction. I tried to change that code to defer the time when an LEA would be used, but that caused some code to get worse, so the existing code for generating LEAs is not changed by the patch.

Because the decision as to whether or not it is profitable to use an LEA depends on the number of cycles between the LEA and the next use of its output, the pass in the patch runs after the code is register allocated and scheduled.

The convertToThreeAddress function is not guarded by the AfterRegAlloc flag. If the flag is false, then the function behaves exactly as it did before. If the flag is true, then it allows moves to be converted to LEAs when appropriate and it prevents the conversion of certain instructions taking an immediate operand when the form of the immediate operand would be rejected by addImm().
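
To make the effect of the flag concrete, here is a minimal sketch in plain C++. It is a toy model and not the code in the patch: the Kind enum, the Inst struct, the immIsAbsolute field and the convertToLEA function are all hypothetical names invented for illustration, and the real convertToThreeAddress handles many more cases.

```cpp
#include <cassert>

// Toy instruction kinds; hypothetical, not LLVM opcodes.
enum class Kind { Mov, AddReg, AddImm, Lea };

struct Inst {
  Kind kind;
  // Only meaningful for AddImm: true for a plain number,
  // false for anything else (for example a symbol reference).
  bool immIsAbsolute;
};

// Rewrites inst into an LEA and returns true if the conversion is allowed,
// otherwise leaves inst untouched and returns false.
bool convertToLEA(Inst &inst, bool afterRegAlloc) {
  switch (inst.kind) {
  case Kind::Mov:
    // Flag false: moves are left alone, as before.
    // Flag true: a move may become an LEA.
    if (!afterRegAlloc)
      return false;
    break;
  case Kind::AddImm:
    // Flag true: only absolute immediates may be converted.
    if (afterRegAlloc && !inst.immIsAbsolute)
      return false;
    break;
  case Kind::AddReg:
    break;
  case Kind::Lea:
    return false; // already an LEA, nothing to do
  }
  assert(inst.kind != Kind::Lea); // only non-LEA instructions reach here
  inst.kind = Kind::Lea;
  return true;
}
```

In this model, calling convertToLEA on a Mov with afterRegAlloc set to false leaves the instruction alone, while the same call with the flag set to true rewrites it into an LEA.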

The patch uses the convertToThreeAddress function because it already handles all of the cases that would be needed.

For EEMBC, the overall difference in performance was negligible, although several of the benchmarks saw improvements of 10-20% with this patch. For SPEC2000, there was a minor gain of about 0.35%.

If there is anything else which you think I need to explain, or if you have a suggestion for a better approach, please let me know!

Preston


From: Nadav Rotem [mailto:nrotem at apple.com]
Sent: Thursday, April 11, 2013 7:36 PM
To: reviews+D660+public+a10049d8ed57ba7c at llvm-reviews.chandlerc.com
Cc: Du Toit, Stefanus; Gurd, Preston; llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH] Add peephole optimization to use LEA instructions of Intel Atom

Hi Preston,

Why are you doing this optimization after register allocation, rather than before it or even during instruction selection? If LEAs are better, why not use them all the time? Why are you going through convertToThreeAddress, and not just doing the conversion directly? Other users of convertToThreeAddress won't benefit from it because it is guarded by the isAfterRA flag, and you don't really need any of the logic in convertToThreeAddress.

Thanks,
Nadav

On Apr 11, 2013, at 3:09 PM, Preston Gurd <preston.gurd at intel.com> wrote:


Hi nadav, sdt,

This patch adds a pass which runs after register allocation when an Intel Atom processor is the target. The LEA instruction in the current Atom processor is specifically optimized to work best when its result is to be used as part of a memory address. This patch finds load or store instructions which use a base and/or an index register. It looks for an instruction in the current basic block which sets the base and/or index in the previous 5 instructions and then attempts to convert the instruction into an LEA, using the existing convertToThreeAddress code. For instance, it could convert an add or a move into an LEA. Since the result of an LEA is available earlier than the result of an ADD or a Move, there can be less delay in starting the load or store which references the register.
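
The search described above can be sketched roughly as follows. This is a toy model in plain C++, not the code in X86FixupLEAs.cpp: the Op enum, the Inst struct, kWindow, findRecentDef and fixupLEAs are hypothetical names, registers are plain integers, and the real pass works on machine instructions and calls convertToThreeAddress to do the rewrite.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Toy instruction model; hypothetical, not LLVM's MachineInstr.
enum class Op { Add, Mov, Lea, Load, Store, Other };

struct Inst {
  Op op;
  int def;   // register written, or -1 if none
  int base;  // base register of a memory operand, or -1
  int index; // index register of a memory operand, or -1
};

// Lookback distance, taken from the "previous 5 instructions" in the text.
constexpr std::size_t kWindow = 5;

// Searches backwards from position use for the nearest instruction that
// writes reg, looking at no more than kWindow instructions.
// Returns block.size() if no such instruction is found.
std::size_t findRecentDef(const std::vector<Inst> &block, std::size_t use,
                          int reg) {
  assert(use < block.size());
  for (std::size_t back = 1; back <= kWindow && back <= use; ++back)
    if (block[use - back].def == reg)
      return use - back;
  return block.size();
}

// For every load or store, rewrites a recent ADD or MOV that defines its
// base or index register into an LEA. Returns the number of rewrites.
int fixupLEAs(std::vector<Inst> &block) {
  int converted = 0;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].op != Op::Load && block[i].op != Op::Store)
      continue;
    for (int reg : {block[i].base, block[i].index}) {
      if (reg < 0)
        continue;
      std::size_t d = findRecentDef(block, i, reg);
      if (d == block.size())
        continue;
      if (block[d].op == Op::Add || block[d].op == Op::Mov) {
        block[d].op = Op::Lea;
        ++converted;
      }
    }
  }
  return converted;
}
```

In this model an ADD is rewritten only when a load or store within the next five instructions uses its result as a base or index register; an ADD whose result is never used in an address is left as it is.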

This is done after register allocation. The convertToThreeAddress function needed to have slightly different behaviour when invoked after register allocation. It was then useful to convert move instructions to LEAs. Also, some instructions defined to take immediate operands can only be converted to LEAs after register allocation if the immediate operand is an absolute number.

This patch adds a lit test which verifies that the replacement of an ADD by an LEA is NOT done if it is not needed.

Please review.


http://llvm-reviews.chandlerc.com/D660

Files:
 test/CodeGen/X86/atom-fixup-lea2.ll
 test/CodeGen/X86/lsr-static-addr.ll
 test/CodeGen/X86/atom-fixup-lea1.ll
 test/CodeGen/X86/atom-fixup-lea3.ll
 lib/Target/X86/X86.td
 lib/Target/X86/X86Subtarget.cpp
 lib/Target/X86/X86TargetMachine.cpp
 lib/Target/X86/X86InstrInfo.cpp
 lib/Target/X86/X86Subtarget.h
 lib/Target/X86/X86InstrInfo.h
 lib/Target/X86/X86.h
 lib/Target/X86/X86FixupLEAs.cpp
 lib/Target/X86/CMakeLists.txt
<D660.1.patch>
