[llvm-commits] FW: Tuning LLVM Greedy Register Allocator to optimize for code size when targeting ARM Thumb 2 instruction set

Mon Jan 30 12:01:32 PST 2012

Resubmitting this patch without the CMN fix, which was submitted in a
separate patch.

Thanks

-Zino

From: Zino Benaissa [mailto:zinob at codeaurora.org] 
Sent: Monday, January 23, 2012 5:12 PM
To: 'llvm-commits at cs.uiuc.edu'
Cc: 'rajav at codeaurora.org'
Subject: FW: Tuning LLVM Greedy Register Allocator to optimize for code size
when targeting ARM Thumb 2 instruction set 

Description: 

This contribution extends the LLVM greedy register allocator to optimize for
code size when the LLVM compiler targets the ARM Thumb 2 instruction set. The
heuristic favors assigning registers R0 through R7 to operands used in
instructions that can be encoded in 16 bits (the 16-bit encoding is allowed
only if R0-7 are used). Operands that appear most frequently in a function
(and in instructions that qualify) get the R0-7 registers.
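
The ranking idea described above can be illustrated with a toy C++ sketch.
This is not code from the patch: ToyInst, countNarrowUses and
pickLowRegCandidates are hypothetical names made up for this illustration,
and the sketch assumes the caller already knows which instructions have a
16-bit form. The real greedy allocator works on live intervals and spill
weights rather than on a simple sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy model of one instruction (hypothetical; not an LLVM type).
struct ToyInst {
  bool NarrowIfLowRegs;   // true if a 16-bit encoding exists when operands are in R0-R7
  std::vector<int> VRegs; // virtual registers the instruction mentions
};

// Count, per virtual register, how many times it is mentioned by
// instructions that could be encoded in 16 bits.
std::map<int, unsigned> countNarrowUses(const std::vector<ToyInst> &Insts) {
  std::map<int, unsigned> Uses;
  for (const ToyInst &I : Insts) {
    if (!I.NarrowIfLowRegs)
      continue; // a low register would not shrink this instruction
    for (int VReg : I.VRegs)
      ++Uses[VReg];
  }
  return Uses;
}

// Return up to NumLowRegs virtual registers that should prefer R0-R7,
// most frequently used first; ties go to the lower vreg number.
std::vector<int> pickLowRegCandidates(const std::vector<ToyInst> &Insts,
                                      std::size_t NumLowRegs = 8) {
  assert(NumLowRegs > 0 && "need at least one low register");
  std::map<int, unsigned> Uses = countNarrowUses(Insts);
  // std::map iterates in ascending key order, so a stable sort by count
  // keeps the lower vreg number first among equal counts.
  std::vector<std::pair<int, unsigned>> Ranked(Uses.begin(), Uses.end());
  std::stable_sort(Ranked.begin(), Ranked.end(),
                   [](const std::pair<int, unsigned> &A,
                      const std::pair<int, unsigned> &B) {
                     return A.second > B.second;
                   });
  std::vector<int> Result;
  for (std::size_t i = 0; i < Ranked.size() && i < NumLowRegs; ++i)
    Result.push_back(Ranked[i].first);
  return Result;
}
```

In this sketch, a virtual register that only appears in instructions with no
16-bit form never becomes a candidate, so it does not compete for R0-7.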

This heuristic is turned on by default and affects the generated code only
if the -mthumb compiler switch is used. To turn the heuristic off, use the
-disable-favor-r0-7 feature flag.

This patch modifies:
1) The LLVM greedy register allocator located in the LLVM/CodeGen directory:
to add the new code size heuristic.
2) The ARM-specific files located in the LLVM/Target/ARM directory: to add
the function that determines which instructions can be encoded in 16 bits,
and a fix to enable the compiler to emit the CMN instruction in its 16-bit
encoding.
3) The LLVM test suite: to fix the test/CodeGen/Thumb2/thumb2-cmn.ll test.

Performance impact: 

I focused on the -Os and -mthumb flags, but observed similar improvements
with -O3 and -mthumb. Runtime was measured on a Qualcomm 8660.

Code size:

- SPEC2000 benchmarks: between 0 and 0.6% code size reduction (with no
  noticeable regression).

- EEMBC benchmarks: between 0 and 6% reduction (no noticeable regression).
  Automotive and Networking average about 1% code size reduction, and
  Consumer about 0.5%.

Runtime:

- SPEC2000: between -1% and 6% speedup (Spec2k/ammp 6%).

- EEMBC: overall averages are between -1% and 5% faster.

Modified:

   test/CodeGen/Thumb2/thumb2-cmn.ll
   include/llvm/Target/TargetInstrInfo.h
   include/llvm/CodeGen/LiveInterval.h
   lib/Target/ARM/Thumb2SizeReduction.cpp
   lib/Target/ARM/ARMBaseInstrInfo.cpp
   lib/Target/ARM/ARMBaseInstrInfo.h
   lib/CodeGen/RegAllocGreedy.cpp
   lib/CodeGen/CalcSpillWeights.cpp

For details, see RACodeSize.txt.

Testing: 

See ARMTestSuiteResult.txt and ARMsimple-Os-mthumb.txt.

Note: -O3 testing was also completed on X86 and ARM CPUs.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ARMTestSuiteResult.txt
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120130/b7c03d11/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ARMsimple-Os-mthumb.txt
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120130/b7c03d11/attachment-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: RACodeSize.txt
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120130/b7c03d11/attachment-0002.txt>