<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">Hi all,</span></div>
<div style=""><br>
</div>
<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">We're planning to turn on -consider-local-interval-cost for all targets. The option fixes cases where the greedy register allocator produces sub-optimal allocations.
Since this is a target-independent change, we'd like to give people the opportunity to run their own numbers or raise any concerns.</span><br>
</div>
<div style=""><br>
</div>
<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">The option enables a more accurate consideration of local interval costs when selecting a split candidate. It is already
enabled on X86 and we've seen the same issue (see below) on AArch64. We expect that this would also be a (latent) issue for other targets so enabling the option for all seems like the right thing to do.</span><br>
</div>
<div style=""><br>
</div>
<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">The tl;dr is that turning on this option has a small impact on compile time. On AArch64 it shows some improvements and some regressions on individual benchmarks, but no change in geomean for either SPEC2017 or the LLVM test suite.
The full details are below and on
</span><a href="https://reviews.llvm.org/D69437"><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">https://reviews.llvm.org/D69437</span></a><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">.
The commit that added -consider-local-interval-cost is
</span><a href="https://reviews.llvm.org/rL323870"><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">https://reviews.llvm.org/rL323870</span></a><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">.</span></div>
<div style=""><br>
</div>
<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">Please let us know what you think.</span><br>
</div>
<div style=""><br>
</div>
<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">Cheers,</span><br>
</div>
<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">Sanne</span><br>
</div>
<div id="appendonsend"></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Sanne Wouda via Phabricator <reviews@reviews.llvm.org><br>
<b>Sent:</b> 25 October 2019 18:53<br>
<b>To:</b> Sanne Wouda <Sanne.Wouda@arm.com><br>
<b>Cc:</b> matze@braunis.de <matze@braunis.de>; quentin.colombet@gmail.com <quentin.colombet@gmail.com>; Kristof Beyls <Kristof.Beyls@arm.com>; hiraditya@msn.com <hiraditya@msn.com>; llvm-commits@lists.llvm.org <llvm-commits@lists.llvm.org>; t.p.northover@gmail.com
<t.p.northover@gmail.com>; mcrosier@codeaurora.org <mcrosier@codeaurora.org>; john.reagan@vmssoftware.com <john.reagan@vmssoftware.com>; jji@us.ibm.com <jji@us.ibm.com>; yuanfang.chen@sony.com <yuanfang.chen@sony.com>; David Green <David.Green@arm.com><br>
<b>Subject:</b> [PATCH] D69437: [RAGreedy] Enable -consider-local-interval-cost by default</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt">
<div class="PlainText">sanwou01 created this revision.<br>
Herald added subscribers: llvm-commits, hiraditya, kristof.beyls, qcolombet, MatzeB.<br>
Herald added a project: LLVM.<br>
sanwou01 added reviewers: SjoerdMeijer, samparker, dmgreen.<br>
<br>
The greedy register allocator occasionally decides to insert a large number of<br>
unnecessary copies, see below for an example. The -consider-local-interval-cost<br>
option (which X86 already enables by default) fixes this. We enable this option<br>
for all target backends. To turn the new default behaviour off, use<br>
-consider-local-interval-cost=false.<br>
<br>
We evaluated the impact of this change on compile time, code size and<br>
performance benchmarks.<br>
<br>
This option has a small impact on compile time, measured on CTMark: a 0.1%<br>
geomean regression at -O1 and -O2, and a 0.2% geomean regression at -O3, with at<br>
most 0.5% on individual benchmarks.<br>
<br>
The effect on both code size and performance on AArch64 for the LLVM test suite<br>
is nil on the geomean with individual outliers (ignoring short exec_times)<br>
between:<br>
<br>
- size..text: -3.3% (best) to +0.0% (worst)<br>
- exec_time: -5.8% (best) to +2.3% (worst)<br>
<br>
On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at<br>
most) in code size on some benchmarks, with a tiny movement (-0.01%) on the<br>
geomean. Neither intrate nor fprate shows any change in performance.<br>
<br>
This patch makes the following changes.<br>
<br>
- Make enableAdvancedRASplitCost() return true for all targets. Individual targets can still override enableAdvancedRASplitCost() to restore the previous behaviour.<br>
<br>
- Remove X86's now unnecessary override.<br>
<br>
- Ensure that -consider-local-interval-cost=false overrides the new default behaviour.<br>
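<br>
The precedence between the command-line flag and the target hook can be sketched as below. This is only a rough model in plain C, not the code in the patch: the names flag_state and use_advanced_split_cost are hypothetical, and it assumes an explicitly passed flag always wins over whatever the target's enableAdvancedRASplitCost() returns.<br>
<pre>
```c
/* Hypothetical model of the flag/hook precedence; not LLVM's actual code. */
enum flag_state {
    FLAG_UNSET, /* -consider-local-interval-cost not given on the command line */
    FLAG_FALSE, /* -consider-local-interval-cost=false */
    FLAG_TRUE   /* -consider-local-interval-cost (or =true) */
};

/* Returns 1 if the advanced split cost should be used, 0 otherwise.
   target_hook_result stands in for the target's enableAdvancedRASplitCost(). */
int use_advanced_split_cost(enum flag_state cli_flag, int target_hook_result) {
    if (cli_flag == FLAG_UNSET)
        return target_hook_result != 0; /* no flag: defer to the target hook */
    return cli_flag == FLAG_TRUE;       /* explicit flag overrides the hook */
}
```
</pre>
Under this model, a target whose hook returns true still gets the old behaviour when the user passes -consider-local-interval-cost=false, and a target that overrides the hook to return false can still be opted in from the command line.<br>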
<br>
This matrix multiply example:<br>
<br>
$ cat test.c<br>
long A[8][8];<br>
long B[8][8];<br>
long C[8][8];<br>
<br>
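void run_test() {<br>
&nbsp;&nbsp;for (int k = 0; k < 8; k++) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;for (int i = 0; i < 8; i++) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for (int j = 0; j < 8; j++) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;C[i][j] += A[i][k] * B[k][j];<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>
&nbsp;&nbsp;&nbsp;&nbsp;}<br>
&nbsp;&nbsp;}<br>
}<br>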
<br>
results in the following generated code on AArch64:<br>
<br>
$ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -<br>
[...]<br>
// %for.cond1.preheader<br>
// =>This Inner Loop Header: Depth=1<br>
add x14, x11, x9<br>
str q0, [sp, #16] // 16-byte Folded Spill<br>
ldr q0, [x14]<br>
mov v2.16b, v15.16b<br>
mov v15.16b, v14.16b<br>
mov v14.16b, v13.16b<br>
mov v13.16b, v12.16b<br>
mov v12.16b, v11.16b<br>
mov v11.16b, v10.16b<br>
mov v10.16b, v9.16b<br>
mov v9.16b, v8.16b<br>
mov v8.16b, v31.16b<br>
mov v31.16b, v30.16b<br>
mov v30.16b, v29.16b<br>
mov v29.16b, v28.16b<br>
mov v28.16b, v27.16b<br>
mov v27.16b, v26.16b<br>
mov v26.16b, v25.16b<br>
mov v25.16b, v24.16b<br>
mov v24.16b, v23.16b<br>
mov v23.16b, v22.16b<br>
mov v22.16b, v21.16b<br>
mov v21.16b, v20.16b<br>
mov v20.16b, v19.16b<br>
mov v19.16b, v18.16b<br>
mov v18.16b, v17.16b<br>
mov v17.16b, v16.16b<br>
mov v16.16b, v7.16b<br>
mov v7.16b, v6.16b<br>
mov v6.16b, v5.16b<br>
mov v5.16b, v4.16b<br>
mov v4.16b, v3.16b<br>
mov v3.16b, v1.16b<br>
mov x12, v0.d[1]<br>
fmov x15, d0<br>
ldp q1, q0, [x14, #16]<br>
ldur x1, [x10, #-256]<br>
ldur x2, [x10, #-192]<br>
add x9, x9, #64 // =64<br>
mov x13, v1.d[1]<br>
fmov x16, d1<br>
ldr q1, [x14, #48]<br>
mul x3, x15, x1<br>
mov x14, v0.d[1]<br>
fmov x17, d0<br>
mov x18, v1.d[1]<br>
fmov x0, d1<br>
mov v1.16b, v3.16b<br>
mov v3.16b, v4.16b<br>
mov v4.16b, v5.16b<br>
mov v5.16b, v6.16b<br>
mov v6.16b, v7.16b<br>
mov v7.16b, v16.16b<br>
mov v16.16b, v17.16b<br>
mov v17.16b, v18.16b<br>
mov v18.16b, v19.16b<br>
mov v19.16b, v20.16b<br>
mov v20.16b, v21.16b<br>
mov v21.16b, v22.16b<br>
mov v22.16b, v23.16b<br>
mov v23.16b, v24.16b<br>
mov v24.16b, v25.16b<br>
mov v25.16b, v26.16b<br>
mov v26.16b, v27.16b<br>
mov v27.16b, v28.16b<br>
mov v28.16b, v29.16b<br>
mov v29.16b, v30.16b<br>
mov v30.16b, v31.16b<br>
mov v31.16b, v8.16b<br>
mov v8.16b, v9.16b<br>
mov v9.16b, v10.16b<br>
mov v10.16b, v11.16b<br>
mov v11.16b, v12.16b<br>
mov v12.16b, v13.16b<br>
mov v13.16b, v14.16b<br>
mov v14.16b, v15.16b<br>
mov v15.16b, v2.16b<br>
ldr q2, [sp] // 16-byte Folded Reload<br>
fmov d0, x3<br>
mul x3, x12, x1<br>
[...]<br>
<br>
With -consider-local-interval-cost the same section of code results in the<br>
following:<br>
<br>
$ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -<br>
[...]<br>
.LBB0_1: // %for.cond1.preheader<br>
// =>This Inner Loop Header: Depth=1<br>
add x14, x11, x9<br>
ldp q0, q1, [x14]<br>
ldur x1, [x10, #-256]<br>
ldur x2, [x10, #-192]<br>
add x9, x9, #64 // =64<br>
mov x12, v0.d[1]<br>
fmov x15, d0<br>
mov x13, v1.d[1]<br>
fmov x16, d1<br>
ldp q0, q1, [x14, #32]<br>
mul x3, x15, x1<br>
<br>
cmp x9, #512 // =512<br>
<br>
mov x14, v0.d[1]<br>
fmov x17, d0<br>
fmov d0, x3<br>
mul x3, x12, x1<br>
[...]<br>
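<br>
For anyone running their own numbers with the test.c kernel above, a quick sanity check that both builds compute the same result may be useful. The following is a minimal sketch: check_run_test and the input values are hypothetical and not part of the patch or its test case.<br>
<pre>
```c
long A[8][8];
long B[8][8];
long C[8][8];

/* Kernel from test.c, with the same k-i-j loop order. */
void run_test(void) {
    for (int k = 0; k < 8; k++)
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 8; j++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Hypothetical helper: fills A and B with small known values, clears C,
   runs the kernel, and compares every element of C against a plain
   i-j-k product. Returns 1 if all elements match, 0 otherwise. */
int check_run_test(void) {
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            A[i][j] = i + 2 * j + 1;
            B[i][j] = 3 * i - j;
            C[i][j] = 0;
        }

    run_test();

    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            long expected = 0;
            for (int k = 0; k < 8; k++)
                expected += A[i][k] * B[k][j];
            if (C[i][j] != expected)
                return 0;
        }
    return 1;
}
```
</pre>
Calling check_run_test() from a small main() in a build with the option and in one with -consider-local-interval-cost=false should return 1 in both cases, since the option is only expected to change register allocation, not results.<br>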
<br>
<br>
Repository:<br>
rG LLVM Github Monorepo<br>
<br>
<a href="https://reviews.llvm.org/D69437">https://reviews.llvm.org/D69437</a><br>
<br>
Files:<br>
llvm/lib/CodeGen/RegAllocGreedy.cpp<br>
llvm/lib/CodeGen/TargetSubtargetInfo.cpp<br>
llvm/lib/Target/X86/X86Subtarget.h<br>
llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll<br>
<br>
</div>
</span></font></div>
</body>
</html>