<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">Hi all,</span></div>

<div style=""><br>

</div>

<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">We're planning to turn on -consider-local-interval-cost for all targets, which fixes sub-optimal register allocation in certain cases. Since this is a target-independent change, we'd like to give people the opportunity to run their own numbers or raise any concerns.</span><br>

</div>

<div style=""><br>

</div>

<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">The option makes the allocator account for local interval costs more accurately when selecting a split candidate. It is already enabled on X86, and we've seen the same issue (see below) on AArch64. We expect this to be a (latent) issue on other targets too, so enabling the option for all targets seems like the right thing to do.</span><br>

</div>

<div style=""><br>

</div>

<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">The tl;dr is that turning on this option has a small impact on compile time and produces both improvements and regressions on individual benchmarks, but no change in geomean for either SPEC2017 or the LLVM test suite on AArch64. The full details are below and on </span><a href="https://reviews.llvm.org/D69437"><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">https://reviews.llvm.org/D69437</span></a><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">. The commit that added -consider-local-interval-cost is </span><a href="https://reviews.llvm.org/rL323870"><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">https://reviews.llvm.org/rL323870</span></a><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">.</span></div>

<div style=""><br>

</div>

<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">Please let us know what you think.</span><br>

</div>

<div style=""><br>

</div>

<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">Cheers,</span><br>

</div>

<div style=""><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; line-height: normal; color: rgb(0, 0, 0);">Sanne</span><br>

</div>

<div id="appendonsend"></div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

<br>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Sanne Wouda via Phabricator &lt;reviews@reviews.llvm.org&gt;<br>

<b>Sent:</b> 25 October 2019 18:53<br>

<b>To:</b> Sanne Wouda &lt;Sanne.Wouda@arm.com&gt;<br>

<b>Cc:</b> matze@braunis.de &lt;matze@braunis.de&gt;; quentin.colombet@gmail.com &lt;quentin.colombet@gmail.com&gt;; Kristof Beyls &lt;Kristof.Beyls@arm.com&gt;; hiraditya@msn.com &lt;hiraditya@msn.com&gt;; llvm-commits@lists.llvm.org &lt;llvm-commits@lists.llvm.org&gt;; t.p.northover@gmail.com &lt;t.p.northover@gmail.com&gt;; mcrosier@codeaurora.org &lt;mcrosier@codeaurora.org&gt;; john.reagan@vmssoftware.com &lt;john.reagan@vmssoftware.com&gt;; jji@us.ibm.com &lt;jji@us.ibm.com&gt;; yuanfang.chen@sony.com &lt;yuanfang.chen@sony.com&gt;; David Green &lt;David.Green@arm.com&gt;<br>

<b>Subject:</b> [PATCH] D69437: [RAGreedy] Enable -consider-local-interval-cost by default</font>

<div> </div>

</div>

<div class="BodyFragment"><font size="2"><span style="font-size:11pt">

<div class="PlainText">sanwou01 created this revision.<br>

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls, qcolombet, MatzeB.<br>

Herald added a project: LLVM.<br>

sanwou01 added reviewers: SjoerdMeijer, samparker, dmgreen.<br>

<br>

The greedy register allocator occasionally inserts a large number of<br>

unnecessary copies (see the example below).  The -consider-local-interval-cost<br>

option, which X86 already enables by default, fixes this.  This patch enables the<br>

option for all target backends.  To turn the new default behaviour off, use<br>

-consider-local-interval-cost=false.<br>

<br>

We evaluated the impact of this change on compile time, code size and<br>

performance benchmarks.<br>

<br>

This option has a small impact on compile time, measured on CTMark: a 0.1%<br>

geomean regression at -O1 and -O2, and a 0.2% geomean regression at -O3, with at<br>

most 0.5% on individual benchmarks.<br>

<br>

The effect on both code size and performance on AArch64 for the LLVM test suite<br>

is nil on the geomean with individual outliers (ignoring short exec_times)<br>

between:<br>

<br>

               best     worst<br>

<br>

size..text     -3.3%    +0.0%<br>

exec_time      -5.8%    +2.3%<br>

<br>

On SPEC CPU&reg; 2017 (compiled for AArch64) there is a minor reduction (-0.2% at<br>

most) in code size on some benchmarks, with a tiny movement (-0.01%) on the<br>

geomean.  Neither intrate nor fprate show any change in performance.<br>

<br>

This patch makes the following changes.<br>

<br>

- For all targets, enableAdvancedRASplitCost() returns true.  Individual targets can still override enableAdvancedRASplitCost() to force the previous behaviour.<br>

<br>

- Remove X86's now unnecessary override.<br>

<br>

- Ensure that -consider-local-interval-cost=false overrides the new default behaviour.<br>
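
<br>

A minimal sketch of how these three changes fit together, using hypothetical stand-in names (SubtargetSketch, LegacyTargetSketch, FlagState, useAdvancedSplitCost) rather than the real LLVM classes; only the hook name enableAdvancedRASplitCost() comes from the patch. It assumes that an explicitly given option takes precedence over the target's default:<br>

<pre>
```cpp
// Hypothetical stand-ins for illustration; these are not the real LLVM classes.
struct SubtargetSketch {
  virtual ~SubtargetSketch() = default;
  // New default described above: every target opts in unless it overrides.
  virtual bool enableAdvancedRASplitCost() const { return true; }
};

// A target that wants the previous behaviour overrides the hook.
struct LegacyTargetSketch : SubtargetSketch {
  bool enableAdvancedRASplitCost() const override { return false; }
};

// Models a command-line option that may or may not have been given.
enum class FlagState { NotGiven, True, False };

// An explicitly given flag wins; otherwise fall back to the target's default.
bool useAdvancedSplitCost(const SubtargetSketch *ST, FlagState Flag) {
  if (Flag == FlagState::NotGiven)
    return ST->enableAdvancedRASplitCost();
  return Flag == FlagState::True;
}
```
</pre>

In this sketch, passing FlagState::False models -consider-local-interval-cost=false and disables the behaviour even for a target whose hook returns true.<br>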

<br>

This matrix multiply example:<br>

<br>

     $ cat test.c<br>

     long A[8][8];<br>

     long B[8][8];<br>

     long C[8][8];<br>

  <br>

     void run_test() {<br>

       for (int k = 0; k &lt; 8; k++) {<br>

         for (int i = 0; i &lt; 8; i++) {<br>

           for (int j = 0; j &lt; 8; j++) {<br>

             C[i][j] += A[i][k] * B[k][j];<br>

           }<br>

         }<br>

       }<br>

     }<br>

<br>

results in the following generated code on AArch64:<br>

<br>

  $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -<br>

  [...]<br>

                                        // %for.cond1.preheader<br>

                                        // =>This Inner Loop Header: Depth=1<br>

        add     x14, x11, x9<br>

        str     q0, [sp, #16]           // 16-byte Folded Spill<br>

        ldr     q0, [x14]<br>

        mov     v2.16b, v15.16b<br>

        mov     v15.16b, v14.16b<br>

        mov     v14.16b, v13.16b<br>

        mov     v13.16b, v12.16b<br>

        mov     v12.16b, v11.16b<br>

        mov     v11.16b, v10.16b<br>

        mov     v10.16b, v9.16b<br>

        mov     v9.16b, v8.16b<br>

        mov     v8.16b, v31.16b<br>

        mov     v31.16b, v30.16b<br>

        mov     v30.16b, v29.16b<br>

        mov     v29.16b, v28.16b<br>

        mov     v28.16b, v27.16b<br>

        mov     v27.16b, v26.16b<br>

        mov     v26.16b, v25.16b<br>

        mov     v25.16b, v24.16b<br>

        mov     v24.16b, v23.16b<br>

        mov     v23.16b, v22.16b<br>

        mov     v22.16b, v21.16b<br>

        mov     v21.16b, v20.16b<br>

        mov     v20.16b, v19.16b<br>

        mov     v19.16b, v18.16b<br>

        mov     v18.16b, v17.16b<br>

        mov     v17.16b, v16.16b<br>

        mov     v16.16b, v7.16b<br>

        mov     v7.16b, v6.16b<br>

        mov     v6.16b, v5.16b<br>

        mov     v5.16b, v4.16b<br>

        mov     v4.16b, v3.16b<br>

        mov     v3.16b, v1.16b<br>

        mov     x12, v0.d[1]<br>

        fmov    x15, d0<br>

        ldp     q1, q0, [x14, #16]<br>

        ldur    x1, [x10, #-256]<br>

        ldur    x2, [x10, #-192]<br>

        add     x9, x9, #64             // =64<br>

        mov     x13, v1.d[1]<br>

        fmov    x16, d1<br>

        ldr     q1, [x14, #48]<br>

        mul     x3, x15, x1<br>

        mov     x14, v0.d[1]<br>

        fmov    x17, d0<br>

        mov     x18, v1.d[1]<br>

        fmov    x0, d1<br>

        mov     v1.16b, v3.16b<br>

        mov     v3.16b, v4.16b<br>

        mov     v4.16b, v5.16b<br>

        mov     v5.16b, v6.16b<br>

        mov     v6.16b, v7.16b<br>

        mov     v7.16b, v16.16b<br>

        mov     v16.16b, v17.16b<br>

        mov     v17.16b, v18.16b<br>

        mov     v18.16b, v19.16b<br>

        mov     v19.16b, v20.16b<br>

        mov     v20.16b, v21.16b<br>

        mov     v21.16b, v22.16b<br>

        mov     v22.16b, v23.16b<br>

        mov     v23.16b, v24.16b<br>

        mov     v24.16b, v25.16b<br>

        mov     v25.16b, v26.16b<br>

        mov     v26.16b, v27.16b<br>

        mov     v27.16b, v28.16b<br>

        mov     v28.16b, v29.16b<br>

        mov     v29.16b, v30.16b<br>

        mov     v30.16b, v31.16b<br>

        mov     v31.16b, v8.16b<br>

        mov     v8.16b, v9.16b<br>

        mov     v9.16b, v10.16b<br>

        mov     v10.16b, v11.16b<br>

        mov     v11.16b, v12.16b<br>

        mov     v12.16b, v13.16b<br>

        mov     v13.16b, v14.16b<br>

        mov     v14.16b, v15.16b<br>

        mov     v15.16b, v2.16b<br>

        ldr     q2, [sp]                // 16-byte Folded Reload<br>

        fmov    d0, x3<br>

        mul     x3, x12, x1<br>

  [...]<br>

<br>

With -consider-local-interval-cost the same section of code results in the<br>

following:<br>

<br>

  $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -<br>

  [...]<br>

  .LBB0_1:                              // %for.cond1.preheader<br>

                                        // =>This Inner Loop Header: Depth=1<br>

        add     x14, x11, x9<br>

        ldp     q0, q1, [x14]<br>

        ldur    x1, [x10, #-256]<br>

        ldur    x2, [x10, #-192]<br>

        add     x9, x9, #64             // =64<br>

        mov     x12, v0.d[1]<br>

        fmov    x15, d0<br>

        mov     x13, v1.d[1]<br>

        fmov    x16, d1<br>

        ldp     q0, q1, [x14, #32]<br>

        mul     x3, x15, x1<br>

        cmp     x9, #512                // =512<br>

        mov     x14, v0.d[1]<br>

        fmov    x17, d0<br>

        fmov    d0, x3<br>

        mul     x3, x12, x1<br>

  [...]<br>

<br>

<br>

Repository:<br>

  rG LLVM Github Monorepo<br>

<br>

<a href="https://reviews.llvm.org/D69437">https://reviews.llvm.org/D69437</a><br>

<br>

Files:<br>

  llvm/lib/CodeGen/RegAllocGreedy.cpp<br>

  llvm/lib/CodeGen/TargetSubtargetInfo.cpp<br>

  llvm/lib/Target/X86/X86Subtarget.h<br>

  llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll<br>

<br>

</div>

</span></font></div>

</body>

</html>