[PATCH] D38279: [MachineScheduler] Enable latency heuristic based on scheduled lat.

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 26 15:31:12 PDT 2017


fhahn created this revision.
Herald added subscribers: javed.absar, kristof.beyls, aemerson.

The motivation of this patch is to improve scheduling for the test case
`test/CodeGen/AArch64/misched-sdiv.ll` with the MachineScheduler. A
similar test is part of `test/CodeGen/ARM`.
I think ideally we would schedule SDIV as early as possible, as any
instruction scheduled before SDIV increases the critical path; by how
much depends on the number of in-order pipeline stages.

The following happens in the test case. After scheduling the `sub` instruction,
both `sdiv` and `add` are added to the available bottom queue. When picking
the best candidate from the bottom queue, CurrZone.getCurrCycle()
returns 0, which plus RemLatency is lower than the critical path,
so the latency heuristic is not used. I think
using the current cycle when scheduling top-down makes sense, as that is
the point where dispatching the instruction later will impact the computed
critical path length. But when scheduling bottom-up, wouldn't it make more
sense to use the latency already scheduled (at least when the candidate is
on the critical path), as this more accurately represents the cost of
scheduling the instruction?

There probably is a better way to handle this and I would appreciate
any input! PostRA scheduling does not take care of this case, as the
allocated registers prevent moving the SDIV instruction up, and it is
also disabled on cores like Cortex-A72.

I did some initial benchmark runs on AArch64 with this patch:

- AArch64 Cortex-A72 LLVM test-suite & spec2k: -0.22% on execution time
- AArch64 Cortex-A57 SPEC2017: +0.74% on score


https://reviews.llvm.org/D38279

Files:
  lib/CodeGen/MachineScheduler.cpp
  test/CodeGen/AArch64/machine-combiner.ll
  test/CodeGen/AArch64/misched-sdiv.ll


Index: test/CodeGen/AArch64/misched-sdiv.ll
===================================================================
--- /dev/null
+++ test/CodeGen/AArch64/misched-sdiv.ll
@@ -0,0 +1,28 @@
+; RUN: llc < %s -mtriple=aarch64-unknown-linux -mcpu=cortex-a57 -verify-misched -debug-only=machine-scheduler -o - 2>&1 > /dev/null | FileCheck %s --check-prefix=CHECK --check-prefix=A57_SCHED
+; RUN: llc < %s -mtriple=aarch64-unknown-linux -mcpu=generic -verify-misched -debug-only=machine-scheduler -o - 2>&1 > /dev/null | FileCheck %s --check-prefix=CHECK --check-prefix=GENERIC
+
+; Check the latency for instructions for both generic and cortex-a57.
+; SDIV should be scheduled at the beginning of the block (20 cycles on the independent M unit).
+;
+; CHECK:       ********** MI Scheduling **********
+; CHECK:      foo:BB#0 entry
+
+; CHECK:      ** Final schedule for BB#0 ***
+; GENERIC:    LDRWui
+; GENERIC:    SDIV
+; A57_SCHED:  SDIV
+; A57_SCHED:  LDRWui
+; CHECK:      ********** INTERVALS **********
+
+
+; Function Attrs: norecurse nounwind readnone
+define hidden i32 @foo(i32 %a, i32 %b, i32 %c, i32* %d) local_unnamed_addr #0 {
+entry:
+  %xor = xor i32 %c, %b
+  %ld = load i32, i32* %d
+  %add = add nsw i32 %xor, %ld
+  %div = sdiv i32 %a, %b
+  ;%div1 = sdiv i32 %add, %b
+  %sub = sub i32 %div, %add
+  ret i32 %sub
+}
Index: test/CodeGen/AArch64/machine-combiner.ll
===================================================================
--- test/CodeGen/AArch64/machine-combiner.ll
+++ test/CodeGen/AArch64/machine-combiner.ll
@@ -63,9 +63,9 @@
 ; CHECK-LABEL:   reassociate_adds5:
 ; CHECK:         fadd  s0, s0, s1
 ; CHECK-NEXT:    fadd  s1, s2, s3
+; CHECK-NEXT:    fadd  s2, s4, s5
 ; CHECK-NEXT:    fadd  s0, s0, s1
-; CHECK-NEXT:    fadd  s1, s4, s5
-; CHECK-NEXT:    fadd  s1, s1, s6
+; CHECK-NEXT:    fadd  s1, s2, s6
 ; CHECK-NEXT:    fadd  s0, s0, s1
 ; CHECK-NEXT:    fadd  s0, s0, s7
 ; CHECK-NEXT:    ret
Index: lib/CodeGen/MachineScheduler.cpp
===================================================================
--- lib/CodeGen/MachineScheduler.cpp
+++ lib/CodeGen/MachineScheduler.cpp
@@ -2441,7 +2441,9 @@
   // acyclic latency during PostRA, and highly out-of-order processors will
   // skip PostRA scheduling.
   if (!OtherResLimited) {
-    if (IsPostRA || (RemLatency + CurrZone.getCurrCycle() > Rem.CriticalPath)) {
+    unsigned Issued = CurrZone.isTop() ? CurrZone.getCurrCycle() :
+                                         CurrZone.getScheduledLatency();
+    if (IsPostRA || (RemLatency + Issued > Rem.CriticalPath)) {
       Policy.ReduceLatency |= true;
       DEBUG(dbgs() << "  " << CurrZone.Available.getName()
             << " RemainingLatency " << RemLatency << " + "


