<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/107532>107532</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[RISCV] PreRA MachineScheduler introduces spills with MicroOpBufferSize=0
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:RISC-V,
mi-sched
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
lukel97
</td>
</tr>
</table>
<pre>
Reproducer on Compiler Explorer: https://godbolt.org/z/rP5vfe4s5
```llvm
define <vscale x 8 x i64> @f(ptr %p0, ptr %p1, ptr %p2, ptr %p3, ptr %p4) {
%v0 = load <vscale x 8 x i64>, ptr %p0
%v1 = load <vscale x 8 x i64>, ptr %p1
%r0 = add <vscale x 8 x i64> %v0, %v1
%v2 = load <vscale x 8 x i64>, ptr %p2
%r1 = add <vscale x 8 x i64> %r0, %v2
%v3 = load <vscale x 8 x i64>, ptr %p3
%r2 = add <vscale x 8 x i64> %r1, %v3
%v4 = load <vscale x 8 x i64>, ptr %p4
%r3 = add <vscale x 8 x i64> %r2, %v4
ret <vscale x 8 x i64> %r3
}
```
In the above example, at most two m8 register groups need to be live at any point in the block. However, compiling with `llc -mtriple=riscv64 -mattr=+v` still produces three m8 spills:
```asm
f: # @f
addi sp, sp, -16
csrr a5, vlenb
slli a5, a5, 3
mv a6, a5
slli a5, a5, 1
add a5, a5, a6
sub sp, sp, a5
vl8re64.v v8, (a0)
csrr a0, vlenb
slli a0, a0, 4
add a0, sp, a0
addi a0, a0, 16
vs8r.v v8, (a0) # Unknown-size Folded Spill
vl8re64.v v8, (a1)
csrr a0, vlenb
slli a0, a0, 3
add a0, sp, a0
addi a0, a0, 16
vs8r.v v8, (a0) # Unknown-size Folded Spill
vl8re64.v v8, (a2)
addi a0, sp, 16
vs8r.v v8, (a0) # Unknown-size Folded Spill
vl8re64.v v0, (a3)
vl8re64.v v8, (a4)
csrr a0, vlenb
slli a0, a0, 4
add a0, sp, a0
addi a0, a0, 16
vl8r.v v16, (a0) # Unknown-size Folded Reload
csrr a0, vlenb
slli a0, a0, 3
add a0, sp, a0
addi a0, a0, 16
vl8r.v v24, (a0) # Unknown-size Folded Reload
vsetvli a0, zero, e64, m8, ta, ma
vadd.vv v16, v16, v24
addi a0, sp, 16
vl8r.v v24, (a0) # Unknown-size Folded Reload
vadd.vv v24, v24, v0
vadd.vv v16, v16, v24
vadd.vv v8, v16, v8
csrr a0, vlenb
slli a0, a0, 3
mv a1, a0
slli a0, a0, 1
add a0, a0, a1
add sp, sp, a0
addi sp, sp, 16
ret
```
The cause is that the PreRA machine scheduler hoists all five loads above the adds to hide their latency. As a result, five m8 values are live at once:
```llvm
# Machine code for function f: NoPHIs, TracksLiveness, TiedOpsRewritten
Function Live Ins: $x10 in %0, $x11 in %1, $x12 in %2, $x13 in %3, $x14 in %4
0B bb.0 (%ir-block.0):
liveins: $x10, $x11, $x12, $x13, $x14
32B %3:gpr = COPY $x13
48B %2:gpr = COPY $x12
64B %1:gpr = COPY $x11
80B %0:gpr = COPY $x10
96B %5:vrm8 = VL8RE64_V %0:gpr :: (load (<vscale x 1 x s512>) from %ir.p0)
112B %6:vrm8 = VL8RE64_V %1:gpr :: (load (<vscale x 1 x s512>) from %ir.p1)
144B %8:vrm8 = VL8RE64_V %2:gpr :: (load (<vscale x 1 x s512>) from %ir.p2)
160B %10:vrm8 = VL8RE64_V %3:gpr :: (load (<vscale x 1 x s512>) from %ir.p3)
168B %4:gpr = COPY $x14
208B %12:vrm8 = VL8RE64_V %4:gpr :: (load (<vscale x 1 x s512>) from %ir.p4)
216B %7:vrm8 = PseudoVADD_VV_M8 undef %7:vrm8(tied-def 0), %5:vrm8, %6:vrm8, -1, 6, 3
224B %16:vrm8 = PseudoVADD_VV_M8 undef %16:vrm8(tied-def 0), %8:vrm8, %10:vrm8, -1, 6, 3
232B %11:vrm8 = PseudoVADD_VV_M8 undef %11:vrm8(tied-def 0), %7:vrm8, %16:vrm8, -1, 6, 3
240B %13:vrm8 = PseudoVADD_VV_M8 undef %13:vrm8(tied-def 0), %11:vrm8, %12:vrm8, -1, 6, 3
248B $v8m8 = COPY %13:vrm8
256B PseudoRET implicit killed $v8m8
# End machine code for function f.
```
The scheduler is actually aware that the register pressure exceeds the target's limit, but it still schedules aggressively for latency:
```
Scheduling SU(14) $v8m8 = COPY %13:vrm8
Bottom Pressure:
VM=8
Cycle: 1 BotQ.A
Scheduling SU(13) %13:vrm8 = PseudoVADD_VV_M8 undef %13:vrm8(tied-def 0), %11:vrm8, %12:vrm8, -1, 6, 3
Bottom Pressure:
VM=16
Cycle: 2 BotQ.A
Scheduling SU(11) %11:vrm8 = PseudoVADD_VV_M8 undef %11:vrm8(tied-def 0), %7:vrm8, %16:vrm8, -1, 6, 3
Bottom Pressure:
VM=24
Cycle: 3 BotQ.A
Scheduling SU(10) %16:vrm8 = PseudoVADD_VV_M8 undef %16:vrm8(tied-def 0), %8:vrm8, %10:vrm8, -1, 6, 3
Bottom Pressure:
VM=32
VM: 32 <= 32(+ 0 livethru)
Cycle: 4 BotQ.A
Scheduling SU(7) %7:vrm8 = PseudoVADD_VV_M8 undef %7:vrm8(tied-def 0), %5:vrm8, %6:vrm8, -1, 6, 3
Bottom Pressure:
VM=40
VM: 40 > 32(+ 0 livethru)
```
The issue is that with MicroOpBufferSize=0, the machine scheduler only considers instructions that are ready in the current cycle. Everything else goes into the pending queue, so register pressure is effectively ignored when choosing the next instruction.
```c++
// MicroOpBufferSize is the number of micro-ops that the processor may buffer
// for out-of-order execution.
//
// "0" means operations that are not ready in this cycle are not considered
// for scheduling (they go in the pending queue). Latency is paramount. This
// may be more efficient if many instructions are pending in a schedule.
//
// "1" means all instructions are considered for scheduling regardless of
// whether they are ready in this cycle. Latency still causes issue stalls,
// but we balance those stalls against other heuristics.
//
// "> 1" means the processor is out-of-order. This is a machine independent
// estimate of highly machine specific characteristics such as the register
// renaming pool and reorder buffer.
unsigned MicroOpBufferSize;
static const unsigned DefaultMicroOpBufferSize = 0;
```
This is the default when no scheduling model is specified. With an out-of-order model, e.g. by passing `-mcpu=sifive-p450`, the register pressure checks kick in and the spills are avoided.
One possible fix is to change the default MicroOpBufferSize to 1. That still models an in-order machine, but it lets the scheduler balance stalls against other heuristics such as register pressure.
</pre>