<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/71424>71424</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[SCEV][IndVarSimplification] Run time regression in Embench/primecount benchmark
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:RISC-V,
loopoptim,
llvm:optimizations,
llvm:SCEV
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
LebedevRI,
nikic,
preames,
sjoerdmeijer,
xortator
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
martong
</td>
</tr>
</table>
<pre>
There is a run time regression in [Embench](https://github.com/embench/embench-iot)'s primecount benchmark since LLVM release 15.0. I have found with git bisect that the offending change is https://reviews.llvm.org/D129710 .
I have made the measurements with `-target riscv32 -march=rv32imc -mabi=ilp32` and I've been using an instruction accurate simulator which handles all instruction as taking exactly one cycle. The number of cycles is ~50% more in LLVM 15.0 than in previous releases.
[Primecount](https://github.com/embench/embench-iot/blob/4c5eb87983f51ca7fcf7855306877b3d1c3aabf1/src/primecount/primecount.c#L106) counts the number of primes up to a certain constant, it contains a _**loop nest**_ of depth 3 and there are `goto`s to break out from the middle loop.
I could reduce the initial code (linked above) to this simplest case where there is still some meaningful difference in the run time:
```
int a[2];
int b() {
int c[2];
int d = 0;
a[0] = c[0] = ++d;
while (1) {
for (int e = 0; e < d; ++e) {
if (a[e] > 2)
goto f;
while (c[e] < 3)
c[e] += a[e];
}
break;
f:
++d;
}
return 3;
}
```
which is equivalent to the below code if we get rid of the `goto`:
```
int a[2];
int b() {
int c[2];
int d = 0;
int exit_outer_loop = 0;
a[0] = c[0] = ++d;
while (!exit_outer_loop) {
for (int e = 0; e < d; ++e) {
if (a[e] > 2) {
++d;
exit_outer_loop = 0; // ensures the outer loop runs again
break; // exits the inner loop
}
while (c[e] < 3)
c[e] += a[e];
exit_outer_loop = 1; // ensures the outer loop terminates if inner loop completes
}
}
return 3;
}
```
What happens here is that the offending commit enables an optimization of the induction variable `c[e]` in the innermost loop. During `rewriteLoopExitValues` we move the calculation of the final exit value of the IV into the loop's preheader.
```
; Preheader:
while.cond3.preheader: ; preds = %for.body
%arrayidx4 = getelementptr inbounds [2 x i32], ptr %c, i32 0, i32 %e.011
%arrayidx4.promoted = load i32, ptr %arrayidx4, align 4, !tbaa !4
br label %while.cond3
...
*** IR Dump After IndVarSimplifyPass on while.cond3 ***
; Preheader:
while.cond3.preheader: ; preds = %for.body
%arrayidx4 = getelementptr inbounds [2 x i32], ptr %c, i32 0, i32 %e.011
%arrayidx4.promoted = load i32, ptr %arrayidx4, align 4, !tbaa !4
%smax = call i32 @llvm.smax.i32(i32 %arrayidx4.promoted, i32 3)
%1 = sub i32 %smax, %arrayidx4.promoted
%umin = call i32 @llvm.umin.i32(i32 %1, i32 1)
%2 = sub i32 %1, %umin
%umax = call i32 @llvm.umax.i32(i32 %0, i32 1)
%3 = udiv i32 %2, %umax
%4 = add i32 %umin, %3
%5 = mul i32 %0, %4
br label %while.cond3
```
This was not happening before the offending commit because the SCEV expression that divides with `1 umax %0` was considered unsafe to expand (`isSafeToExpand`). Now, after the offending commit, in `rewriteLoopExitValues` we discover that this expansion has a high cost (`isHighCostExpansion`), but since the loop can be deleted we do the expansion anyway. So, the offending commit is logically correct, but it reveals an underlying weakness.
I was thinking about as a possible solution, to consider the product of the trip counts of the containing loops and compare that to the gain we have with the expansion. More formally, do the expansion only if `I * J * X < I * J * X'`, where `I` and `J` are the trip count of the outermost and middle loops. `X` is the number of all operations before expansion in the innermost loop, `X'` is the same, but after the expansion; this needs to calculate with the trip count of the innermost loop. The problem is that SCEV is not able to infer the loop trip count (backedge taken count neither) for neither of these loops. This is perhaps because the induction variable of the innermost loop is dependent on the IV of an outer loop. Any pointers on SCEV's limitation could be helpful at this time. I was also experimenting to use `Attributor`'s `AAPotentialValues` to see if the dataflow framework could discover the value of `c[e]`, unfortunately it could not. (It reached the top.)
Any feed-back is more than welcome. I've assigned people with history in the related components, my intention was solely to draw their attention. Other than this, perhaps discussing the issue and possible fix would be more appropriate to be done in discourse and an RFC?
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzsWEtv4zjy_zTMpRBBoiI7PuSQJ_4Z9Py30d3onVuDEksWJxKpJSk72cN-9kWVHrbT7pnGYLGnFQJH4qOqWM9fUYVgthbxRhR3QsoPWKLG3adnIaWQ90JKa15MtXz1HlWHYfkOvzv0ukPzO_pl8NX5qKLjgeLhQg2xcf6mUz46u70onX67-dKgRzABFPjBQjQdgsetxxCMs2AsiOLusSvRVo0oHoS8bmLsg8hvhXwS8mlrYjOUSeU6IZ9wWre8XRoXhdwIuQ7Qe9Nh5QYbgec65V8gGFshfPjw9Vfw2KIKCFmRpAk8Q6N2CLUbrIa9iQ1sTYTSBKwixEbRD4Kra7Ta2C1UjbJbPsipfB53Bvchadtdlzi_FfLpIZObdZZCItIHkd6OvxO_Tmlkyh2qMHjs0MYw8her9DIqv8UI3oRql0u47JSvGpE_-F0uTVfRQGlE_mDaPpdilYKyGp6FXO8QSkQLQyBhFSk2RD9UkZSsqmrwKiIE0w0tGQz2jakaaJTVLQZQbXu6IUBUL0QJX1UV2zdwFqF6q1pM4EuDYIeuRA-uHgcD6eVfRSpkAZ0je9tR56Rr0iYbuidduSHMlggnChLF3cfFgn_VE57K1pVCPl1VBZbX6811XhdZpdZ1Va-viyJPV9frdZnrrMqVKutMyKfgKyGfDs5z8pFUQuYfsnQl5AZ4JLD1DufntQGGHqIDBRX6qIyFytkQFVG7BxPpk4YpCr4JeSvkbetcDxZDHD-_ES2NfWwgZ6NGjhvlkfxi66ITqzQQj9KjegE3RKi960ZfMlq3CETync9Vbmg1eNRDNbqdsSYa1ULlNIKQ162xL6hBlW6HdMboIDYmkKf0LYYIFYXMnoWJcyiHaNoWguvYja2x23poQZu6Ro8Ub8YyszngyYSjWKt0-uNPYyMoUdxR8hD53WGwFPKapBHraRCAhqv3a8dhDSJ_gPQwSjRTUTzweHX8IeSdkHf6iMC-MS1rInvHEKB2niaIBS4s-PUeiMREDU83mpo2kQg4cn0ESSlqIUsPGRTqIzHoWUSpDpvvIT9sPozLO5JnZnJCR6wfDh_sLEfT9WIKXvqdNo42e4yDt5Avs8vcOyuOqcQEwH8MZqdatHF0I0pJrduPvmZq2COMyU2Ts9P8wbX_2y7CRn018ZsbIvpvHI3vlvw1NxIye0f3P-RX_PzAud4vO2fa8fnRmWHMsYCWqtKY5HgZZxWK5ABqq4w9JTf717L91UwZ0lg7bT7ecfCvP3X26fkpnz9_ruzPzxXRd8aqSBWsPpIZKkf5L2I4G1iH15-OkvH37wQsGtX3aAPM6fQc2nBdZyKgVSXXZwuuj6Yz_1Rcn6fwMVZPBXunvKGlFFGzxggeTImYD9a5EMcaAQ-DJy5ilXrcexPxg3P946uJX1U7YKCde4TO7caiUam2ItxwxLo2VrWsddjRnnn8-SuF1hj9o_MzMsMGlUafwEnFf6eh_A4-ziuXfMB-klTO6jzpj2bhDx6i1HvUYQrWonY-ITS62E8Wynv1ZvTrFa_ZYsSWwVgfPRhbEiwMBE0lvILJOaXIe6BZIYuKy3ouIZ1fhCwwSbPsHIek965zEcck1DqlmeKB3LKSxlRrthb4VcgslkrR_6uZcOmhVSW2tO9IN8d6TZJTYCWnP3j-BA9D18NtTQHwbPVX5T9ToTf120cVAjgLRzRh2XlC7n9m-gkzCVmETr2OpYPhNXG_SrlNoJmEaV9PQn0vxCzxcUYUssiYYhjK-ThEaxTiHJHDxqGjVuucNDRzKk02M89Omcv3zLOJM5E44fWjkw_fnTz9Aa-cCQza7GZmcmGmXo8Wjp6htJ7XsTDj0vxoXcHruqGFY85E4OdC61w6_0JQea8CWDendcqrJdZuxMvfp_QSKzWEcfLz_eNXwNd-boa5EGizMxoPPWEGozpJYMrLKnBzYTR61DDYoGokvIWvPfUNBEFWqQmfVY1f3CMPktRyk8D_uz27Lof_OenYFvbP6oI2oXI7JsGFi6Af8eEzNIqanMZsG6io4Mzy_J_ZNvcuxMd55SgUcSyHOPXpc9WASlkoETRSDdbMdKwpB0bKvu3VWwKfHdE4q2sToHVbQ174BpXzHqs4MzQRPO5QtVxdB6vRt2-0d4_qxWII7zop0ntsjOWuWJXUf_FJexeCocobXDtQhWRp3GIjlqz3jir1XCSjN_3cTU5DU4NIxEkBgXtAAiGKHUktkJowGOmDbxPYR07UksCv5Hq18x2dmoT5TnPOtm-MJFfpM-V4-IV_f2MEdjIi5JrNdD91gLRjvnUQq_QXfp88_XCo-UyMsxh10PqjJjUktPs3Bijv-2nKGK5Hz2gjzKF0EP4souFIJoos70w0qA5ncx98fiFF5Yed1yLVILLZhHOOFPv9qd5jqS-jfcsWuwXMcWCbMS0wLIsOjK0nCUbkeSAs5HWpqhfUW4SoXtBO4xYNtdyE76llmD4nOcKiSk5CJkCPvlF9OMkwZxDi2WPQfo09Wk3dm7MzliOD2CPAnMCtfYPeGRvRM16gozLIa01n4ogRx1uHEqHBtq-HFuY0EU2HCYzBpNrASQu9oYJOnh8dkOBild7G6E05ROfZ_9aBB28_ukhLVXvIR9FBQG4vSWStoqqp56y96nDv_MskzFHOwgNiPUXL5CuDrZ2PA7UEFCRx2m5dTMhOz5Q1VNWgHr3D9Qm3X0epghRUI-pLsinptRtrgaKwbSvHGhhv7KYrWQ09ur6dvK4xITr_ttyhIDnkmAucRRsDidnRPOuCIJsKlH1I4OhAe7WnncaDitOSBP7GrsNSkCUY1Uz-QqoZAt8bsmOEMCBH7JLaavMK-9mmfBzV99713lCsRMep2lm-92FFDz6MJJSFT0_3In-60De53uQbdYE32WqzkRsp8-KiuUnr4kqtcllJna03xdV6g9n1OivllS5X5Xp1YW5kKvMsS1fZOl9nRbIpsrrK1mUmq1WVq424SrFTpl2uYC_4CDfr7EpeXXBFD9OlN4eZ1SK__fT8-f7y63KRTb7NHdZhpN11Ir89brvC-8nR9_nq29_Q2GU5bANhHRPi4Ur4IprY8r07bygeRHF3Ar5NxfSpy_10_o78cbnuPHfLfTH49uYPLktZ3PHfZe_d71wGn1hLQcgnVtS_AwAA__9BAXAZ">