[llvm] [RISCV][PoC] Schedule RVV instructions with same type first (PR #95924)

Pengcheng Wang via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 19 20:06:26 PDT 2024


wangpc-pp wrote:

We need auto-vectorization to generate RVV instructions, so I just tested the runtime of TSVC on a K230 board.
Options (`-mtune=sifive-p670` is used here only for vector scheduling):
* before: `clang -O2 -march=rv64gcv_zba_zbb_zbs_zbc -mtune=sifive-p670`
* after: `clang -O2 -march=rv64gcv_zba_zbb_zbs_zbc -mtune=sifive-p670 -mllvm -riscv-enable-schedule-same-vtype`

name | before | after | after/before
-- | -- | -- | --
s2102 | 73.541 | 30.665 | 0.416978
s252 | 7.994 | 5.858 | 0.7328
s114 | 57.398 | 50.594 | 0.881459
s1111 | 31.729 | 28.35 | 0.893504
s4116 | 13.206 | 11.828 | 0.895653
s1421 | 22.705 | 20.353 | 0.89641
s4115 | 16.842 | 15.114 | 0.897399
s471 | 6.411 | 5.818 | 0.907503
s3251 | 36.247 | 33.349 | 0.920049
s422 | 49.604 | 45.715 | 0.921599
s4112 | 14.28 | 13.164 | 0.921849
s152 | 14.872 | 13.712 | 0.922001
s1115 | 70.36 | 65.187 | 0.926478
s128 | 16.384 | 15.225 | 0.92926
s141 | 55.395 | 51.796 | 0.93503
s2710 | 10.834 | 10.207 | 0.942127
s424 | 23.916 | 22.763 | 0.95179
s1213 | 27.269 | 25.964 | 0.952143
s221 | 16.418 | 15.638 | 0.952491
s122 | 7.898 | 7.552 | 0.956191
s4114 | 16.081 | 15.385 | 0.956719
s1119 | 9.899 | 9.48 | 0.957672
s119 | 10.922 | 10.521 | 0.963285
s276 | 67.157 | 64.79 | 0.964754
s491 | 15.247 | 14.725 | 0.965764
s4113 | 18.565 | 17.954 | 0.967089
s323 | 27.369 | 26.769 | 0.978077
s118 | 36.679 | 35.953 | 0.980207
s279 | 13.361 | 13.124 | 0.982262
s353 | 19.059 | 18.731 | 0.98279
s256 | 22.725 | 22.37 | 0.984378
s222 | 18.044 | 17.763 | 0.984427
s243 | 14.326 | 14.125 | 0.98597
s115 | 34.492 | 34.13 | 0.989505
s1281 | 55.798 | 55.236 | 0.989928
s318 | 14.222 | 14.121 | 0.992898
s241 | 45.126 | 44.829 | 0.993418
s274 | 14.573 | 14.482 | 0.993756
s1251 | 55.382 | 55.073 | 0.994421
s111 | 11.033 | 10.981 | 0.995287
s351 | 54.015 | 53.796 | 0.995946
vif | 4.387 | 4.37 | 0.996125
s292 | 9.011 | 8.978 | 0.996338
s258 | 0.293 | 0.292 | 0.996587
s3110 | 17.651 | 17.594 | 0.996771
s116 | 64.726 | 64.517 | 0.996771
s321 | 11.351 | 11.32 | 0.997269
s242 | 8.126 | 8.104 | 0.997293
s442 | 11.146 | 11.118 | 0.997488
s453 | 14.043 | 14.012 | 0.997792
s3112 | 7.48 | 7.464 | 0.997861
s291 | 16.383 | 16.348 | 0.997864
s162 | 7.632 | 7.616 | 0.997904
s124 | 9.485 | 9.469 | 0.998313
s251 | 30.493 | 30.448 | 0.998524
s1221 | 4.267 | 4.261 | 0.998594
s341 | 25.323 | 25.292 | 0.998776
s313 | 36.626 | 36.582 | 0.998799
vdotr | 73.265 | 73.178 | 0.998813
s481 | 28.464 | 28.441 | 0.999192
s123 | 17.599 | 17.586 | 0.999261
s482 | 27.494 | 27.481 | 0.999527
s1113 | 25.105 | 25.094 | 0.999562
vtvtv | 29.63 | 29.619 | 0.999629
vag | 22.7 | 22.692 | 0.999648
s2111 | 52.88 | 52.865 | 0.999716
s322 | 10.58 | 10.577 | 0.999716
s332 | 14.874 | 14.87 | 0.999731
s3111 | 7.481 | 7.479 | 0.999733
s352 | 85.591 | 85.574 | 0.999801
vsumr | 62.344 | 62.334 | 0.99984
s1161 | 28.72 | 28.716 | 0.999861
s132 | 18.589 | 18.587 | 0.999892
s317 | 22.739 | 22.737 | 0.999912
s316 | 74.607 | 74.601 | 0.99992
s000 | 6.71 | 6.71 | 1
s113 | 15.401 | 15.401 | 1
s331 | 12.723 | 12.723 | 1
s314 | 84.298 | 84.299 | 1.000012
s277 | 22.83 | 22.835 | 1.000219
vpvpv | 29.128 | 29.135 | 1.00024
s312 | 83.805 | 83.826 | 1.000251
s3113 | 92.538 | 92.565 | 1.000292
s443 | 20.051 | 20.057 | 1.000299
s311 | 62.329 | 62.351 | 1.000353
s4121 | 7.283 | 7.286 | 1.000412
s271 | 29.64 | 29.657 | 1.000574
s342 | 29.462 | 29.485 | 1.000781
s1279 | 11.357 | 11.366 | 1.000792
s261 | 20.939 | 20.964 | 1.001194
vpvtv | 29.442 | 29.482 | 1.001359
s2244 | 7.904 | 7.915 | 1.001392
s2711 | 29.644 | 29.686 | 1.001417
s452 | 29.449 | 29.494 | 1.001528
s112 | 27.384 | 27.45 | 1.00241
s173 | 25.803 | 25.872 | 1.002674
s315 | 21.168 | 21.241 | 1.003449
s1351 | 40.024 | 40.163 | 1.003473
s244 | 21.168 | 21.243 | 1.003543
s232 | 4.743 | 4.767 | 1.00506
s278 | 16.781 | 16.872 | 1.005423
s2712 | 30.811 | 31.016 | 1.006653
s174 | 25.666 | 25.859 | 1.00752
vbor | 2.419 | 2.438 | 1.007854
s161 | 22.161 | 22.336 | 1.007897
s212 | 18.311 | 18.493 | 1.009939
s121 | 18.121 | 18.305 | 1.010154
s151 | 30.192 | 30.499 | 1.010168
vpvts | 5.893 | 5.953 | 1.010182
s127 | 10.654 | 10.764 | 1.010325
s175 | 6.044 | 6.107 | 1.010424
s131 | 30.186 | 30.502 | 1.010468
s176 | 5.869 | 5.933 | 1.010905
s171 | 5.608 | 5.673 | 1.011591
s431 | 56.242 | 56.915 | 1.011966
vtv | 56.171 | 56.866 | 1.012373
vpv | 56.18 | 56.909 | 1.012976
s2251 | 33.527 | 33.963 | 1.013004
s172 | 5.61 | 5.685 | 1.013369
s273 | 14.475 | 14.671 | 1.013541
s423 | 28.373 | 28.785 | 1.014521
s2101 | 18.673 | 18.945 | 1.014566
s319 | 120.721 | 122.631 | 1.015822
vas | 20.567 | 20.901 | 1.01624
s293 | 4.169 | 4.25 | 1.019429
s2233 | 251.383 | 256.287 | 1.019508
s451 | 17.201 | 17.579 | 1.021975
s126 | 22.997 | 23.542 | 1.023699
s257 | 42.741 | 43.761 | 1.023865
s235 | 459.815 | 470.903 | 1.024114
s231 | 232.829 | 238.524 | 1.02446
s441 | 15.573 | 15.96 | 1.024851
s281 | 28.693 | 29.479 | 1.027393
s1232 | 237.11 | 244.2 | 1.029902
s233 | 662.192 | 684.512 | 1.033706
s253 | 11.363 | 11.769 | 1.03573
s125 | 7.407 | 7.688 | 1.037937
s211 | 26.624 | 27.68 | 1.039663
s13110 | 17.562 | 18.347 | 1.044699
s275 | 38.142 | 39.928 | 1.046825
va | 24.023 | 25.205 | 1.049203
s421 | 25.037 | 26.426 | 1.055478
s1244 | 40.337 | 42.873 | 1.06287
s343 | 23.314 | 24.934 | 1.069486
s1112 | 14.517 | 15.864 | 1.092788
s272 | 11.442 | 12.529 | 1.095001
s2275 | 721.855 | 799.062 | 1.106956
s254 | 21.049 | 24.103 | 1.14509
s31111 | 12.804 | 16.004 | 1.249922
s255 | 7.586 | 9.541 | 1.257712
s4117 | 8.046 | 10.265 | 1.275789
Geomean|   |   | 0.993474

We can see some improvements as well as some regressions. Overall, there isn't much gain here (about 0.65% by geomean).
The results are highly implementation-specific, so they may not be very convincing.
I will do more benchmarking and improve the heuristics.
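
For readers who want to reproduce the last column and the summary figure, here is a minimal sketch of the arithmetic, assuming the ratio is simply `after / before` and the summary row is a plain geometric mean of those ratios. The helper names (`ratio`, `geomean`) are illustrative and not part of the PR; only three rows from the table are included, so the subset geomean differs from the reported one.

```python
import math


def ratio(before: float, after: float) -> float:
    """Return after/before; values below 1.0 mean the 'after' build ran faster."""
    return after / before


def geomean(values) -> float:
    """Geometric mean, computed via logarithms rather than a running product."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Three rows copied from the table above: (before, after) runtimes.
samples = {
    "s2102": (73.541, 30.665),
    "s000": (6.71, 6.71),
    "s4117": (8.046, 10.265),
}

ratios = {name: ratio(b, a) for name, (b, a) in samples.items()}
subset_geomean = geomean(ratios.values())  # covers this 3-row subset only

# Geomean reported in the table's last row, converted to a percentage gain.
reported_geomean = 0.993474
gain_percent = (1.0 - reported_geomean) * 100.0

if __name__ == "__main__":
    for name, r in ratios.items():
        print(f"{name}: {r:.6f}")
    print(f"subset geomean: {subset_geomean:.6f}")
    print(f"overall gain: {gain_percent:.2f}%")
```

A geometric mean is the usual choice for summarizing ratios because a 2x speedup and a 2x slowdown cancel out, which an arithmetic mean would not give.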

https://github.com/llvm/llvm-project/pull/95924

