[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos
Kristof Beyls via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 20 00:56:48 PST 2017
kristof.beyls added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:290-294
+    case Interleave:
+      for (auto &I : IRT) {
+        OriginalMCID = &TII->get(I.OrigOpc);
+        for (auto &Repl : I.ReplOpc)
+          ReplInstrMCID.push_back(&TII->get(Repl));
----------------
az wrote:
> kristof.beyls wrote:
> > This still seems to be doing a bit of work, even though the information could already be cached from processing an earlier function.
> > I guess this could be improved by caching per Subpass PS, rather than per MCInstrDesc*?
> > I'm not sure if this is needed. Could you benchmark the compile-time impact as is on e.g. CTMark, both when targeting Exynos (i.e. when shouldExitEarly returns false) and for another target (i.e. when shouldExitEarly returns true)?
> Here are some compile-time numbers I got by running CTMark. Every number represents the best of three runs. The first two columns are the ones that you asked for. The last two columns show the results when the whole pass is disabled. Even though there are some trends, such as the A57 compile-time numbers in column 2 being in general lower than the numbers in column 1, the difference is usually so small that it can be considered noise.
>
> Let me know if you have any more comments.
>
> | Benchmark | Exynos-m1 | cortex-a57 | exynos-m1 (pass disabled) | cortex-a57 (pass disabled) |
> | 7zip | 123.7240 | 123.1800 | 122.6360 | 122.6640 |
> | Bullet | 82.6360 | 82.1200 | 82.0840 | 82.3680 |
> | ClamAV | 51.3680 | 51.1360 | 50.3680 | 50.7920 |
> | consumer-typeset | 34.4440 | 34.2800 | 34.3760 | 34.3800 |
> | kimwitu++ | 34.3680 | 34.2480 | 34.6080 | 34.2800 |
> | lencod | 49.8160 | 49.7760 | 49.2880 | 49.2560 |
> | mafft | 23.0160 | 23.0360 | 23.1760 | 23.0200 |
> | SPASS | 42.2720 | 42.1440 | 41.8800 | 41.7720 |
> | sqlite3 | 25.0840 | 25.1520 | 24.9920 | 25.1200 |
> | tramp3d-v4 | 37.1520 | 37.4240 | 37.4560 | 37.0240 |
Thanks for the data!
I've added some geomean calculations:
| Benchmark |Exynos-m1 |cortex-a57 |exynos-m1(pass disabled) |cortex-a57(pass disabled) |
| 7zip |123.724 |123.18 |122.636 |122.664 |
| Bullet |82.636 |82.12 |82.084 |82.368 |
| ClamAV |51.368 |51.136 |50.368 |50.792 |
| consumer-typeset |34.444 |34.28 |34.376 |34.38 |
| kimwitu++ |34.368 |34.248 |34.608 |34.28 |
| lencod |49.816 |49.776 |49.288 |49.256 |
| mafft |23.016 |23.036 |23.176 |23.02 |
| SPASS |42.272 |42.144 |41.88 |41.772 |
| sqlite3 |25.084 |25.152 |24.992 |25.12 |
| tramp3d-v4 |37.152 |37.424 |37.456 |37.024 |
| GEOMEAN |44.14092425 |44.06844708 |43.9700723 |43.90935266 |
| COMPARED TO DISABLED |100.39% |100.36% | | |
So, compared to having the pass disabled, both when targeting Exynos-m1 and Cortex-A57, it seems this adds about 0.4% to the compile time.
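For anyone who wants to re-check the GEOMEAN row, here is a minimal C++ sketch of the arithmetic, using only the two Exynos-m1 columns from the table above. The names `geomean`, `overheadPercent`, `kExynosM1` and `kExynosM1Disabled` are made up for this illustration and are not part of the patch or of any LLVM tooling.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean, computed as exp(mean(log(x))) so that the running
// product of many timings never has to be formed explicitly.
double geomean(const std::vector<double> &Values) {
  assert(!Values.empty() && "geomean of an empty set is undefined");
  double LogSum = 0.0;
  for (double V : Values)
    LogSum += std::log(V);
  return std::exp(LogSum / static_cast<double>(Values.size()));
}

// Relative cost, in percent, of the enabled configuration over the
// disabled one (hypothetical helper for this example).
double overheadPercent(double Enabled, double Disabled) {
  return (Enabled / Disabled - 1.0) * 100.0;
}

// Timings copied from the table above, in benchmark order
// (7zip, Bullet, ClamAV, consumer-typeset, kimwitu++, lencod, mafft,
// SPASS, sqlite3, tramp3d-v4).
const std::vector<double> kExynosM1 = {123.724, 82.636, 51.368, 34.444,
                                       34.368,  49.816, 23.016, 42.272,
                                       25.084,  37.152};
const std::vector<double> kExynosM1Disabled = {122.636, 82.084, 50.368,
                                               34.376,  34.608, 49.288,
                                               23.176,  41.88,  24.992,
                                               37.456};
```

Calling `overheadPercent(geomean(kExynosM1), geomean(kExynosM1Disabled))` gives the Exynos-m1 figure in the COMPARED TO DISABLED row; the Cortex-A57 figure follows the same way from the other two columns.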
If I've understood correctly from earlier comments, at the moment, for Cortex-A57, this pass isn't expected to make changes to generated code.
I guess that on this set of benchmarks, even for Exynos-m1, this pass may not make many, if any, changes?
If so, adding 0.4% compile time for not changing the program output seems a bit high to me.
@mcrosier: since you were concerned about the compile time of this pass before, what do you think?
https://reviews.llvm.org/D38196