[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos

Mon Nov 20 00:56:48 PST 2017

kristof.beyls added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:290-294
+  case Interleave:
+    for (auto &I : IRT) {
+      OriginalMCID = &TII->get(I.OrigOpc);
+      for (auto &Repl : I.ReplOpc)
+        ReplInstrMCID.push_back(&TII->get(Repl));
----------------
az wrote:
> kristof.beyls wrote:
> > This still seems to be doing a bit of work, even though the information could already be cached from processing an earlier function.
> > I guess this could be improved by caching per Subpass PS, rather than per MCInstrDesc*?
> > I'm not sure if this is needed. Could you benchmark the compile-time impact of the current version on, e.g., CTMark, both when targeting Exynos (i.e. when shouldExitEarly returns false) and for another target (i.e. when shouldExitEarly returns true)?
> Here are some compile-time numbers I got by running CTMark. Each number is the best of three runs. The first two columns are the ones you asked for; the last two show the results when the whole pass is disabled. There are some trends (for example, the cortex-a57 compile times in column 2 are generally lower than the exynos-m1 times in column 1), but the differences are small enough that they can be considered noise.
> 
> Let me know if you have any more comments.
> 
> | Benchmark | exynos-m1 | cortex-a57 | exynos-m1 (pass disabled) | cortex-a57 (pass disabled) |
> | 7zip | 123.7240 | 123.1800 | 122.6360 | 122.6640 |
> | Bullet | 82.6360 | 82.1200 | 82.0840 | 82.3680 |
> | ClamAV | 51.3680 | 51.1360 | 50.3680 | 50.7920 |
> | consumer-typeset | 34.4440 | 34.2800 | 34.3760 | 34.3800 |
> | kimwitu++ | 34.3680 | 34.2480 | 34.6080 | 34.2800 |
> | lencod | 49.8160 | 49.7760 | 49.2880 | 49.2560 |
> | mafft | 23.0160 | 23.0360 | 23.1760 | 23.0200 |
> | SPASS | 42.2720 | 42.1440 | 41.8800 | 41.7720 |
> | sqlite3 | 25.0840 | 25.1520 | 24.9920 | 25.1200 |
> | tramp3d-v4 | 37.1520 | 37.4240 | 37.4560 | 37.0240 |
Thanks for the data!
I've added some geomean calculations:

| Benchmark | exynos-m1 | cortex-a57 | exynos-m1 (pass disabled) | cortex-a57 (pass disabled) |
| 7zip | 123.724 | 123.18 | 122.636 | 122.664 |
| Bullet | 82.636 | 82.12 | 82.084 | 82.368 |
| ClamAV | 51.368 | 51.136 | 50.368 | 50.792 |
| consumer-typeset | 34.444 | 34.28 | 34.376 | 34.38 |
| kimwitu++ | 34.368 | 34.248 | 34.608 | 34.28 |
| lencod | 49.816 | 49.776 | 49.288 | 49.256 |
| mafft | 23.016 | 23.036 | 23.176 | 23.02 |
| SPASS | 42.272 | 42.144 | 41.88 | 41.772 |
| sqlite3 | 25.084 | 25.152 | 24.992 | 25.12 |
| tramp3d-v4 | 37.152 | 37.424 | 37.456 | 37.024 |
| GEOMEAN | 44.14092425 | 44.06844708 | 43.9700723 | 43.90935266 |
| COMPARED TO DISABLED (same CPU) | 100.39% | 100.36% | | |
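For reference, the GEOMEAN and relative rows can be recomputed with a small standalone sketch like the one below. It assumes a plain geometric mean over the ten benchmarks, computed in log space; the exact spreadsheet function used for the table isn't stated, and the names `geomean` and `relativePercent` are made up for illustration.

```cpp
#include <cmath>
#include <vector>

// Geometric mean computed in log space, so multiplying many compile times
// together cannot overflow.
double geomean(const std::vector<double> &Xs) {
  double LogSum = 0.0;
  for (double X : Xs)
    LogSum += std::log(X);
  return std::exp(LogSum / static_cast<double>(Xs.size()));
}

// Geomean of one configuration relative to its baseline, as a percentage
// (100.0 means "same compile time").
double relativePercent(const std::vector<double> &WithPass,
                       const std::vector<double> &Baseline) {
  return 100.0 * geomean(WithPass) / geomean(Baseline);
}

// CTMark times copied from the table, in benchmark order:
// 7zip, Bullet, ClamAV, consumer-typeset, kimwitu++, lencod, mafft, SPASS,
// sqlite3, tramp3d-v4.
const std::vector<double> ExynosM1 = {123.724, 82.636, 51.368, 34.444, 34.368,
                                      49.816,  23.016, 42.272, 25.084, 37.152};
const std::vector<double> CortexA57 = {123.18, 82.12,  51.136, 34.28,  34.248,
                                       49.776, 23.036, 42.144, 25.152, 37.424};
const std::vector<double> ExynosM1Disabled = {122.636, 82.084, 50.368, 34.376,
                                              34.608,  49.288, 23.176, 41.88,
                                              24.992,  37.456};
const std::vector<double> CortexA57Disabled = {122.664, 82.368, 50.792, 34.38,
                                               34.28,   49.256, 23.02,  41.772,
                                               25.12,   37.024};
```

With these inputs, `relativePercent(ExynosM1, ExynosM1Disabled)` comes out at roughly 100.39 and `relativePercent(CortexA57, CortexA57Disabled)` at roughly 100.36, matching the last row of the table. Comparing each configuration against the disabled run on the same CPU keeps any per-target codegen differences out of the overhead number.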

So, compared to having the pass disabled, it seems this adds about 0.4% (geomean) to compile time, both when targeting Exynos-m1 and when targeting Cortex-A57.
If I've understood correctly from earlier comments, at the moment, for Cortex-A57, this pass isn't expected to make changes to generated code.
I guess that on this set of benchmarks, even for Exynos-m1, this pass may not make many, if any, changes?
If so, adding 0.4% compile time for a pass that doesn't change the generated code seems a bit high to me.
@mcrosier: since you were concerned about the compile time of this pass before, what do you think?
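As a rough illustration of the per-subpass caching suggested earlier in this review, here is a standalone sketch. It is not LLVM code: `Subpass`, `EarlyExitCache` and the query callback are hypothetical names. The idea is to cache the early-exit decision keyed on (subpass, CPU name), so the expensive per-target check runs once per combination rather than once per function. In the real pass, the callback would stand in for whatever profitability check is done per target (assumed here to be something like comparing the scheduling-model cost of the original and replacement instructions).

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical subpass identifiers; the real pass's names may differ.
enum class Subpass { VectorElem, Interleave };

// Hypothetical cache of "should this subpass exit early on this CPU?"
// decisions, shared across all functions processed by one pass instance.
class EarlyExitCache {
public:
  using Query = std::function<bool(Subpass, const std::string &)>;

  explicit EarlyExitCache(Query Q) : Compute(std::move(Q)) {}

  bool shouldExitEarly(Subpass PS, const std::string &CPU) {
    const auto Key = std::make_pair(PS, CPU);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // Decided while processing an earlier function.
    ++Queries;
    const bool Exit = Compute(PS, CPU); // The expensive path, run once per key.
    Cache.emplace(Key, Exit);
    return Exit;
  }

  // Number of times the expensive query actually ran.
  unsigned queryCount() const { return Queries; }

private:
  Query Compute;
  std::map<std::pair<Subpass, std::string>, bool> Cache;
  unsigned Queries = 0;
};
```

Keying on the CPU as well as the subpass matters because functions in one module can carry different target-cpu attributes. With only a handful of keys, a `std::map` lookup per function should be cheap compared with rebuilding the MCInstrDesc lists each time.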

https://reviews.llvm.org/D38196