[PATCH] D109958: [LoopFlatten] Enable it by default

Tue Oct 5 05:42:14 PDT 2021

SjoerdMeijer added a comment.

With the bootstrap failure fixed in D110234 <https://reviews.llvm.org/D110234>, another recently raised issue addressed in D110712 <https://reviews.llvm.org/D110712>, and more testing done, I would like to pick this up again.

> It would be good to have some performance testing for this too.

As mentioned in the description, this gives a really good improvement on an embedded benchmark. The transformation is also generic enough to trigger quite often elsewhere, for example in the LLVM test suite. Because LoopFlatten removes an inner loop, it is unlikely to make things worse: it should be a case of "the same or better performance". Here is some data to support this:

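| Test                                                                               | # flattened loops | % diff (exec time) |
| ---------------------------------------------------------------------------------- | ----------------- | ------------------ |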
| MultiSource/Applications/JM/lencod/lencod.test                                     | 3                 | -0.28  |
| MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test                          | 1                 | -9.04  |
| MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test                    | 3                 | -5.75  |
| MultiSource/Applications/JM/ldecod/ldecod.test                                     | 1                 | 0.97   |
| MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test                            | 3                 | 0.29   |
| MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test                                  | 17                | -0.37  |
| SingleSource/Benchmarks/Misc/himenobmtxpa.test                                     | 2                 | 0.09   |
| MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test/32  | 2                 | -1.27  |
| MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test/64  | 2                 | -0.47  |
| MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test/128 | 2                 | -0.24  |
| MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test/256 | 2                 | -0.18  |
| MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test                        | 20                | -0.21  |
| MultiSource/Benchmarks/Rodinia/pathfinder/pathfinder.test                          | 1                 | -0.84  |
| MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test                          | 1                 | 0.50   |
| MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test                    | 1                 | 0.11   |
| MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test                                | 2                 | -1.39  |
| MultiSource/Benchmarks/Rodinia/backprop/backprop.test                              | 1                 | -0.45  |

Negative numbers are reductions in execution time, i.e. improvements.
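As a quick summary of the table using that sign convention, the sketch below counts how many tests got faster or slower and reports the range of changes. The `% diff` values are copied verbatim from the table; the short labels are abbreviations introduced here, and the script says nothing about noise, only about signs and magnitudes.

```python
# % diff in execution time per test, copied from the table above.
# Negative = faster (lower execution time), positive = slower.
diffs = {
    "lencod": -0.28,
    "cjpeg": -9.04,
    "consumer-jpeg": -5.75,
    "ldecod": 0.97,
    "smg2000": 0.29,
    "tramp3d-v4": -0.37,
    "himenobmtxpa": 0.09,
    "AnisotropicDiffusion/32": -1.27,
    "AnisotropicDiffusion/64": -0.47,
    "AnisotropicDiffusion/128": -0.24,
    "AnisotropicDiffusion/256": -0.18,
    "miniGMG": -0.21,
    "pathfinder": -0.84,
    "HPCCG": 0.50,
    "SimpleMOC": 0.11,
    "AMGmk": -1.39,
    "backprop": -0.45,
}

faster = sum(1 for d in diffs.values() if d < 0)
slower = sum(1 for d in diffs.values() if d > 0)
best = min(diffs.values())   # largest reduction in execution time
worst = max(diffs.values())  # largest increase in execution time

print(f"{len(diffs)} tests: {faster} faster, {slower} slower; "
      f"range {best}% to {worst}%")
# prints: 17 tests: 12 faster, 5 slower; range -9.04% to 0.97%
```

The largest regression (ldecod, +0.97%) is small, while the two JPEG benchmarks improve by 5.75% and 9.04%.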
Take these numbers with a grain of salt, because the test suite can be a bit noisy. But as I said, I think the takeaway message is that LoopFlatten is a nice simplification that does some good here and there. (I haven't actually looked at it, but I would expect it to help code size a bit too.)
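For readers less familiar with the pass, here is a rough sketch of the simplification meant by "removes an inner loop". Python is used only for brevity; LoopFlatten itself operates on LLVM IR. The functions `nested` and `flattened` and the trip counts are hypothetical. The idea is that when both induction variables are only used to form the linear index `i * m + j`, the two loops can be replaced by a single loop over `n * m` iterations. The real pass also has to establish that `n * m` cannot overflow, which this sketch does not model.

```python
def nested(n, m):
    """Original shape: i and j are only used to form the linear index i * m + j."""
    out = [0] * (n * m)
    for i in range(n):
        for j in range(m):
            out[i * m + j] = i * m + j
    return out


def flattened(n, m):
    """Flattened shape: a single loop over n * m iterations."""
    out = [0] * (n * m)
    for k in range(n * m):
        out[k] = k
    return out


# Hypothetical trip counts: 4 outer iterations, 5 inner iterations.
print(nested(4, 5) == flattened(4, 5))  # prints: True
```

Both versions visit the same elements in the same order. The flattened one simply needs one induction variable, one compare and one branch instead of two of each.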

What do we think of this?


https://reviews.llvm.org/D109958