[PATCH] D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition.

Tue Jul 12 20:27:48 PDT 2016

t.p.northover added a comment.

For the giggles (really only that: as you might guess, I view all numbers in this thread, and in any other LLVM benchmarking attempt, with the deepest suspicion, and I'm certainly not an experimental designer), I ran 3 tests on a Cyclone-like processor:

- Enable the PredictableSelectIsExpensive feature (this seems reasonable: as on Kryo, predictable branches are very, very cheap on Cyclone).
- That plus the suggested heuristic.
- The first with exactly the opposite of the suggested heuristic.

The SPEC2006 results are below (> 1 => improvement):

| Benchmark                     | suggested speedup | opposite speedup |
| ----------------------------- | ----------------- | ---------------- |
| 433.milc/433.milc             | 1.00235569837     | 0.996451239064   |
| 444.namd/444.namd             | 1.00085232996     | 1.00266512107    |
| 447.dealII/447.dealII         | 1.00232774224     | 1.00139244117    |
| 450.soplex/450.soplex         | 0.991283676704    | 1.0              |
| 470.lbm/470.lbm               | 1.00033869904     | 0.992413122292   |
| 400.perlbench/400.perlbench   | 1.0124059534      | 1.00875815688    |
| 401.bzip2/401.bzip2           | 0.996946748407    | 0.999879439635   |
| 403.gcc/403.gcc               | 1.00149565217     | 1.01734859727    |
| 429.mcf/429.mcf               | 1.00949811937     | 0.997803188194   |
| 445.gobmk/445.gobmk           | 0.973701955496    | 0.996549344375   |
| 456.hmmer/456.hmmer           | 1.00178618018     | 1.00164546294    |
| 458.sjeng/458.sjeng           | 1.00405505452     | 0.994367699255   |
| 462.libquantum/462.libquantum | 0.993502343417    | 0.989287229529   |
| 464.h264ref/464.h264ref       | 0.996045116438    | 1.00229694506    |
| 471.omnetpp/471.omnetpp       | 0.998019595581    | 1.00524934383    |
| 473.astar/473.astar           | 1.00208689727     | 0.998415807973   |
| 483.xalancbmk/483.xalancbmk   | 1.00291218638     | 0.987646150452   |

Geomeans were 0.99935572086 for the patch and 0.99951568293 for the complete opposite. Both are worse than the status quo, but whether in a statistically significant way, who knows?
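
For anyone who wants to check the geomeans, here is a minimal Python sketch that recomputes them from the table. It is not the script used for these runs; the names `RESULTS` and `geomean` are made up for illustration, and it assumes each "speedup" entry is a plain per-benchmark ratio, with every benchmark weighted equally.

```python
import math

# Speedups copied from the table: (benchmark, suggested, opposite).
RESULTS = [
    ("433.milc",       1.00235569837,  0.996451239064),
    ("444.namd",       1.00085232996,  1.00266512107),
    ("447.dealII",     1.00232774224,  1.00139244117),
    ("450.soplex",     0.991283676704, 1.0),
    ("470.lbm",        1.00033869904,  0.992413122292),
    ("400.perlbench",  1.0124059534,   1.00875815688),
    ("401.bzip2",      0.996946748407, 0.999879439635),
    ("403.gcc",        1.00149565217,  1.01734859727),
    ("429.mcf",        1.00949811937,  0.997803188194),
    ("445.gobmk",      0.973701955496, 0.996549344375),
    ("456.hmmer",      1.00178618018,  1.00164546294),
    ("458.sjeng",      1.00405505452,  0.994367699255),
    ("462.libquantum", 0.993502343417, 0.989287229529),
    ("464.h264ref",    0.996045116438, 1.00229694506),
    ("471.omnetpp",    0.998019595581, 1.00524934383),
    ("473.astar",      1.00208689727,  0.998415807973),
    ("483.xalancbmk",  1.00291218638,  0.987646150452),
]


def geomean(values):
    """Geometric mean of positive ratios, computed as exp(mean(log(v)))."""
    values = list(values)
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


suggested = geomean(s for _, s, _ in RESULTS)
opposite = geomean(o for _, _, o in RESULTS)

print(f"suggested heuristic geomean: {suggested:.6f}")
print(f"opposite heuristic geomean:  {opposite:.6f}")
```

Both values come out within rounding of the figures quoted above, so the geomeans are taken over all 17 rows. Note that this says nothing about significance: with a single ratio per benchmark there is no estimate of run-to-run noise to compare against.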

I think the only conclusion we can really draw from this is that the LLVM project really needs to hire an actual scientist who specializes in designing experiments (not a computer scientist, not a mathematician who dabbles in programming) and give them some clout. We shouldn't be making these kinds of decisions based on ad-hoc runs of a handful of 10-year-old benchmarks on ${RANDOM_HARDWARE}.

As for this patch, meh.

Tim.

http://reviews.llvm.org/D21299