[PATCH] D36432: [IPSCCP] Add function specialization ability
Matthew Simpson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 9 09:51:56 PDT 2017
mssimpso added a comment.
In https://reviews.llvm.org/D36432#835343, @davide wrote:
> Very neat work, thanks, this was in my todolist for a while :)
> Some meta comments, I'll try to find some time to review this more carefully this week.
Thanks for taking a look!
> I wonder if we're at a point where it makes sense to split the lattice solver from the transforms, as with your partial specialization code SCCP is growing a lot :)
> This would have also a bunch of other advantages, e.g. we could more easily try to plug arbitrary lattices (for example for variables/ranges/bits).
> IIRC Chris Lattner started a propagation engine a while ago; it should still be in tree, but unused. What are your thoughts?
Yes, I think I see what you're talking about. Analysis/SparsePropagation.h? I haven't looked at this closely yet, but I can see some advantages to using it. Any idea why SCCP hasn't already been switched over to use it?
> That said, the approach I had in mind (as mentioned on llvm-dev) was that of:
> 1. implementing "real" jump functions for IPSCCP. To the best of my understanding complicated jump functions never got traction (so, e.g. GCC implements only constant and passthrough).
> 2. Propagate the information down the DAG of SCCs (of the call graph). From what I can see your approach is instead iterative, so every time you specialize something, you do another round of solving. I'm not sure whether this converges more slowly, or if it matters in practice; probably worth taking some numbers.
Right, I've incorporated specialization as an iterative solve. I'm not an expert in this area, but the iterative approach seemed to be the most straightforward extension of our current IPSCCP implementation (which seems to be more-or-less the Wegman, Zadeck approach). It makes us iterate, but we only need to visit the specializations on subsequent iterations.
(In the case of indirect call promotion, we could also revisit the called functions, but the current patch doesn't do this. For example, after call promotion, a function may no longer be address-taken, enabling us to then track its arguments. We would have to reset their lattice state, though, since they would have already made it to overdefined.)
Am I right in thinking the comment about jump functions is about our IPSCCP infrastructure in general, rather than specialization specifically? If I understand correctly, jump functions (like in Callahan et al.) are a trade-off between the precision of the analysis and the time/space needed to solve it. They summarize functions and then propagate the summaries through call sites. I think our current IPSCCP implementation should (at least in theory) be more precise than this. For example, if we find out via constant propagation that a call is not executed, this could enable more propagation/specialization, etc. But again, this is a trade-off since we may have to revisit a large portion of the module. I would definitely appreciate your thoughts.
> I think your heuristic is on the right track (it's the same one I used in an early prototype), but it would be nice to collect numbers to tune it further.
Here are some numbers (AArch64, LTO). It looks like the current tuning is fairly conservative for the LLVM test suite and SPEC. We only hit a handful of the benchmarks, and of the ones we do, spec2017/mcf is the big winner (16% faster). MultiSource/Benchmarks/McCat/12-IOtest is also improved. I would expect us to hit more benchmarks if/when we consider other optimization opportunities beyond indirect call promotion. The table below shows the performance increase, the code size increase, the number of specialized functions, and the number of specializations of those functions. It also shows the number of extra iterations of the IPSCCP solve loop that were performed due to specialization. (I'm in Phabricator, so hopefully this table will look OK.)
| Benchmark | Perf % (+ faster) | Size % (+ larger) | # Added Solver Iters | # Specializations | # Specialized Funcs |
| ------------------------------------------- | ----------------- | ----------------- | -------------------- | ----------------- | ------------------- |
| llvm/MultiSource/Applications/SPASS | 0.30 | 0.03 | 2 | 18 | 6 |
| llvm/MultiSource/Applications/lua | 0.00 | 0.04 | 1 | 2 | 1 |
| llvm/MultiSource/Applications/siod | -0.17 | 0.21 | 0 | 2 | 1 |
| llvm/MultiSource/Benchmarks/McCat/12-IOtest | 7.04 | -7.20 | 2 | 18 | 1 |
| llvm/MultiSource/Benchmarks/Ptrdist/bc | 0.11 | 1.30 | 1 | 1 | 1 |
| spec2000/perlbmk | -0.25 | -0.00 | 1 | 3 | 1 |
| spec2006/gcc | 0.29 | 0.32 | 1 | 43 | 6 |
| spec2006/gobmk | 0.03 | 0.10 | 2 | 4 | 3 |
| spec2006/hmmer | 0.00 | 0.57 | 1 | 5 | 1 |
| spec2017/blender | 0.58 | 0.84 | 14 | 1441 | 44 |
| spec2017/gcc | 0.08 | 1.54 | 8 | 141 | 22 |
| spec2017/imagick | -0.44 | -0.06 | 1 | 5 | 1 |
| spec2017/mcf | 16.01 | 19.26 | 1 | 2 | 1 |
| spec2017/parest | 0.09 | -0.02 | 0 | 1 | 1 |
| spec2017/perlbench | -0.44 | 0.15 | 2 | 5 | 2 |
| spec2017/povray | -0.28 | 2.19 | 1 | 5 | 1 |
| spec2017/xz | 0.31 | 0.91 | 2 | 5 | 3 |
> I'm a little bit reluctant about having this enabled by default (in the beginning) as unfortunately cloning can have catastrophic consequences (I had a chat with the GCC folks where they reported substantial growth in the executable sizes for large programs without real speedups, and I also had an early prototype that confirmed this thesis).
> Thanks again for working on this!
Makes sense. Hopefully folks could try out whatever we end up with on some larger programs.
In https://reviews.llvm.org/D36432#835608, @efriedma wrote:
> The way this is written, there isn't much point to sticking the code into IPSCCP; you could just as easily make this a separate pass which runs afterwards. (The only thing you're getting is constant propagation inside the cloned function, which you could just as easily run separately.)
Thanks for the comments! The old specialization pass was indeed separate. I think (long term) it makes sense to incorporate this into IPSCCP. It could be the case that specialization enables further propagation, which then enables additional specialization, etc. I don't think we could get that kind of behavior out of a separate pass. But if we only care about indirect call promotion, we could probably achieve the same thing with a separate pass that runs before constant propagation.
Comment at: lib/Transforms/Scalar/SCCP.cpp:2279
+ DEBUG(dbgs() << "SPECIALIZING FUNCTIONS\n");
+ SolveForConstants |= FS.specializeFunctions();
> This loop doesn't work the way you want it to. IPSCCP isn't actually complete until the "while (ResolvedUndefs)" loop finishes, and specializing based on a partially unsolved Solver probably won't give you the results you want.
I'm not sure this is really a problem. Sure, IPSCCP isn't yet complete, but `ResolvedUndefsIn` only moves values from unknown to constant. If an actual argument becomes constant while resolving undefs, we should analyze the corresponding (formal argument, constant) pair in `getSpecializationBonus` the next time we visit the function when looking for specialization opportunities.