[PATCH] D93838: [LLVM] [SCCP] : Add Function Specialization pass

Tue Apr 13 21:41:37 PDT 2021

ChuanqiXu added a comment.

I just ran SPEC2017 intrate with full LTO using this patch, limiting the number of iterations to 10 and to 20.
Here are my results:

| benchmark     | Speedup (10-iteration limit) | Speedup (20-iteration limit) |
| ------------- | ---------------------------- | ---------------------------- |
| 505.mcf_r     | 8.4%                         | 8.4%                         |
| 520.omnetpp_r | 0.4%                         | 4%                           |

I didn't run 548 since it requires a Fortran frontend to emit LLVM IR. The other benchmarks in SPEC2017 intrate don't show significant changes.

It is interesting that the result for `520.omnetpp_r` differs from the earlier result, which showed a large regression.

In D93838#2624107 <https://reviews.llvm.org/D93838#2624107>, @mivnay wrote:

> F15848356: function_specialize_spec_2017_graviton2.png <https://reviews.llvm.org/F15848356>

It is also interesting that the results differ between the two iteration limits.

Then there are the compile-time changes:

| benchmark       | Compile-time change (10-iteration limit) | Compile-time change (20-iteration limit) |
| --------------- | ---------------------------------------- | ---------------------------------------- |
| 500.perlbench_r | 27%                                      | 27%                                      |
| 502.gcc_r       | 9%                                       | 9%                                       |
| 505.mcf_r       | 17%                                      | 17%                                      |
| 520.omnetpp_r   | 10%                                      | 14%                                      |
| 523.xalancbmk_r | 3%                                       | 5%                                       |

Other benchmarks, including 525, 531, 541 and 557, don't show significant change.
The time changes listed here are for the whole compilation process rather than for link time alone.

Finally, the code size change:

| benchmark     | Code size change (10-iteration limit) | Code size change (20-iteration limit) |
| ------------- | ------------------------------------- | ------------------------------------- |
| 505.mcf_r     | 14%                                   | 14%                                   |
| 520.omnetpp_r | 13%                                   | 13%                                   |

The code sizes for other benchmarks don't show significant change.

================
Comment at: llvm/lib/Transforms/Scalar/SCCP.cpp:2577
+    // already be constant.
+    if (!Solver.getLatticeValueFor(A).isOverdefined())
+      return false;
----------------
Joe wrote:
> ChuanqiXu wrote:
> > Joe wrote:
> > > What if the LatticeValue is a ConstantRange? There could be some great specialization opportunities there. Currently, only checking for overdefined leads to:
> > > 
> > > ```
> > > 
> > > // specialized
> > > foo(a)
> > > foo(1)
> > > foo(2)
> > > 
> > > // not specialized
> > > foo(1)
> > > foo(2)
> > > foo(3)
> > > ```
> > ConstantRange seems to be a much more complex problem. We would need an analysis of the function to calculate/model the benefit of knowing the range information for some arguments.
> > How to handle the range is then another problem. For example, if there is a range [0, 100] for argument `a` of function foo, the best specialization may be to specialize foo into foo_1 and foo_2. Then all the call sites with an argument in [0, 50) should call foo_1 and all the call sites with an argument in [50, 100] should call foo_2. But how can we find the best split point?
> > My point is, it may not be a good solution to specialize the function when we find a constant range. I prefer to handle this in a successive patch.
> Absolutely, I agree that this can be handled well in a successive patch. However, seeing as it currently doesn't specialize for more than MaxConstantsThreshold, can't we just have a naive implementation that specializes for every constant in the range, provided the range is small enough? It just seems odd to me that it won't specialize foo(1) and foo(2).
IMHO, it doesn't make sense to specialize functions merely because the constant range is small enough (e.g., [2, 5)). For the foo example, the specialized function foo_range_1_3 may be exactly the same as the original foo if the range information enables no optimization. In that case, we would only copy the function and increase the code size without any benefit. In my mind, it is not easy to implement even a naive analysis that estimates the benefit of the range information.
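
To make the naive policy under discussion concrete, here is a minimal standalone sketch. It is not code from this patch: `ToyRange` and `constantsToSpecialize` are hypothetical names, `ToyRange` is a simplified stand-in rather than LLVM's ConstantRange, and the `MaxConstants` parameter only mimics the role that MaxConstantsThreshold plays in the discussion above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a half-open integer range [Lo, Hi).
// This is NOT LLVM's ConstantRange; it only illustrates the idea.
struct ToyRange {
  int64_t Lo;
  int64_t Hi;
};

// Hypothetical naive policy: if the range holds at most MaxConstants
// values, return every value as a specialization candidate; otherwise
// return nothing and leave the function unspecialized.
std::vector<int64_t> constantsToSpecialize(const ToyRange &R,
                                           uint64_t MaxConstants) {
  std::vector<int64_t> Candidates;
  if (R.Hi <= R.Lo)
    return Candidates; // Empty range: nothing to specialize.

  // Unsigned subtraction avoids signed overflow for very wide ranges.
  uint64_t Size = static_cast<uint64_t>(R.Hi) - static_cast<uint64_t>(R.Lo);
  if (Size > MaxConstants)
    return Candidates; // Too many clones would be needed.

  for (int64_t V = R.Lo; V < R.Hi; ++V)
    Candidates.push_back(V);
  return Candidates;
}
```

With the range [1, 4) from the foo(1)/foo(2)/foo(3) example and a threshold of 3, this yields the candidates 1, 2 and 3. Note that the sketch says nothing about whether any of those clones is profitable, which is exactly the missing benefit analysis described above.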

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93838/new/

https://reviews.llvm.org/D93838