[all-commits] [llvm/llvm-project] ddc702: [NewPM] Don't skip SCCs not in current RefSCC

Fri Mar 18 14:16:50 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: ddc702376a34d6c0b965008fd81c3c34e69d3ee7
      https://github.com/llvm/llvm-project/commit/ddc702376a34d6c0b965008fd81c3c34e69d3ee7
  Author: Arthur Eubanks <aeubanks at google.com>
  Date:   2022-03-18 (Fri, 18 Mar 2022)

  Changed paths:
    M llvm/include/llvm/Analysis/CGSCCPassManager.h
    M llvm/lib/Analysis/CGSCCPassManager.cpp
    A llvm/test/Other/cgscc-refscc-mutation-order.ll
    M llvm/test/Transforms/Inline/cgscc-cycle-debug.ll

  Log Message:
  -----------
  [NewPM] Don't skip SCCs not in current RefSCC

With D107249 I saw huge compile time regressions on a module (150s ->
5700s). This turned out to be due to a huge RefSCC in
the module. As we ran the function simplification pipeline on functions
in the SCCs in the RefSCC, some of those SCCs would be split out to
their RefSCC, a child of the current RefSCC. We'd skip the remaining
SCCs in the huge RefSCC because the current RefSCC is now the RefSCC
just split out, then revisit the original huge RefSCC from the
beginning.  This happened many times because many functions in the
RefSCC were optimizable to the point of becoming their own RefSCC.

This patch makes it so we don't skip SCCs not in the current RefSCC so
that we split out all the child RefSCCs on the first iteration of
RefSCC. When we split out a RefSCC, we invalidate the original RefSCC
and add the remainder of the SCCs into a new RefSCC in
RCWorklist. This happens repeatedly until we finish visiting all
SCCs, at which point there is only one valid RefSCC in
RCWorklist from the original RefSCC containing all the SCCs that
were not split out, and we visit that.

For example, in the newly added test cgscc-refscc-mutation-order.ll,
we'd previously run instcombine in this order:
f1, f2, f1, f3, f1, f4, f1

Now it's:
f1, f2, f3, f4, f1

This can cause more passes to be run in some specific cases,
e.g. if f1<->f2 gets optimized to f1<-f2, we'd previously run f1, f2;
now we run f1, f2, f2.

This improves kimwitu++ compile times by a lot (12-15% for various -O3 configs):
https://llvm-compile-time-tracker.com/compare.php?from=2371c5a0e06d22b48da0427cebaf53a5e5c54635&to=00908f1d67400cab1ad7bcd7cacc7558d1672e97&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D121953