[PATCH] D78391: [ValueLattice] Allow two range extensions.

Tue Apr 28 14:02:36 PDT 2020

fhahn planned changes to this revision.
fhahn marked 2 inline comments as done.
fhahn added a comment.

It's probably worth exploring D79036 <https://reviews.llvm.org/D79036> first and then apply this on top of D79036 <https://reviews.llvm.org/D79036>. (it seems like there still are additional cases with wins).

================
Comment at: llvm/include/llvm/Analysis/ValueLattice.h:358
       // otherwise.
-      if (Opts.CheckWiden && ++NumRangeExtensions >= 1) {
+      if (Opts.CheckWiden && ++NumRangeExtensions >= 2) {
         // Make sure WidenCR is only used if it contains the merge result.
----------------
nikic wrote:
> nikic wrote:
> > Add `MaxRangeExtensions = 2` static const?
> Or rather `MaxRangeExtensions = 1` and then `++NumRangeExtensions > MaxRangeExtensions`. I only now realized that the code as written only allows extending the range once, not twice. Previously it allowed zero extensions.
Yeah, in my wording I considered setting the range as the 'first' extensions. I agree that has potential for confusion :(. I'll update/clarify that.

================
Comment at: llvm/test/Transforms/SCCP/constant-range-struct.ll:106
+; CHECK-NEXT:    call void @use(i1 true)
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[T_3:%.*]] = icmp ne i64 [[V2]], 0
----------------
nikic wrote:
> fhahn wrote:
> > nikic wrote:
> > > fhahn wrote:
> > > > nikic wrote:
> > > > > nikic wrote:
> > > > > > Disclaimer: First time I'm looking at SCCP in LLVM.
> > > > > > 
> > > > > > I feel like we really should be handling this already, without support for multiple widenings (and also, we should handle this for cases with more phi arguments). What happens in `@struct2` is that we first visit the entry block, then the `false` block, then the `exit` block (computing everything there), then the `true` block, then `true -> exit` becomes feasible and we update the phi constant ranges, and then merge in new values for all instructions in `exit`, going to overdefined (without widening).
> > > > > > 
> > > > > > If we had instead visited `entry`, `false`, `true`, `exit`, then not only would we have saved on duplicate computations, we would have also avoided the need to perform widening.
> > > > > > 
> > > > > > I feel like the right way to approach this is to fix the block visitation order, and then more widening is something we can do on top of that (which should then only matter for loops).
> > > > > Just as a quick experiment, visiting BBs in BFS instead of DFS: https://gist.github.com/nikic/85824a0d14225cd1af5a8b0e25982c0e
> > > > > 
> > > > > Some wins, some losses. What we really want is to visit the BBs in RPO order.
> > > > > I feel like the right way to approach this is to fix the block visitation order, and then more widening is something we can do on top of that (which should then only matter for loops).
> > > > 
> > > > In some cases it boils down to iteration order, yes. What makes things a bit more tricky is that the order of blocks also depends on the current value of the branch conditions (i.e. if we can prove a condition is true/false we don't need to visit one of the predecessors), which in turn depends on the iteration order. While only constants were supported, it did not matter too much, but becomes more relevant with constant ranges.
> > > > 
> > > > In the example above, we have to execute both true and false straight away, as the condition is overdefined. In other scenarios a different iteration order won't help unfortunately.
> > > > 
> > > > Granted, RPO would probably be better in general, but more expensive and we probably would have to keep the worklist in RPO order as well. It might be worth a try though (and not matter too much in terms of compile-time). I'll put something together, but the widening extension and iteration order address different issues (although there is some overlap)
> > > I played around with this a bit more. I think making use of RPO proper may be non-trivial for the interprocedural case. I came up with this hack that prefers visiting blocks where all predecessors are known first: https://gist.github.com/nikic/ab0222b3b0b37b73f96d9a1d47bdd100 I think excluding degenerate cases with a priori unreachable blocks (not blocks found unreachable by SCCP), this should give the same result as visiting in RPO.
> > > 
> > > > I'll put something together, but the widening extension and iteration order address different issues (although there is some overlap)
> > > 
> > > I do understand that these address different issues, but I suspect that most of the benefit you're seeing on test-suite is not due to the benefits of widening itself, which should really only manifest in loops, but rather due to avoiding the iteration order issue for the common case of two argument phis.
> > > 
> > > Ideally SCCP as an algorithm should not depend on iteration order (outside of performance concerns), but I can see how this is hard to realize once ranges are involved.
> > > I came up with this hack that prefers visiting blocks where all predecessors are known first: https://gist.github.com/nikic/ab0222b3b0b37b73f96d9a1d47bdd100 I think excluding degenerate cases with a priori unreachable blocks (not blocks found unreachable by SCCP), this should give the same result as visiting in RPO.
> > 
> > Interesting, thanks for sharing! I gave it a try and it seems to give some nice improvements in some cases, but there are a few notable regressions below (I'm running -O3 LTO for MultiSource & SPEC)
> > 
> > It's definitely an interesting direction to explore, but I think it we would have to take a look at the regressions to understand what's going wrong there.
> > 
> > IPNumInstRemoved regressions
> >  test-suite...peg2/mpeg2dec/mpeg2decode.test    70.00   31.00   -55.7%
> >  test-suite...CFP2000/177.mesa/177.mesa.test   202.00  183.00   -9.4%
> > 
> > NumInstRemoved
> >  test-suite...CFP2000/177.mesa/177.mesa.test   1862.00  839.00   -54.9%
> >  test-suite...marks/7zip/7zip-benchmark.test   1445.00  1412.00  -2.3%
> >  test-suite.../CINT2000/176.gcc/176.gcc.test   3092.00  3060.00  -1.0%
> > 
> > 
> > > I do understand that these address different issues, but I suspect that most of the benefit you're seeing on test-suite is not due to the benefits of widening itself, which should really only manifest in loops, but rather due to avoiding the iteration order issue for the common case of two argument phis.
> > 
> > I collected numbers with your diff as baseline, followed by D78376, followed by this patch.
> > 
> > The changes between D78376 and this patch are as follows and look like they are in the same ball-park, presumably covering mostly different/additional cases. (for the mpeg2decode.test case it mostly restores the regressions with the baseline):
> > 
> > IPNumInstRemoved                                               D78376.  D78391
> >  test-suite...peg2/mpeg2dec/mpeg2decode.test    31.00   74.00  138.7%
> >  test-suite...CFP2000/177.mesa/177.mesa.test   185.00  249.00  34.6%
> >  test-suite...rks/FreeBench/mason/mason.test     6.00    8.00  33.3%
> >  test-suite...ks/Prolangs-C/agrep/agrep.test    75.00   82.00   9.3%
> >  test-suite...oxyApps-C++/miniFE/miniFE.test    43.00   47.00   9.3%
> >  test-suite...langs-C/football/football.test   116.00  125.00   7.8%
> >  test-suite...ications/JM/ldecod/ldecod.test   181.00  195.00   7.7%
> >  test-suite...pplications/treecc/treecc.test    80.00   86.00   7.5%
> >  test-suite...lications/SIBsim4/SIBsim4.test    18.00   19.00   5.6%
> >  test-suite...tions/lambda-0.1.3/lambda.test    23.00   24.00   4.3%
> >  test-suite.../CINT2000/175.vpr/175.vpr.test   100.00  104.00   4.0%
> >  test-suite...urce/Applications/lua/lua.test   301.00  313.00   4.0%
> >  test-suite.../Applications/lemon/lemon.test    30.00   31.00   3.3%
> >  test-suite...eeBench/analyzer/analyzer.test   103.00  105.00   1.9%
> >  test-suite...lications/ClamAV/clamscan.test   1016.00 1031.00  1.5%
> > 
> > The changes between your diff and D78376 are
> > 
> > IPNumInstRemoved                                              diff   D78376
> >  test-suite...ks/Prolangs-C/agrep/agrep.test    58.00    75.00  29.3%
> >  test-suite...cations/hexxagon/hexxagon.test    36.00    42.00  16.7%
> >  test-suite...rks/FreeBench/pifft/pifft.test    30.00    32.00   6.7%
> >  test-suite...CFP2000/188.ammp/188.ammp.test    72.00    76.00   5.6%
> >  test-suite.../CINT2000/254.gap/254.gap.test   171.00   178.00   4.1%
> >  test-suite...000/197.parser/197.parser.test    64.00    66.00   3.1%
> >  test-suite.../Benchmarks/Bullet/bullet.test   339.00   345.00   1.8%
> >  test-suite...TimberWolfMC/timberwolfmc.test    75.00    76.00   1.3%
> >  test-suite...pplications/treecc/treecc.test    79.00    80.00   1.3%
> >  test-suite.../CINT2000/181.mcf/181.mcf.test    84.00    85.00   1.2%
> >  test-suite...CFP2000/177.mesa/177.mesa.test   183.00   185.00   1.1%
> >  test-suite.../Applications/sgefa/sgefa.test   115.00   116.00   0.9%
> >  test-suite...lications/ClamAV/clamscan.test   1008.00  1016.00  0.8%
> >  test-suite...6/482.sphinx3/482.sphinx3.test   141.00   142.00   0.7%
> >  test-suite.../CINT2006/403.gcc/403.gcc.test   3994.00  4007.00  0.3%
> > 
> > 
> > Overall it looks like there are plenty of further improvements down the road :) And I think changes to the iteration order are mostly complementary to this patch. I definitely need to update the tests for this patch though, with more complex cases that are not also solved by changes to the iteration order.
> > It's definitely an interesting direction to explore, but I think it we would have to take a look at the regressions to understand what's going wrong there.
> 
> Yeah, I was definitely too optimistic that the order would match RPO... Now that I think about this again, it probably falls apart with loops.
> 
> > Overall it looks like there are plenty of further improvements down the road :) And I think changes to the iteration order are mostly complementary to this patch. I definitely need to update the tests for this patch though, with more complex cases that are not also solved by changes to the iteration order.
> 
> Thanks for running the numbers! I agree that these issues are complimentary, and allowing the additional range extension makes sense.
> 
> I'm also wondering whether it would make sense to invert the current relationship, where phis can be arbitrarily widened, while for everything else widening is limited. Based on a quick prototype (https://gist.github.com/nikic/885119c67456a0d607db730af744e25b) this is sometimes better, sometimes worse (for MaxWidenings=0). I'm not really clear on what the tradeoff here is, regarding the position where we allow widening and where we don't.
> I'm also wondering whether it would make sense to invert the current relationship, where phis can be arbitrarily widened, while for everything else widening is limited. Based on a quick prototype (https://gist.github.com/nikic/885119c67456a0d607db730af744e25b) this is sometimes better, sometimes worse (for MaxWidenings=0). I'm not really clear on what the tradeoff here is, regarding the position where we allow widening and where we don't.

Initially it did not matter too much (as nothing besides PHIS and calls had range support) and it probably also causes fewer state-changes as is. Also I did not want to complicate SCCP by adding additional state to track the number of extensions for certain values.

I had a look and it seems quite straight forward to do now that ValueLatticeElement tracks the number of extensions though. The question is what sensible limits are for the number of extensions of PHIs (and the other cases which can cause like stores and call edges). For Phis it seems natural to tie the number of extensions to the number of active incoming values (this patch also allows one more, but I have to run the numbers to see if the additional step matters). For calls/stores, it might make sense to tie it to the number of users of the global, but initially I picked an arbitrary value (10). Seems to work quite well. I've put up a patch D79036. It's not quite ready for review yet, I still want to experiment a bit more with different thresholds.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78391/new/

https://reviews.llvm.org/D78391