[PATCH] D126236: [CaptureTracking] Increase limit but use it for all visited uses.

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 2 05:45:45 PDT 2022


fhahn added a comment.

In D126236#3546453 <https://reviews.llvm.org/D126236#3546453>, @nikic wrote:

>> Good point. Below I added tables with NumCaptured, NumNotcaptured, the sum and NotCaptured/Sum. I've only included SPEC2006/SPEC2017, as those have the largest number of capture queries.
>
> Thanks! I've put the data in a spreadsheet for easy comparison: https://docs.google.com/spreadsheets/d/13cwt-aWjAb_a1Ri9vqS4RgCslfDkEwSqYoYpEtuoeeI/edit?usp=sharing. For Limit=60 we're doing slightly better than baseline on average (1.1x), with two outlier regressions. For Limit=80 we avoid one outlier and slightly improve the average (1.2x). With Limit=100 we avoid both outliers and again slightly improve the average (1.3x).

I suppose the question is how far we want to go to avoid any outlier regressions :) I think any of 60, 80 or 100 would be a clear improvement for opaque pointers.
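For context, the limit under discussion caps how many uses the capture tracker visits before it gives up and conservatively reports the pointer as captured; a larger limit means more precise answers at more compile-time cost. A minimal, self-contained sketch of that scheme, using hypothetical types rather than LLVM's actual `PointerMayBeCaptured` implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical IR value: its users, plus a flag marking a capturing use
// (e.g. the pointer being stored to memory).
struct Value {
  std::vector<Value *> Users;
  bool Captures = false;
};

// Walk all transitive uses of V, giving up conservatively ("may be
// captured") once more than Limit uses have been visited. This mirrors
// the idea of applying one budget to all visited uses, not per-level.
bool mayBeCaptured(Value *V, size_t Limit) {
  std::vector<Value *> Worklist(V->Users.begin(), V->Users.end());
  std::unordered_set<Value *> Visited;
  size_t Count = 0;
  while (!Worklist.empty()) {
    Value *U = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(U).second)
      continue; // already visited this use
    if (++Count > Limit)
      return true; // budget exhausted: conservatively assume captured
    if (U->Captures)
      return true; // found a genuinely capturing use
    Worklist.insert(Worklist.end(), U->Users.begin(), U->Users.end());
  }
  return false; // exhausted all uses without finding a capture
}
```

With this shape, raising the limit only spends extra time on values whose use lists are actually large, which is why the precision/compile-time trade-off shows up mostly on outliers like `tramp3d-v4`.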

When comparing the numbers on the compile-time-tracker for 60 and 100, the majority of the difference is down to `tramp3d-v4`: https://llvm-compile-time-tracker.com/compare.php?from=520563fdc146319aae90d06f88d87f2e9e1247b7&to=818719fad01d472412c963629671a81a8703b25b&stat=instructions

But I don't think this is mostly caused by time spent in capture tracking itself, but rather by additional optimizations kicking in (the binary size difference is about 1.2% for LTO). Looking at the stats, there's an increase in the number of dead stores removed and also in the number of instructions removed by GVN.

Given that, I'd be slightly inclined to go with the largest of the proposed limits.

>> It might be worth exploring different limits for different clients, but another major client is GVN (via MemDepAnalysis), and exploring a sufficient number of uses there can be quite important (the regression I was investigating was due to a missed load elimination in GVN).
>
> Unfortunately MemDepAnalysis is exactly the area where cost increases are most problematic :( In part because cross-BB MDA uses completely ridiculous cutoffs -- we should probably go ahead and reduce those, and not wait for the magic bullet of NewGVN or MSSA. But that's neither here nor there.

Yeah, that's unfortunate. But I just noticed that MemDepAnalysis appears to use `BatchAA` for most queries already, so it should also cache some of the capture results, right?
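To illustrate the caching point: a batched query interface can memoize per-pointer capture results, so repeated queries within one batch pay the use-walk only once. A hypothetical sketch of that idea, not `BatchAAResults`' real interface:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical IR value: its users, plus a flag marking a capturing use.
struct Value {
  std::vector<Value *> Users;
  bool Captures = false;
};

// Uncached walk over all transitive uses of V (no visit limit here,
// for brevity).
static bool walkUses(Value *V) {
  std::vector<Value *> Worklist(V->Users.begin(), V->Users.end());
  std::unordered_set<Value *> Visited;
  while (!Worklist.empty()) {
    Value *U = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(U).second)
      continue;
    if (U->Captures)
      return true;
    Worklist.insert(Worklist.end(), U->Users.begin(), U->Users.end());
  }
  return false;
}

// Batched front end: results are memoized for the lifetime of the
// batch, so asking about the same pointer twice walks its uses once.
class BatchCaptureInfo {
  std::unordered_map<Value *, bool> Cache;

public:
  size_t Walks = 0; // number of actual use-walks performed

  bool mayBeCaptured(Value *V) {
    auto It = Cache.find(V);
    if (It != Cache.end())
      return It->second; // cache hit: no re-walk
    ++Walks;
    bool Result = walkUses(V);
    Cache.emplace(V, Result);
    return Result;
  }
};
```

If MemDepAnalysis funnels its queries through a batch like this, a higher visit limit would be paid once per pointer per batch rather than once per query, which softens the compile-time impact of raising it.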


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126236/new/

https://reviews.llvm.org/D126236


