[PATCH] D126236: [CaptureTracking] Increase limit but use it for all visited uses.

Tue May 24 06:13:45 PDT 2022

nikic added a comment.

In D126236#3533958 <https://reviews.llvm.org/D126236#3533958>, @fhahn wrote:

>> is there a number you're seeing which roughly matches performance you were getting with typed pointers?
>
> I'm not sure if that data point is too interesting in isolation. It might be more interesting to look at the impact on the `capture-tracking.NumNotCapturedBefore` statistic. Below are numbers for MultiSource/SPEC2006/SPEC2017 on X86 with -O3. As there are loads of changes, I removed programs where the change is small (+-1-2%) and where the absolute value is low (< ~40) to keep things a bit more compact.
>
> The first table shows current main with opaque pointers disabled vs enabled.  Note that there are quite a few notable regressions. With a limit of 60, we still see regressions, going to 100 reduces that further. I also tried a limit of 200 and that only slightly reduces the number of regressions further.

Looking at just the NumNotCapturedBefore statistic is only meaningful if NumCapturedBefore+NumNotCapturedBefore is approximately the same in both configurations. Is that the case? It would probably be more meaningful to look for changes in the ratio NumNotCapturedBefore/(NumCapturedBefore+NumNotCapturedBefore), i.e. the percentage of queries that succeed.

I do agree with the general change here -- in fact, some time ago I was looking into the same change for the reverse reason: The current implementation of the limit sometimes results in much more uses being visited than is reasonable. I ultimately ended up not pursing it because is has significant impact on optimization behavior and I didn't have time to analyze it in detail.

The actual value of the limit matters though -- the current capture tracking implementation is intended to be cheap enough that uncached usage is viable, and we can clearly see that compile-time is quite sensitive to changes to this limit. We shouldn't overshoot in the other direction here. If we want to raise this too high, we'll have to start thinking about using different limits for different callers. E.g. visiting many users should be fine in DSE because results are cached for the whole duration of the pass, while visiting many uses in (non-batch) AA is problematic.

PS: I think some of the data in your table is incorrect, e.g. it has the line ` MultiSourc...nchmarks/tramp3d-v4/tramp3d-v4 4	975	5190	4.3%`, where the percentage is much smaller than the change.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126236/new/

https://reviews.llvm.org/D126236