[clang] [analyzer] Retry UNDEF Z3 queries at most "crosscheck-with-z3-retries-on-timeout" times (PR #120239)

Mon Jan 6 00:31:40 PST 2025

steakhal wrote:

> I'm a bit surprised by the idea of using multiple attempts instead of a single run with a larger timeout -- intuitively we're wasting the already performed calculations if we are impatient and abort+restart the calculations after each short timeout (instead of allocating all the time for a single run).
> 
> I can imagine a world where the runtime of the probability distribution of Z3 queries has a very "large tail", and in that case the restarts would be useful. E.g. as a toy example consider a probability distribution where rerunning one particular query takes <200 ms with 90% chance and 2000 ms in the remaining 10% of runs -- in this world doing three runs with 300 ms timeout is much better better than doing one run with 900 ms timeout.
> 
> (For more context, see e.g. [wikipedia on the memorylessness property](https://en.wikipedia.org/wiki/Memorylessness) of distributions. In a memoryless distribution the expected remaining time is independent of the time that's already spent on waiting; if the distribution is even more skewed than that, then there are points when starting a new calculation is better than continuing the already running one.)
> 
> However, I don't see any reason to assume that the distribution has a "long tail" -- in fact eyeballing the graphs from your earlier measurements I'd guess that the outlier slow queries are relatively rare.

I adjusted the first commit message to highlight my answer to these questions. I hope that's clear enough.

> To check this question, it would be nice to have measurements that compare the total analysis time and flakiness of e.g. t i m e o u t = X , r e p e a t s = 2 and t i m e o u t = 3 X , r e p e a t s = 0 for various values of X . In addition to the timeout X = 300 ms that you suggest as default (because it fits the capabilities of your system) it would be important to have a few other measurements that e.g. use X = 150 ms or X = 200 ms etc. to approximate the situation when somebody uses a timeout of 300 ms on a system that is weaker than yours.
> 
> To simplify the comparison, I'd suggest doing these comparisons without specifying an aggregate timeout -- but if you can suggest a reasonable value for it, then I'm also fine with that.

I think we discussed this under the RFC, but I want to make sure you confirm this before I'd discard this point.

@NagyDonat I think I'm ready for the next round of review.

https://github.com/llvm/llvm-project/pull/120239