[llvm] [IR] Use block numbers in PredIteratorCache (PR #101885)
Alexis Engelke via llvm-commits
llvm-commits at lists.llvm.org
Sun Aug 4 07:42:23 PDT 2024
aengelke wrote:
> where only a small fraction of blocks will be queried.
This boils down to the question: when is a vector faster than a DenseMap? That depends on:
- number of inserted elements (I)
- total element space (vector only) (N)
- element size (16 bytes in this case)
I made a small benchmark ([dmvsvec.txt](https://github.com/user-attachments/files/16487941/dmvsvec.txt) -- `clang++ -O3 -DNDEBUG -fno-exceptions -fno-rtti`) that inserts I elements (with random keys in 0..N-1) into a map/vector (the vector preallocated to size N) and then probes all I elements; a sketch of this insert+probe pattern is shown after the results below. It is not a rigorous scientific benchmark -- single machine only, and only meant to give an intuition of where the boundaries are. Key results:
- For I=1, the vector is better than the map for N<70 (2%, so the map is better if less than 2% of the elements are inserted+probed)
- For I=5, the break-even point is at N~80 (6%)
- For I=10, the break-even point is at N~110 (9%)
- For I=20, the break-even point is at N~160 (12%)
- For I=40, the break-even point is at N~270 (15%)
- For I=150, the break-even point is at N~2000 (8%) -- with many insertions, once the DenseMap has to grow to a larger size, the vector becomes more beneficial again.
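For clarity, here is a rough sketch of the insert+probe pattern the benchmark measures. This is not the attached dmvsvec.txt; the payload type, the fixed seed, and the single-shot timing are assumptions for illustration (a real measurement would repeat and sweep I and N):

```cpp
#include "llvm/ADT/DenseMap.h"
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// ~16-byte payload, roughly matching the element size discussed above
// (assumption for illustration).
struct Payload { void *Data[2]; };

int main() {
  constexpr unsigned N = 100; // total element space (e.g. blocks in a function)
  constexpr unsigned I = 5;   // elements actually inserted and probed

  std::mt19937 RNG(42);
  std::uniform_int_distribution<unsigned> Dist(0, N - 1);
  std::vector<unsigned> Keys(I);
  for (unsigned &K : Keys)
    K = Dist(RNG);

  std::uintptr_t Sink = 0; // keeps the probes from being optimized away

  auto T0 = std::chrono::steady_clock::now();
  {
    llvm::DenseMap<unsigned, Payload> Map; // grows/rehashes as needed
    for (unsigned K : Keys)
      Map[K] = Payload();
    for (unsigned K : Keys)
      Sink += (std::uintptr_t)Map.find(K)->second.Data[0];
  }
  auto T1 = std::chrono::steady_clock::now();
  {
    std::vector<Payload> Vec(N); // preallocated to the full element space
    for (unsigned K : Keys)
      Vec[K] = Payload();
    for (unsigned K : Keys)
      Sink += (std::uintptr_t)Vec[K].Data[0];
  }
  auto T2 = std::chrono::steady_clock::now();

  using NS = std::chrono::nanoseconds;
  std::printf("map: %lld ns, vector: %lld ns (sink=%zu)\n",
              (long long)std::chrono::duration_cast<NS>(T1 - T0).count(),
              (long long)std::chrono::duration_cast<NS>(T2 - T1).count(),
              (size_t)Sink);
  return 0;
}
```

The break-even numbers above come from sweeping I and N; the sketch only shows the access pattern being compared.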
For PredIteratorCache, there are likely multiple probes per inserted block, so the vector is preferable in even more cases (e.g., with I=5 and 5 probes instead of one, the break-even point moves to N~100).
I have no statistics about the PredIteratorCache users, but I think it is a somewhat reasonable assumption that they probe >5% of the basic blocks of a function?
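For reference, a minimal sketch of the vector-indexed-by-block-number lookup this trade-off is about. The class name is hypothetical and the details are simplified (the real PredIteratorCache allocates and stores the predecessor arrays differently); it only assumes the block-numbering API (`BasicBlock::getNumber()`, `Function::getMaxBlockNumber()`) from the earlier patches in this series:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Function.h"
#include <vector>

// Hypothetical, simplified stand-in for PredIteratorCache.
class PredCacheSketch {
  // One slot per block, indexed by the block's per-function number.
  std::vector<llvm::SmallVector<llvm::BasicBlock *, 4>> Preds;
  std::vector<bool> Cached;

public:
  explicit PredCacheSketch(const llvm::Function &F)
      : Preds(F.getMaxBlockNumber()), Cached(F.getMaxBlockNumber()) {}

  llvm::ArrayRef<llvm::BasicBlock *> get(llvm::BasicBlock *BB) {
    unsigned Idx = BB->getNumber(); // dense index instead of a pointer key
    if (!Cached[Idx]) {
      Preds[Idx].assign(llvm::pred_begin(BB), llvm::pred_end(BB));
      Cached[Idx] = true;
    }
    return Preds[Idx];
  }
};
```

A DenseMap-based variant would instead key on the `BasicBlock *` itself, paying for hashing on every probe and for rehashing when the map grows.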
https://github.com/llvm/llvm-project/pull/101885