[PATCH] D114095: [clang][lex] Include tracking: simplify and move to preprocessor

Mon Dec 13 03:35:16 PST 2021

jansvoboda11 added a comment.

You're right, I measured only this patch, not per-submodule include tracking (D112915 <https://reviews.llvm.org/D112915>).

With per-submodule tracking, the results look like this:

  +----------+------------------+------------------+-------------------+------------------+
  | original |     ID vector    |    bit vector    | subm. w incl. [%] | 1 in bitvec. [%] |
  +----------+------------------+------------------+-------------------+------------------+
  |    23348 |    23380     +32 |    23380     +32 |       100.0       |       33.3       |
  |    52188 |    52224     +36 |    52224     +36 |       100.0       |        6.3       |
  |    74808 |    74856     +48 |    74840     +32 |       100.0       |        9.4       |
  |   171772 |   171836     +64 |   171840     +68 |        40.0       |        5.7       |
  |   206524 |   206584     +60 |   206540     +16 |       100.0       |       32.5       |
  |   227716 |   227904    +188 |   227856    +140 |         6.3       |       33.3       |
  |   253656 |   253812    +156 |   253788    +132 |        90.0       |        7.6       |
  |   271332 |   271584    +252 |   271524    +192 |        85.7       |        8.5       |
  |   280280 |   280460    +180 |   280428    +148 |        91.7       |        5.6       |
  |   340024 |   340176    +152 |   340144    +120 |        21.4       |       10.9       |
  |   394692 |   394928    +236 |   394872    +180 |        25.0       |       18.1       |
  |   629740 |   630028    +288 |   629940    +200 |        83.3       |       20.0       |
  |   707456 |   707732    +276 |   707676    +220 |        85.7       |       13.9       |
  |   785508 |   785632    +124 |   785616    +108 |        85.7       |       18.1       |
  |   835204 |   835824    +620 |   835616    +412 |        93.8       |       12.5       |
  |   887764 |   888004    +240 |   887940    +176 |         8.7       |       19.1       |
  |   971352 |   971504    +152 |   971500    +148 |        66.7       |        6.0       |
  |   994112 |   995048    +936 |   994672    +560 |        93.5       |        9.1       |
  |  1248888 |  1249408    +520 |  1249352    +464 |        29.6       |        5.0       |
  |  3642908 |  3650076   +7168 |  3652668   +9760 |        74.5       |        2.8       |
  |  4112848 |  4114016   +1168 |  4113780    +932 |        17.7       |        5.3       |
  |  5213344 |  5216228   +2884 |  5216552   +3208 |        22.0       |        2.2       |
  +----------+------------------+------------------+-------------------+------------------+
  | 22325464 | 22341244  +15780 | 22342748  +17284 |                   |                  |
  +----------+------------------+------------------+-------------------+------------------+

The `subm. w incl. [%]` column shows the percentage of submodules that include any headers. For the ones that do, `1 in bitvec. [%]` shows how sparse the bitvectors are on average (what percentage of their bits are `1`).

It seems that smaller modules are generally better off with bitvectors, but for larger modules with a greater cumulative number of includes, the bitvectors get long and sparse. And it's the larger modules whose size ends up determining the overall size of the module cache. I think that matches my intuition and roughly corresponds to your own measurements.
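To make the trade-off concrete, here is a rough cost model. It is only a sketch, not clang's actual serialization format: it assumes 32-bit header IDs and bitvectors padded to whole 32-bit words, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical cost model (NOT the real PCM encoding).
// ID vector: one 32-bit ID per header the submodule actually includes.
constexpr std::size_t idVectorBytes(std::size_t numIncluded) {
  return numIncluded * sizeof(std::uint32_t);
}

// Bit vector: one bit per header known to the module, whether or not it is
// included, rounded up to whole 32-bit words.
constexpr std::size_t bitVectorBytes(std::size_t numKnown) {
  return (numKnown + 31) / 32 * sizeof(std::uint32_t);
}

// True when the bit vector is the strictly smaller encoding.
constexpr bool bitVectorIsSmaller(std::size_t numIncluded,
                                  std::size_t numKnown) {
  return bitVectorBytes(numKnown) < idVectorBytes(numIncluded);
}
```

Under these assumptions the break-even point is roughly one set bit in 32 (about 3.1%). For example, `bitVectorIsSmaller(100, 1000)` is true (128 bytes vs. 400), while `bitVectorIsSmaller(10, 1000)` is false (128 bytes vs. 40). That is at least consistent with the table, where the two modules with the sparsest bitvectors (2.8% and 2.2%) are the ones where bitvectors lose by the widest margin.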

(Note that in any case, the module cache growth is negligible: `0.071%` for ID vectors and `0.077%` for bitvectors.)
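For reference, the growth percentages follow directly from the totals row of the table. A minimal sketch of the arithmetic (the helper name is made up for illustration):

```cpp
#include <cstdint>

// Relative growth in percent, given the original total size and the delta.
constexpr double growthPercent(std::uint64_t original, std::uint64_t delta) {
  return 100.0 * static_cast<double>(delta) / static_cast<double>(original);
}
```

With the totals above, `growthPercent(22325464, 15780)` gives the ID vector figure and `growthPercent(22325464, 17284)` gives the bitvector figure.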

Given that, I think we should commit this patch with ID vectors, even though in isolation (without D112915 <https://reviews.llvm.org/D112915>) it's the worse solution. WDYT?

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D114095