[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)

Wed Aug 28 00:59:42 PDT 2024

ChuanqiXu9 wrote:

I think now I understand the problem. The root cause happens in https://github.com/llvm/llvm-project/blob/175aa864f33786f3a6a4ee7381cbcafd0758501a/clang/lib/Serialization/MultiOnDiskHashTable.h#L329

The description in () is optional. You can skip it if you're not interested it or in the first iteration.

what the code does is: when we write a on-disk hash table, try to write the imported merged hash table in the same process so that we don't need to read these tables again. However, in line 329 the function will try to omit the data from imported table with the same key which already emitted by the current module file. This is the root cause of the problem.

(The wrotten merged hash table are called overiden files, and they will be removed in https://github.com/llvm/llvm-project/blob/175aa864f33786f3a6a4ee7381cbcafd0758501a/clang/lib/Serialization/MultiOnDiskHashTable.h#L133-L137)

(when will the table will be merged? when the number of on disk hash table for the same item is large than some threshold (by default 4), we will merge them into an in memory table to try to speedup the querying. So this is majorly an optimization.)

It is bad to skip data with the same key. Since it violates the big assumption that we discussed for a long time:
- It is bad to have different key values for the logical same specializations.
- But it is actually good to have the same key values for the different specializations. And the code should work well if we counts the hash value for all template arguments as 0x12345678.

And the implicitly optimization to skip data with the same key, violates the second assumption above. So this is the root cause of the problem.

(Why my previous try works? Since it will remove the imported table if it loads all the items from it, so it avoids the "optimization" surprisingly.)

Then it looks pretty simple to overcome the issue, just skip the optimization like I did in the most new commit.

@ilya-biryukov @alexfh I think we can start another round of test. Thanks in ahead.

https://github.com/llvm/llvm-project/pull/83237