[PATCH] D151165: [ThinLTO] Make the cache key independent of the module identifier paths

Teresa Johnson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 27 06:52:40 PDT 2023


tejohnson added a comment.

In D151165#4538190 <https://reviews.llvm.org/D151165#4538190>, @nikic wrote:

> @tejohnson Can you please confirm the correctness of this change?
>
> This change has caused major compile-time regressions in Rust (compiled modules no longer get reused in incremental builds), because modules get added to the index in different orders across compilations. On the surface, it seems like we could easily work around this by adding a sort over the module key. However, the modules involved in the compilation can actually change, e.g. if references to symbols in a module are added/removed so that the module does not need to be linked at all (or starts needing to be linked). Even with a sort, this is going to shift the module indices, and result in at least unnecessary cache invalidation (I'm not sure whether it can also result in incorrect cache reuse).
>
> The basic premise of this patch (that we can use the module order instead of the module key) seems to rest on a specific compilation model that is not valid for all ThinLTO consumers.

Thanks for the report. Yes, I can see how using the module id here can result in spurious differences. I think using the module hash should address the issue.

In D151165#4538667 <https://reviews.llvm.org/D151165#4538667>, @akyrtzi wrote:

> What about changing the code to sort using the module hash:
>
>      llvm::sort(ImportModulesVector,
>                 [](const ImportModule &Lhs, const ImportModule &Rhs) -> bool {
>   -               return Lhs.getId() < Rhs.getId();
>   +               return Lhs.getHash() < Rhs.getHash();
>                 });
>
> would that resolve your issue?

This is better, @nikic can you confirm?
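
To make the difference concrete, here is a minimal standalone sketch (not the actual LLVM code) of why ordering by hash is insensitive to the order in which modules were added to the index, while ordering by id is not. The `ImportModule` struct, the `ModuleHash` alias, and the `orderById`/`orderByHash` helpers are hypothetical stand-ins written for illustration; the only assumption carried over from the snippet above is that the real type exposes `getId()` and `getHash()`.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a content hash of a module.
using ModuleHash = std::array<std::uint32_t, 5>;

// Hypothetical stand-in for the ImportModule type in the snippet above.
struct ImportModule {
  unsigned Id;      // assigned in insertion order, so it varies between runs
  ModuleHash Hash;  // derived from module contents, so it is stable

  unsigned getId() const { return Id; }
  const ModuleHash &getHash() const { return Hash; }
};

// Sequence of hashes as it would be fed into a cache key when sorting by id.
std::vector<ModuleHash> orderById(std::vector<ImportModule> Mods) {
  std::sort(Mods.begin(), Mods.end(),
            [](const ImportModule &Lhs, const ImportModule &Rhs) {
              return Lhs.getId() < Rhs.getId();
            });
  std::vector<ModuleHash> Out;
  for (const ImportModule &M : Mods)
    Out.push_back(M.getHash());
  return Out;
}

// Same, but sorting by hash: the result depends only on module contents.
std::vector<ModuleHash> orderByHash(std::vector<ImportModule> Mods) {
  std::sort(Mods.begin(), Mods.end(),
            [](const ImportModule &Lhs, const ImportModule &Rhs) {
              return Lhs.getHash() < Rhs.getHash();
            });
  std::vector<ModuleHash> Out;
  for (const ImportModule &M : Mods)
    Out.push_back(M.getHash());
  return Out;
}
```

If two compilations add the same three modules in different orders, `orderByHash` returns the same sequence for both, whereas `orderById` does not. Dropping a module that is no longer linked also leaves the hashes of the remaining modules untouched, while their ids may shift.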

The module ID is an older concept that predates the module hash. We should probably remove that from the in-memory index completely, since it just takes up space and can lead to confusion about what values are stable and should be used. The numeric id is utilized in the Bitcode format for compactness, but I don't think we need it in memory anymore (quick scan of the codebase suggests not). Let me see if I can remove that once this issue is fixed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151165/new/

https://reviews.llvm.org/D151165


