[llvm] [ThinLTO] Use StringRef instead of std::string (PR #97963)

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 9 08:25:53 PDT 2024


teresajohnson wrote:

> Thanks for your explanation and example!
> 
> Yes, there is no documentation says BitcodeModule must not be released after being read. However, releasing the BitcodeModule after being read is problematic. See
> 
> https://github.com/llvm/llvm-project/blob/d358151a4fe94c72186d58f8903f044a116d0fa8/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7336-L7353
> 
> 
> The name is a direct reference to the string table of the BitcodeModule. Then it is stored in the GlobalValueSummaryInfo. If the BitcodeModule is released after being read, then the name field in GlobalValueSummaryInfo may point to the memory that has been freed.

We've documented in the index that the name StringRef is only valid through the thin link, when summary is read from an bitcode object, due to this: 
https://github.com/llvm/llvm-project/blob/d358151a4fe94c72186d58f8903f044a116d0fa8/llvm/include/llvm/IR/ModuleSummaryIndex.h#L152-L157

It was originally only used for debugging messages, but now is used (during the thin link) for recording devirtualized target names for WPD too.

However, for cases like with llvm-lto where we produce a combined index emitted to a file for testing and inspection, the names will simply not be available when inspecting or otherwise using that index (they aren't written by the combined index bitcode writer). E.g. if disassembling the dumped combined index for inspection you just won't be able to see any names, only the guids. So, not incorrect, just less verbose.

My concern for TypeIdCompatibleVtableMap is whether that data structure will be used in any way after the thin link, but it doesn't appear so. It also turns out it is not even written by the combined index bitcode writer (and in fact we [document that](https://github.com/llvm/llvm-project/blob/baf22a527ca6571cdf5cd57c524a1fc261207947/llvm/include/llvm/Bitcode/LLVMBitCodes.h#L291-L292)).

> So I think the interface llvm::readModuleSummaryIndex is not correct.

It is correct per the current expectations, which this patch changes.

There are ways to deal with this through some of your suggestions below, but is it worth the brittleness? How much memory do we save with this change?

> 
> Maybe we can try one of these solutions:
> 
> 1. Add an argument that indicates if the BitcodeModule is retained after being read. If not, then always call Index->saveString.
> 2. Add some words in the documentation that the BitcodeModule should not be released after being read.


https://github.com/llvm/llvm-project/pull/97963


More information about the llvm-commits mailing list