[llvm] [ThinLTO] Use StringRef instead of std::string (PR #97963)

via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 10 04:05:21 PDT 2024


luxufan wrote:

> > Thanks for your explanation and example!
> > Yes, there is no documentation says BitcodeModule must not be released after being read. However, releasing the BitcodeModule after being read is problematic. See
> > https://github.com/llvm/llvm-project/blob/d358151a4fe94c72186d58f8903f044a116d0fa8/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L7336-L7353
> > 
> > The name is a direct reference to the string table of the BitcodeModule. Then it is stored in the GlobalValueSummaryInfo. If the BitcodeModule is released after being read, then the name field in GlobalValueSummaryInfo may point to the memory that has been freed.
> 
> We've documented in the index that the name StringRef is only valid through the thin link, when summary is read from an bitcode object, due to this:
> 
Our downstream project is using this name StringRef in ThinLTO backend (in runThinLTOBackendThread). Because we want to create "@xxx = external global ptr" global variable according to this name StringRef. So I have a question if ThinLTO is started from lld (not llvm-lto), is the name StringRef still valid after the thin link?

IMO, this name StringRef is still valid after thin link because in 
https://github.com/llvm/llvm-project/blob/main/lld/ELF/Driver.cpp#L2493-L2514
in position of lto->compile(), the bitcode files have not be closed (or munmap if using mmap), which means the module data is still in memory. 
What do you think? Please correct me if I'm missing something.
> https://github.com/llvm/llvm-project/blob/d358151a4fe94c72186d58f8903f044a116d0fa8/llvm/include/llvm/IR/ModuleSummaryIndex.h#L152-L157
> 
> It was originally only used for debugging messages, but now is used (during the thin link) for recording devirtualized target names for WPD too.
> 
> However, for cases like with llvm-lto where we produce a combined index emitted to a file for testing and inspection, the names will simply not be available when inspecting or otherwise using that index (they aren't written by the combined index bitcode writer). E.g. if disassembling the dumped combined index for inspection you just won't be able to see any names, only the guids. So, not incorrect, just less verbose.
> 
> My concern for TypeIdCompatibleVtableMap is whether that data structure will be used in any way after the thin link, but it doesn't appear so. It also turns out it is not even written by the combined index bitcode writer (and in fact we [document that](https://github.com/llvm/llvm-project/blob/baf22a527ca6571cdf5cd57c524a1fc261207947/llvm/include/llvm/Bitcode/LLVMBitCodes.h#L291-L292)).
> 
> > So I think the interface llvm::readModuleSummaryIndex is not correct.
> 
> It is correct per the current expectations, which this patch changes.
> 
> There are ways to deal with this through some of your suggestions below, but is it worth the brittleness? How much memory do we save with this change?
> 
> > Maybe we can try one of these solutions:
> > 
> > 1. Add an argument that indicates if the BitcodeModule is retained after being read. If not, then always call Index->saveString.
> > 2. Add some words in the documentation that the BitcodeModule should not be released after being read.



https://github.com/llvm/llvm-project/pull/97963


More information about the llvm-commits mailing list