[llvm] [ThinLTO] Use StringRef instead of std::string (PR #97963)

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Sun Jul 14 21:52:24 PDT 2024


teresajohnson wrote:

> Our downstream project is using this name StringRef in ThinLTO backend (in runThinLTOBackendThread). Because we want to create "@xxx = external global ptr" global variable according to this name StringRef.

Can you tell me more about when and why you are doing this? I might be able to suggest a way that wouldn't rely on the name being available in the summary at that point.

> So I have a question if ThinLTO is started from lld (not llvm-lto), is the name StringRef still valid after the thin link?
> 
> IMO, this name StringRef is still valid after thin link because in https://github.com/llvm/llvm-project/blob/main/lld/ELF/Driver.cpp#L2493-L2514 in position of lto->compile(), the bitcode files have not be closed (or munmap if using mmap), which means the module data is still in memory. What do you think? Please correct me if I'm missing something.

It looks like lld does keep these files open and in memory for in-process ThinLTO, although afaik there is no requirement or guarantee from the ThinLTO side that this would be the case. In fact, for distributed ThinLTO lld does release the memory before writing out the combined index, and we don't have the name in the serialized combined index. If you won't ever use distributed ThinLTO (supported by just a couple of build systems), or plan to upstream the optimization, it might work out ok (assuming lld doesn't change this handling). But like I mentioned above, if you tell me what you are doing in this transformation I might be able to suggest another way.
> 
> > https://github.com/llvm/llvm-project/blob/d358151a4fe94c72186d58f8903f044a116d0fa8/llvm/include/llvm/IR/ModuleSummaryIndex.h#L152-L157
> > 
> > It was originally only used for debugging messages, but now is used (during the thin link) for recording devirtualized target names for WPD too.
> > However, for cases like with llvm-lto where we produce a combined index emitted to a file for testing and inspection, the names will simply not be available when inspecting or otherwise using that index (they aren't written by the combined index bitcode writer). E.g. if disassembling the dumped combined index for inspection you just won't be able to see any names, only the guids. So, not incorrect, just less verbose.
> 
> > My concern for TypeIdCompatibleVtableMap is whether that data structure will be used in any way after the thin link, but it doesn't appear so. It also turns out it is not even written by the combined index bitcode writer (and in fact we [document that](https://github.com/llvm/llvm-project/blob/baf22a527ca6571cdf5cd57c524a1fc261207947/llvm/include/llvm/Bitcode/LLVMBitCodes.h#L291-L292)).
> > > So I think the interface llvm::readModuleSummaryIndex is not correct.
> > 
> > 
> > It is correct per the current expectations, which this patch changes.
> > There are ways to deal with this through some of your suggestions below, but is it worth the brittleness? How much memory do we save with this change?
> 
> Although I have not measured the memory consumption, I don't think there will be a remarkable memory reduction after this patch. The reason why I submitted this patch is our downstream project needs this patch, otherwise, I have to refactor a lot in our downstream project. And IMO, it is good to replace std::string with StringRef so I submitted it here. Anyway, I have applied this patch to our downstream project. If it is not worth merging this patch into upstream, I can close this PR.
> 
> > > Maybe we can try one of these solutions:
> > > 
> > > 1. Add an argument that indicates if the BitcodeModule is retained after being read. If not, then always call Index->saveString.
> > > 2. Add some words in the documentation that the BitcodeModule should not be released after being read.



https://github.com/llvm/llvm-project/pull/97963


More information about the llvm-commits mailing list