[llvm] [ThinLTO] Use StringRef instead of std::string (PR #97963)

Mon Jul 15 16:21:38 PDT 2024

luxufan wrote:

> > Our downstream project is using this name StringRef in ThinLTO backend (in runThinLTOBackendThread). Because we want to create "@xxx = external global ptr" global variable according to this name StringRef.
> 
> Can you tell me more about when and why you are doing this? I might be able to suggest a way that wouldn't rely on the name being available in the summary at that point.

TypeIdCompatibleVTableMap is a map from type id identifier to each address point. What we are doing currently is we want to check if the virtual table pointer is equal to one of its address points. This is similar to the CFI virtual table verification in LLVM. 
With split LTO, all virtual tables are grouped into a single Regular LTO module. So if I want to check the address point in the other modules, I must create an external reference @_ZTVxxx = external global ptr. Then use ICMP instructions to compare these two pointers, for example, **icmp eq ptr vptr, ptr getelementptr inbounds (i8, @_ZTVxxx, i64 16))**. For **@_ZTVxxx = external global ptr**, I need to use the name field in ModuleSummaryIndex to create the global variable.
> 
> > So I have a question if ThinLTO is started from lld (not llvm-lto), is the name StringRef still valid after the thin link?
> > IMO, this name StringRef is still valid after thin link because in https://github.com/llvm/llvm-project/blob/main/lld/ELF/Driver.cpp#L2493-L2514 in position of lto->compile(), the bitcode files have not be closed (or munmap if using mmap), which means the module data is still in memory. What do you think? Please correct me if I'm missing something.
> 
> It looks like lld does keep these files open and in memory for in-process ThinLTO, although afaik there is no requirement or guarantee from the ThinLTO side that this would be the case. In fact, for distributed ThinLTO lld does release the memory before writing out the combined index, and we don't have the name in the serialized combined index. If you won't ever use distributed ThinLTO (supported by just a couple of build systems), or plan to upstream the optimization, it might work out ok (assuming lld doesn't change this handling). But like I mentioned above, if you tell me what you are doing in this transformation I might be able to suggest another way.

Thanks for your explanation! It makes sense! 

And in the development of these days, I found that although the BitcodeModule is always kept, the Module is not kept. This means if I new a global variable in the Module, then storing the name of this global variable into ModuleSummaryIndex is not correct. As far as I know, the new global variable name is stored in the Module instead of BitcodeModule.

Regarding distributed ThinLTO, can we save the name in the ModuleSummaryIndex(always call Index.saveString interface before store the name)? 
> 
> > > https://github.com/llvm/llvm-project/blob/d358151a4fe94c72186d58f8903f044a116d0fa8/llvm/include/llvm/IR/ModuleSummaryIndex.h#L152-L157
> > > 
> > > It was originally only used for debugging messages, but now is used (during the thin link) for recording devirtualized target names for WPD too.
> > > However, for cases like with llvm-lto where we produce a combined index emitted to a file for testing and inspection, the names will simply not be available when inspecting or otherwise using that index (they aren't written by the combined index bitcode writer). E.g. if disassembling the dumped combined index for inspection you just won't be able to see any names, only the guids. So, not incorrect, just less verbose.
> > 
> > 
> > > My concern for TypeIdCompatibleVtableMap is whether that data structure will be used in any way after the thin link, but it doesn't appear so. It also turns out it is not even written by the combined index bitcode writer (and in fact we [document that](https://github.com/llvm/llvm-project/blob/baf22a527ca6571cdf5cd57c524a1fc261207947/llvm/include/llvm/Bitcode/LLVMBitCodes.h#L291-L292)).
> > > > So I think the interface llvm::readModuleSummaryIndex is not correct.
> > > 
> > > 
> > > It is correct per the current expectations, which this patch changes.
> > > There are ways to deal with this through some of your suggestions below, but is it worth the brittleness? How much memory do we save with this change?
> > 
> > 
> > Although I have not measured the memory consumption, I don't think there will be a remarkable memory reduction after this patch. The reason why I submitted this patch is our downstream project needs this patch, otherwise, I have to refactor a lot in our downstream project. And IMO, it is good to replace std::string with StringRef so I submitted it here. Anyway, I have applied this patch to our downstream project. If it is not worth merging this patch into upstream, I can close this PR.
> > > > Maybe we can try one of these solutions:
> > > > 
> > > > 1. Add an argument that indicates if the BitcodeModule is retained after being read. If not, then always call Index->saveString.
> > > > 2. Add some words in the documentation that the BitcodeModule should not be released after being read.

https://github.com/llvm/llvm-project/pull/97963