[PATCH] D11721: [ThinLTO] Data structures for holding ThinLTO function index/summary

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 17 09:42:26 PDT 2015


tejohnson added a comment.

Will update the patch as noted in responses.

But we need to decide on the best way to represent the module path strings. See my responses below for why I have the current organization and let me know if you think otherwise. The two best choices I see are:

1. (current patch) Use ThinLTOModulePathStringTable to own the module path strings, which are referenced via StringRefs saved in the per-function info. The module ID saved in this table is just used for consistent renaming. This is also a little easier when renaming the module's own strings, since it doesn't require saving the ThinLTO module ID on the Module object, we can just lookup via the Module's path/name already saved there.
2. Change ThinLTOModulePathStringTable to be map from ID to string (needs to be a std::string since there won't be a structure owning the string memory anymore). Save module id in per-function info (instead of string ref), use it to lookup the module path from the ThinLTOModulePathStringTable when we want to do importing. Need to find the ThinLTO module ID associated with the current module's path when reading in the module string table and save it on Module object for later renaming. Save a module path on the Index in the per-module case and use it when building the combined index.


================
Comment at: include/llvm/IR/ThinLTOInfo.h:47
@@ +46,3 @@
+  // Path of module containing function IR
+  StringRef ModulePath;
+  // Index of function within bitcode module block in its module.
----------------
davidxl wrote:
> This field 'ModulePath' belongs to ThinLTOFunctionSummaryIndex class (when used before combining).
But it can't be on the ThinLTOFunctionSummaryIndex class in the combined index. I could have a ModulePath on the ThinLTOFunctionSummaryIndex class just for the case of the per-module index, but it seemed confusing to have it in two places (and which one is used depends on the context). Since StringRefs are small and fast to construct (just a pointer to a char * and a length), and we need one here anyway for the combined case, it seemed clearer to me to always keep it in the per-function info. For the per-module case this would be populated when we read in the per-module ThinLTO information from the IR.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:51
@@ +50,3 @@
+  // Function summary information used to help make importing decisions.
+  ThinLTOFunctionSummary *FunctionSummary;
+  // Used during bitcode parsing and writing to hold the offset of the
----------------
davidxl wrote:
> Is there a 1-1 mapping from ThinLTOFunctionInfo to FunctionSummary? If yes, can ThinLTOFunctionInfo inherite from FunctionSummary?
When creating the combined index it is not 1-1. Another reason why I have a pointer here instead of inheriting or combining the two is to be more efficient in the case where the function summary information is read lazily during importing. I.e. initially only the ThinLTO symtab and module paths are read, and each function's summary is only read when considering importing that function.

In fact, I should move the BitcodeIndex and ModulePath into the ThinLTOFunctionSummary object for the same reason - they are only populated when the function summary is read/parsed. Will make that change to the patch.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:116
@@ +115,3 @@
+typedef StringMap<ThinLTOFunctionInfoList> ThinLTOFunctionMap;
+typedef StringMap<uint64_t> ThinLTOModulePathStringTable;
+
----------------
davidxl wrote:
> Why making a reverse map from stringPath to module id instead of a map from module id to module path?
> 
> In the combined summary index, ThinLTOFunctionInfo object should contain a module id field. From that module id, the module path can be obtained.
The string table map is not queried directly, as described in the associated RFC it is just for owning the module strings, since the string is duplicated when inserting to the StringMap. We don't need to save the module ID and use it to look up in the string table since it is fast and fairly efficient just to save the StringRef directly.

I will update the comments in this file to reflect some of the info in the RFC and make it hopefully a lot clearer.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:124
@@ +123,3 @@
+
+ private:
+  // Map from function name to list of function information instances
----------------
davidxl wrote:
> I think it is better to have a ModulePath field in this class -- this field will only be used for per-module SummaryIndex.
In the combined index we need a module path in the function info, as noted earlier. I think it is confusing to have module paths in multiple places used in different contexts.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:163
@@ +162,3 @@
+  void addFunctionInfo(StringRef FuncName,
+                       ThinLTOFunctionInfo Info) {
+    if (FunctionMap.count(FuncName))
----------------
davidxl wrote:
> why passing by value?
Will change to pass by ref.

================
Comment at: lib/IR/ThinLTOInfo.cpp:44
@@ +43,3 @@
+
+    if (ModPath.empty()) {
+      ModPath = Info.modulePath();
----------------
davidxl wrote:
> The ModPath should be read once from 'Other' outside the loop.
Relates to the earlier discussion on where to save the module path.

================
Comment at: lib/IR/ThinLTOInfo.cpp:56
@@ +55,3 @@
+      // the same name. The symbol table created for the compiled index
+      // file should contain the renamed symbols.
+      SmallString<256> NewName(FuncName);
----------------
davidxl wrote:
> Is it using the same naming scheme when local function is static promoted? If yes, it might be better to have a common helper function such as "globalizeThinLTOLocalName", or "getGlobalThinLTONameForLocal" ..
Agreed. Will add a helper function.


http://reviews.llvm.org/D11721





More information about the llvm-commits mailing list