[PATCH] D11721: [ThinLTO] Data structures for holding ThinLTO function index/summary

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 21 08:58:09 PDT 2015


tejohnson added a comment.

Hi Mehdi,

Thanks for the comments! Responses below. I am not going to update the patch just yet as I am working on implementing the bitcode reading/writing for the format proposed in the latest round of http://reviews.llvm.org/D11722 comments, including the records so that I can do better end to end testing, and there are going to be a few more changes here as a result. I tried to explain the separation of the summary and info structs and some of the other issues below. Let me know if that doesn't clarify the organization.

BTW, since there are now so many RFCs and patches floating around, I put together a page that points to all of them (and I will try to keep this up to date!). See https://sites.google.com/site/llvmthinlto/.

Thanks
Teresa


================
Comment at: include/llvm/IR/ThinLTOInfo.h:2
@@ +1,3 @@
+//===-- llvm/ThinLTOInfo.h - ThinLTO Index and Summary ----------*- C++ -*-===//
+//
+//                     The LLVM Compiler Infrastructure
----------------
joker.eph wrote:
> I agree with Philip, I'm not sure it needs to be "LTO". Now it is not a blocker to me, it shouldn't be a big deal to introduce it here and refactor it later. At least it won't be coupled to anything else.
I'll probably go ahead and remove the ThinLTO part of the name, as I am planning to do for the bitcode sections. Some of it is still pretty ThinLTO-specific but I will note that in the comments instead of the filename and data structure names.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:25
@@ +24,3 @@
+
+// Function summary information to aid decisions and implementation of
+// importing. This is a separate class from ThinLTOFunctionInfo to
----------------
joker.eph wrote:
> I'll never say enough how I really enjoyed the job you did with documenting your stuff with RFCs on the mailing list. Now I hope this information (in the RFCs) will be nicely translated in the code as well.
> 
> Since you'll introduce a lot of new classes, what about trying to adopt off-hand a format that will result is an nice doxygen? Especially in stuff that goes the public header.
> 
> Just a hint to help: we turned on "autobrief" on in LLVM in May, so the first line is interpreted as a the "brief description" unless you add the keyword \brief (then is it the first sentence that is matched).
> 
> >  /// Do the thing with the stuff.
> >  ///
> >  /// This does whatever, using \c param1 for something and returning
> >  /// some sort of result.
> 
> or
> 
> > /// \brief This function does X. It caches some values or something,
> > /// and is complicated in some minor way.
> 
> (Now there are no strong requirements of sticking with the doxygen syntax that I'm aware of, and it won't prevent your patches from being integrated).
> 
Thanks for the tip. I will change the comments to be nice for doxygen.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:80
@@ +79,3 @@
+  // True if we are still parsing into this instance.
+  bool Parsing;
+  // Used to flag functions that have local linkage types and need to
----------------
joker.eph wrote:
> Is seems to me from reading the implementation that unless you call `recordParsedInfo()` with `nullptr` you always have the invariant `Parsing == !FunctionSummary`. 
> So it seems that this flag is useless?
Yeah, I have removed that flag in the new version of the patch I am working on. It was previously more interesting when I had the bitcode index and module path in this structure, but I have moved all of that to the function summary (and am moving the local function flag as well now).

================
Comment at: include/llvm/IR/ThinLTOInfo.h:98
@@ +97,3 @@
+    if (FunctionSummary)
+      delete FunctionSummary;
+  }
----------------
joker.eph wrote:
> This implies that `ThinLTOFunctionInfo` owns its `FunctionSummary`, so is there a reason not to use `std::unique_ptr`?
I was trying to be clever about sharing the pointer in the combined index creation phase, but this is not going to work as is anyway. I will probably change this to a unique_ptr as you suggest, and just duplicate when it is merged into the combined index.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:102
@@ +101,3 @@
+  // Use the default copy constructor (when creating combined index),
+  // which will cause the same ThinLTOFunctionSummary object to be shared.
+
----------------
joker.eph wrote:
> This is scary to read with respect to what I see in the destructor, how are you avoiding double deletion?
Yeah I already hit an issue with this in some follow on work I am doing to populate these during bitcode writing, as this struct is being copied around and was duplicating the pointer leading to double-frees. I am changing this.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:132
@@ +131,3 @@
+// COMDAT functions of the same name.
+typedef std::vector<ThinLTOFunctionInfo> ThinLTOFunctionInfoList;
+// Map from function name to corresponding function info structures.
----------------
joker.eph wrote:
> Can you provide some information about what happens in the case of COMDAT functions? (if you did on the mailing I'd appreciate a pointer to the RFC thread)
There are a couple mentions in the RFCs, relating to why this is a vector and also relating to how they are handled from a linkage handling aspect. 

The former can be found in the file API and data structures RFC (http://lists.llvm.org/pipermail/llvm-dev/2015-August/088920.html):
"There may be more than one ThinLTOFunctionInfo for a given function name in the combined function index/summary map due to COMDATs. While the plugin step that creates the combined function map may decide to
select one representative COMDAT instance, we don’t want the design to preclude holding multiple as it may be advantageous to import a particular COMDAT from different modules in different backend instances."

The latter can be found in Section 3 of the symbol linkage RFC (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088025.html)

================
Comment at: include/llvm/IR/ThinLTOInfo.h:161
@@ +160,3 @@
+ public:
+  ThinLTOFunctionSummaryIndex() { }
+
----------------
joker.eph wrote:
> Why do you need this declaration?
Don't, removing it.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:164
@@ +163,3 @@
+  // Use default copy constructor. For the StringMaps this will
+  // call std::swap during the assignment.
+
----------------
joker.eph wrote:
> Not clear to me how a default copy constructor can involve "std::swap"?
> 
I don't think this is right now, looks like StringMap copy constructor calls std::move. I was trying to reason through what would happen if using the default copy constructor. But I think it is better to simply disable the copy constructor as I don't think we should need that.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:168
@@ +167,3 @@
+  // (e.g. from a native object wrapped bitcode file's symtab).
+  void setFunctionMap(ThinLTOFunctionMap &&FuncMap) {
+    FunctionMap = std::move(FuncMap);
----------------
joker.eph wrote:
> I believe the usual pattern is to take by value instead of an R-value ref, unless I missed something.
Will fix.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:188
@@ +187,3 @@
+    if (FunctionMap.count(FuncName))
+      FunctionMap.lookup(FuncName).push_back(Info);
+    else
----------------
joker.eph wrote:
> Since `ThinLTOFunctionInfo` owns the `FunctionSummary`, it sound it shouldn't be copied but moved, you'll run into double free. 
> (unless I misunderstood something about the `ThinLTOFunctionInfo` above, in which case forget this comment)
Yep, I am changing this.

================
Comment at: include/llvm/IR/ThinLTOInfo.h:191
@@ +190,3 @@
+      FunctionMap.insert(std::make_pair(FuncName,
+                                        ThinLTOFunctionInfoList(1, Info)));
+  }
----------------
joker.eph wrote:
> I don't think you gain anything by doing this test which involves two queries in the map (lookup + insert/update).
> What about just doing:
> ```
> FunctionMap.lookup(FuncName).push_back(std::move(Info));
> ```
> 
Good idea, will fix.

================
Comment at: lib/IR/ThinLTOInfo.cpp:24
@@ +23,3 @@
+void ThinLTOFunctionSummaryIndex::mergeFrom(
+    ThinLTOFunctionSummaryIndex *Other) {
+
----------------
joker.eph wrote:
> I'm just starting to understand `ThinLTOFunctionSummaryIndex` and some "weirdness" associated.
> Do you build a `ThinLTOFunctionSummaryIndex` for each module and only then you combine them together using this `mergeFrom()` method?
> 
> Looks like a missing abstraction level, some concepts are mixed together, which would explain why you are using a static `NextModuleId` here, which sounds fishy to me. Also this assumes that the `Other` will have an empty ModuleMap, i.e. all the functions come from the same module.
> (unless I misunderstood what is going on here)
I'll try to document this better with comments. Let me explain here and see if this makes more sense. This function is only used during the phase-2 linker step, when all we are doing is taking the per-module function indexes (pulled out of the bitcode IR .o files created by -fthinlto -c), and merging them into a global combined index that will be emitted as a separate file (for use during importing in the phase-3 parallel backends).

The per-module indexes don't have a module string table in bitcode, as the index is from a single module. So here we need to build the ModulePathStringTable in "this" index which is the combined index being built. See the gold-plugin.cpp call to mergeFrom() in the latest D11723 patch for how this is used.

When we build the module path string table for the combined index we need to assign a unique ID to each module, which is what NextModuleId is for. It doesn't technically need to be a static member, but I felt that better reflected the fact that this is only used for one index - the combined index.

Let me know if that makes sense or whether some changes to the data structure would better reflect this.

================
Comment at: lib/IR/ThinLTOInfo.cpp:44
@@ +43,3 @@
+    if (ModPath.empty()) {
+      ModPath = Info.functionSummary()->modulePath();
+      addModulePath(ModPath, NextModuleId);
----------------
joker.eph wrote:
> It seems to me that this supposed that `ThinLTOFunctionSummary ` has been parsed. If the `ModulePath` field was stored in the `ThinLTOFunctionInfo`, this wouldn't be an issue.
> But since I don't fully understand the reason for the separation of these two classes (I'm probably missing a use-case).
Yes, when combining the per-module indexes the function summary needs to have been fully parsed. Although since the per-module index doesn't actually have the module path string table, in the per-module summaries that field is populated once we parse a per-module index (from the module's identifier), not from the bitcode itself.

The reason for keeping the function summary separate from the function info struct is when reading the combined index during importing we want to support lazy reading of the summaries (since we won't need most of them).


http://reviews.llvm.org/D11721





More information about the llvm-commits mailing list