[PATCH] D17066: libLTO: add a ThinLTOCodeGenerator on the model of LTOCodeGenerator.

Mon Mar 7 22:55:53 PST 2016

joker.eph marked 27 inline comments as done.
joker.eph added a comment.

Thanks Teresa for the review. See answers inlined.

================
Comment at: include/llvm-c/lto.h:666
@@ +665,3 @@
+/**
+ * Sets the expiration (in seconds) for an entry in the cache.
+ *
----------------
tejohnson wrote:
> What is the default?
No guarantee provided by the API :)

================
Comment at: include/llvm/LTO/ThinLTOCodeGenerator.h:46
@@ +45,3 @@
+
+/// Cache behavior controls.
+struct CachingOptions {
----------------
tejohnson wrote:
> How do PruningInterval and Expiration interact? Are they mutually exclusive, or does a module get pruned at the min of these?
PruningInterval controls the interval between two check on the cache: i.e. if you set 2h the plugin will not bother checking for cache invalidation for the next two hours.
Cache Expiration is per entry: when pruning you are deleting the entries older than this value.
I documented getter/setter, not enough?

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:1
@@ +1,2 @@
+//===-ThinLTOCodeGenerator.cpp - LLVM Link Time Optimizer -----------------===//
+//
----------------
ygao wrote:
> This is probably copied from LTOCodeGenerator.cpp. Can you confirm that the summary line is accurate?
It is not very helpful, but matches LTOCodeGenerator indeed.

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:65
@@ +64,3 @@
+        getLazyBitcodeModule(MemoryBuffer::getMemBuffer(Buffer, false), Context,
+                             /* ShouldLazyLoadMetadata */ false);
+  } else {
----------------
tejohnson wrote:
> Why not lazy load metadata too?
Aren't they almost immediately materialized anyway? I thought that laziness would have a cost for no real benefit maybe?
(I didn't see a change on the memory profile when lazy loading or not)

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:84
@@ +83,3 @@
+  // User asked to save temps, let dump the bitcode file after import.
+  auto SaveTempPath = TempDir + llvm::utostr(count) + Suffix;
+  std::error_code EC;
----------------
tejohnson wrote:
> Include Module's SourceFileName to get more meaningful dump file names?
SourceFileName can fit! Good point.

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:96
@@ +95,3 @@
+/// \return true if the global will be internalized.
+static bool InternalizeGlobal(Module &TheModule, GlobalValue &GV,
+                              StringSet<> &Keep, const FunctionInfoIndex &Index,
----------------
tejohnson wrote:
> This is doing more than internalization, so name should probably be adjusted to reflect that. I don't have a good name off the top of my head. Maybe split into 2 and call the first something like resolveLinkOnceAndWeak, and if that returns false then invoke the internalization handling (maybe outline  keepIfPreserved and make it return a bool).
I remove the internalization phase from this patch for now. This will change a lot with the graph in the summary.

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:104
@@ +103,3 @@
+
+  if (GV.hasAvailableExternallyLinkage()) {
+    Keep.insert(GV.getName());
----------------
tejohnson wrote:
> Should available_externally also be kept in SingleModule mode?
Good point. Dunno.

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:112
@@ +111,3 @@
+
+  // There is nothing more restrictive than private linkage.
+  if (GV.hasPrivateLinkage())
----------------
tejohnson wrote:
> I'm confused by the comment. It is already local if private, so no need to internalize or add to Keep, which is consistent with what the code is doing. I just don't understand what the comment about being restrictive is saying.
This is copy/pasted from  `LTOCodeGenerator::applyRestriction`. I read it as "don't need to do anything thing you won't internalize any more something already private"

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:160
@@ +159,3 @@
+  }
+  if (Linkage == GlobalValue::LinkOnceODRLinkage) {
+    // We need to emit only one of these, the first module will keep
----------------
tejohnson wrote:
> Combine the LinkOnce cases and just pick the appropriate Weak linkage based on the LinkOnce linkage, the handling is identical otherwise.
OK, I'll keep this for a future version of the patch, I removed the internalize for now.

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:317
@@ +316,3 @@
+      // optimizations already ran, we have a pre-optimized copy.
+      return loadModuleFromBuffer(OptimizedBuffer->getMemBufferRef(), Ctx,
+                                  Lazy);
----------------
tejohnson wrote:
> Does this need to be guarded by lock? Comments for LockM indicate it is just to guard optimization.
You need to block "reader" of `OptimizedBuffer` in case there is already a `writer` populating it.
(in case of contention on the lock with too many reader there would be a possible mitigation with atomic flags for a fast-path, but not worth it here I think)

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:329
@@ +328,3 @@
+    if (!PreservedSymbols.empty() || !CrossReferencedSymbols.empty())
+      InternalizeModule(*TheModule, Index, PreservedSymbols,
+                        CrossReferencedSymbols, SingleModule);
----------------
tejohnson wrote:
> Why don't we want to do this when there are no preserved or cross referenced symbols? Couldn't the internalization be even more aggressive in that case? Also, wouldn't we at least want the linkonce/weak handling?
No preserved or cross referenced symbols means that you will internalize *everything* and then global DCE will remove *everything*. This is of little interest. So I just considered that empty() means the linker didn't provide any information at all. It helps implementing testing with llvm-lto as well.

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:345
@@ +344,3 @@
+  /// Helper map to find the original Module buffer from its identifier
+  /// !! Relies on having a unique getBufferIdentifier() !!
+  StringMap<MemoryBufferRef> ModuleMap;
----------------
tejohnson wrote:
> When would this not be the case?
The buffers identifier are supplied by the linker and can be anything:

```
extern void thinlto_codegen_add_module(thinlto_code_gen_t cg,
                                       const char *identifier, const char *data,
                                       int length);
```

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:352
@@ +351,3 @@
+  /// Map of Module Identifier to MemoryBuffer used to load a Module.
+  std::unordered_map<std::string, EarlyOptimizedModule> OptimizedModuleMap;
+
----------------
tejohnson wrote:
> Out of curiosity, why a StringMap for ModuleMap and an unordered_map from string here? They are both index by the identifier string.
Here there is no good reason I think. I think that StringMap will allocate the entry with the key, so you want a value that is quite small (for rehashing / growing the map).

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:426
@@ +425,3 @@
+  PMB.LoopVectorize = true;
+  PMB.SLPVectorize = true;
+  PMB.VerifyInput = true;
----------------
bruno wrote:
> Although not really necessary for this patch, it would be nice to have the flags to disable the inliner and vectorizer like we currently have in full LTO. This is specially useful when investigating differences between full and thin.
Good point, will update to try using a set of flags as close as possible the LTOCodeGenerator

================
Comment at: lib/LTO/ThinLTOCodeGenerator.cpp:592
@@ +591,3 @@
+  /// Helper to load a cached module after early global optimizations
+  PreOptimizedModuleMap ModuleMap(Modules, Index, PreservedSymbols,
+                                  CrossReferencedSymbols);
----------------
tejohnson wrote:
> Doesn't instantiating here mean that the map isn't leveraged across threads? Oh, I see that this routine and the above promote() are only invoked from llvm-lto. Is there a reason why llvm-lto doesn't use ThinLTOCodeGenerator::run()?
The reason llvm-lto does not use `run()` is to test steps in isolation.

http://reviews.llvm.org/D17066