[PATCH] D23599: [ThinLTO] Add caching to the new LTO API

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Sat Aug 20 06:52:34 PDT 2016


tejohnson added a comment.

I had a bit of trouble following the flow, so I added some suggestions for more comments and some assertion checking. Also, I think I found a couple of errors; see below. Once you have a fixed patch uploaded I will give it a try with gold-plugin.


================
Comment at: include/llvm/LTO/Caching.h:31
@@ +30,3 @@
+  SmallString<128> EntryPath;    ///< Path to this entry in the cache.
+  SmallString<128> TempFilename; ///< Path to an optional temporary file.
+  /// User supplied callback.
----------------
Is it really optional? It seems like we should always have a TempFilename if we tried to write to the cache via getStream(). Maybe: "Path to temporary file used to buffer output that will be written to a cache entry when this object is destroyed."

================
Comment at: include/llvm/LTO/Caching.h:32
@@ +31,3 @@
+  SmallString<128> TempFilename; ///< Path to an optional temporary file.
+  /// User supplied callback.
+  AddBufferFn AddBuffer;
----------------
Needs better description. E.g. "User-supplied callback to write the buffer loaded from cache into the native output object."

================
Comment at: include/llvm/LTO/Caching.h:49
@@ +48,3 @@
+
+  // Try loading from a possible cache first, return true on cache hit.
+  bool tryLoadFromCache(StringRef Key) override;
----------------
"return true and set EntryPath on cache hit."

================
Comment at: include/llvm/LTO/Caching.h:53
@@ +52,3 @@
+  // Returns true if a cache is available
+  bool isCachingEnabled() const override { return !CacheDirectoryPath.empty(); }
+};
----------------
Is it valid to create a CacheObjectOutput with an empty CacheDirectoryPath? Perhaps the constructor could assert that it is non-empty, and this could simply return true.

================
Comment at: lib/LTO/Caching.cpp:30
@@ +29,3 @@
+CacheObjectOutput::~CacheObjectOutput() {
+  if (EntryPath.empty())
+    return;
----------------
Suggest comment like "Return if client did not attempt caching via tryLoadFromCache".

================
Comment at: lib/LTO/Caching.cpp:32
@@ +31,3 @@
+    return;
+  if (!TempFilename.empty()) {
+    // Rename to final destination (hopefully race condition won't matter here)
----------------
I think TempFilename would only be empty here if tryLoadFromCache() returned true (so we didn't need to write a new cache entry). A comment would be helpful. E.g. something like "TempFilename will be non-empty if a new cache entry was created, so save it to EntryPath in the cache."

================
Comment at: lib/LTO/Caching.cpp:50
@@ +49,3 @@
+      OS << (*ReloadedBufferOrErr)->getBuffer();
+      AddBuffer(std::move(*ReloadedBufferOrErr));
+    }
----------------
It seems wrong that we don't have a return after this; otherwise, in the case where a new cache entry is written, AddBuffer() will be called twice. I guess it is simply inefficient but will work: you will load again from the EntryPath that was just written, and rewrite to the output file.

================
Comment at: lib/LTO/Caching.cpp:53
@@ +52,3 @@
+  }
+  auto ReloadedBufferOrErr = MemoryBuffer::getFile(EntryPath);
+  if (auto EC = ReloadedBufferOrErr.getError())
----------------
Add a comment above here that this is the handling when tryLoadFromCache returned true and we can simply load the existing entry from the cache.

================
Comment at: lib/LTO/Caching.cpp:62
@@ +61,3 @@
+std::unique_ptr<raw_pwrite_stream> CacheObjectOutput::getStream() {
+  if (EntryPath.empty())
+    return nullptr;
----------------
This would happen if the caching client didn't first call tryLoadFromCache before passing along the created CacheObjectOutput. It seems like it would be clearer to make this an assert, since that is incorrect usage of this class.

================
Comment at: lib/LTO/LTO.cpp:48
@@ +47,3 @@
+// Returns a unique hash for the Module considering the current list of
+// export/import. The hash is produced in \p Key.
+static void computeCacheKey(
----------------
Add "and other global analysis results" after "export/import".

================
Comment at: lib/LTO/LTO.cpp:55
@@ +54,3 @@
+    const GVSummaryMapTy &DefinedGlobals) {
+  // Compute the unique hash for this entry
+  // This is based on the current compiler version, the module itself, the
----------------
Add "." at end.

================
Comment at: lib/LTO/LTO.cpp:88
@@ +87,3 @@
+
+  // Include the hash for the preserved symbols.
+  for (auto &GS : DefinedGlobals) {
----------------
This is what the old version was doing, but we don't have the list of preserved symbols here. Wouldn't this be the ExportedGUIDs set created in runThinLTO? That should presumably be passed down here too and used as in your existing cache key computation in ThinLTOCodeGenerator.

================
Comment at: lib/LTO/LTO.cpp:92
@@ +91,3 @@
+    Hasher.update(
+        ArrayRef<uint8_t>((const uint8_t *)&Linkage, sizeof(Linkage)));
+  }
----------------
This should be hashing GUIDs not linkage.

================
Comment at: lib/LTO/LTO.cpp:501
@@ +500,3 @@
+    // The module may be cached, this helps handling it.
+    computeCacheKey(Key, CombinedIndex, ModuleIdentifier, ImportList,
+                    ExportList, ResolvedODR, DefinedGlobals);
----------------
As per offline discussion, move this block under isCachingEnabled().

================
Comment at: lib/LTO/LTO.cpp:514
@@ +513,3 @@
+      assert(Task == TaskId && "Unexpexted TaskId mismatch");
+      return std::move(Output);
+    };
----------------
What if caching is not enabled? In that case we shouldn't call addOutput above, so we won't have an Output. It should presumably just call addOutput() here before returning. Add a comment that this returns the cached output if caching is enabled.


https://reviews.llvm.org/D23599