[PATCH] D63907: [clang-scan-deps] Implementation of dependency scanner over minimized sources

Tue Jul 30 14:08:38 PDT 2019

arphaman added inline comments.

================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:1
+//===- DependencyScanningFilesystem.h - clang-scan-deps fs ===---*- C++ -*-===//
+//
----------------
aganea wrote:
> General comment for this file and the implementation: it seems there's nothing specific to the dependency scanning, except for replacing the content with minimized content? This cached FS could be very well used to compile a project with parallel workers. Could this be part of the `VirtualFileSystem` infrastructure? ( `llvm/include/llvm/Support/VirtualFileSystem.h`) If yes, can you possibly create a separate patch for this? @JDevlieghere @sammccall 
I think it could be possible to separate out the cache, but we definitely need a subclass of a VFS to handle some Clang specific logic for how to determine how to deal with module files for instance. I wouldn't be opposed to factoring it out if people are interested. I think that can be done as a follow-up if there's an interest though.

================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:84
+    DependencyScanningFilesystemSharedCache() {
+  NumShards = 8;
+  CacheShards = llvm::make_unique<CacheShard[]>(NumShards);
----------------
aganea wrote:
> This line needs a comment. Is this value based on empirical results across different hardwares? (I can't recall your answer at the LLVM conf) I am wondering what would be the good strategy here? The more cores/HT in your PC, the more chances that you'll hit the same shard, thus locking. But the bigger we make this value, the more `StringMaps` we will have, and more cache misses possibly.
> Should it be something like `llvm::hardware_concurrency() / some_fudge`? It'd be interesting to subsequently profile on high core count machines (or maybe you have done that).
I rewrote it to use a heuristic we settled on after doing empirical testing on an 18 core / 36 thread machine ( max(2, thread_count / 4) ), and added a comment. This was the number `9` (36 / 4) I mentioned at the talk, so we only got to it because of a heuristic. I think now after some SDK/OS updates the effect from adding more shards is less pronounced, so it mostly flatlines past some number between 5-10. A better heuristic would probably be OS specific to take the cost of lock contention into account.

Note that the increased number of shards does not increase the number of cache misses, because the shard # is determined by the filename (we don't actually have global cache misses, as the cache is designed to have only one miss per file when it's first accessed)! It's just that an increased number of shards doesn't improve performance after hitting some specific limit, so we want to find a point where we can taper it off.

It would still be definitely interesting to profile it on other high core machines with different OSes to see if it's a reasonable heuristic for those scenarios too.

================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp:148
+  RealFS = new ProxyFileSystemWithoutChdir(llvm::vfs::getRealFileSystem());
+  if (Service.getMode() == ScanningMode::MinimizedSourcePreprocessing)
+    DepFS = new DependencyScanningFilesystem(Service.getSharedCache(), RealFS);
----------------
aganea wrote:
> Can we not use caching all the time?
We want to have a mode where it's as close to the regular `clang -E` invocation as possible for correctness CI and testing. We also haven't evaluated if the cost of keeping non-minimized sources in memory, so it might be too expensive for practical use? I can add a third option that caches but doesn't minimize though as a follow-up patch though 

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63907/new/

https://reviews.llvm.org/D63907