[PATCH] D81719: [clangd] Drop usage of PreambleStatCache in scanPreamble

Tue Jun 16 07:10:19 PDT 2020

kadircet added a comment.

In D81719#2092589 <https://reviews.llvm.org/D81719#2092589>, @sammccall wrote:

> Thanks for all this investigation!
>
> >   80.71    0.002330           5       394       374 openat
>
> I'm curious what the 400 attempts and 20 successes are (I've seen this before but don't remember now). Probably not worth digging into though unless you happen to have the strace logs.

This is mostly GCC installation detection scanning for libc and the like. The biggest call site is https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/ToolChains/Gnu.cpp#L2408, which is called multiple times from https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/ToolChains/Gnu.cpp#L1907.

>> buildCompilerInvocation usage inside scanPreamble doesn't need any access to any files, so I suggest we just pass empty FS
> 
> I guess this makes sense, My only worry is the driver getting into a different state if probing or cwd or something fails. But this really shouldn't affect preamble scanning. If it's safe, this seems worth doing just to have more isolation.

Changing this patch to do that instead.

>> we need a different cache for buildCompilerInvocation, one that caches dir_begin() failures
> 
> Yeah this is complicated - worthwhile if the IO is actually adding ~20ms. Easiest way to tell if tracing tools aren't helping might be to use an empty FS and ignore all the resulting problems - timing for buildCompilerInvocation should be correct.
>  If needed, maybe the record/replay FSes used for lldb reproducers are usable? Nice to avoid that complexity if possible though.

I benchmarked with an empty InMemoryFileSystem and with the real filesystem, using fallback commands (`clang a.cc`). buildCompilerInvocation runs roughly 6x faster without IO: the empty FS takes about 0.17 ms on average, whereas the real filesystem takes about 1.01 ms.

With a google3-sized compile command (~400 args), the empty FS takes about 1.4 ms, whereas the real filesystem takes about 1.8 ms.
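The block below puts the averages above into ratios. It uses no new measurements and assumes the whole difference between the two runs is IO. It shows why IO matters much less once the command line is large:

```latex
\begin{aligned}
\texttt{clang a.cc}:\quad & \frac{1.01\ \text{ms}}{0.17\ \text{ms}} \approx 5.9\times,
  \qquad 1.01 - 0.17 = 0.84\ \text{ms} \\
\sim 400\ \text{args}:\quad & \frac{1.8\ \text{ms}}{1.4\ \text{ms}} \approx 1.29\times,
  \qquad 1.8 - 1.4 = 0.4\ \text{ms}
\end{aligned}
```

Removing IO therefore cuts the trivial case by roughly 83% (0.84/1.01), but the google3-sized case by only about 22% (0.4/1.8).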

So shaving off IO might help a lot for trivial command lines, but for complicated ones we need to either improve command-line parsing or start caching the result.

>> 48.73    2.244680          56     39747 tolower
> 
> How many per call to buildCompilerInvocation? Maybe arg parsing is doing something dumb...

This is from just a single call to buildCompilerInvocation :(

Just for fun, here are the top 5 library calls for the complicated command line (~400 args):

  % time     seconds  usecs/call     calls      function
  ------ ----------- ----------- --------- --------------------
   50.61    3.815022          54     69820 tolower
   25.35    1.911093          55     34416 strlen
    9.35    0.704789          53     13067 bcmp
    5.09    0.383536          56      6773 _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm
    3.69    0.278408          56      4935 _ZdlPv

So the tolower/strlen call counts seem to scale sub-linearly: the previous command line had only 2 args, so this is about a 200x increase in arguments, whereas the call counts have only roughly doubled.
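The same numbers are also consistent with a large fixed per-invocation cost plus a modest per-argument cost. Suppose, purely as a model and not as a measurement, that tolower calls grow affinely with the argument count n, i.e. calls ≈ F + c·n. Plugging in the two data points (n = 2 and n ≈ 400; since 400 is approximate, so is c) gives:

```latex
\begin{aligned}
F + 2c &= 39747 \\
F + 400c &= 69820 \\
\Rightarrow\; c &= \frac{69820 - 39747}{400 - 2} = \frac{30073}{398} \approx 75.6 \text{ calls per argument} \\
F &= 39747 - 2c \approx 39600 \text{ calls independent of the arguments}
\end{aligned}
```

Under that reading, nearly all of the ~40k tolower calls in the 2-arg case are fixed overhead. That would suggest looking for work done once per invocation rather than once per argument.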

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D81719