[PATCH] D81719: [clangd] Drop usage of PreambleStatCache in scanPreamble
Kadir Cetinkaya via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Jun 16 07:10:19 PDT 2020
kadircet added a comment.
In D81719#2092589 <https://reviews.llvm.org/D81719#2092589>, @sammccall wrote:
> Thanks for all this investigation!
>
> > 80.71 0.002330 5 394 374 openat
>
> I'm curious what the 400 attempts and 20 successes are (I've seen this before but don't remember now). Probably not worth digging into though unless you happen to have the strace logs.
This is mostly the GCC installation detection scanning for libc and such. The biggest call site is https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/ToolChains/Gnu.cpp#L2408, which is called multiple times from https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/ToolChains/Gnu.cpp#L1907.
>> buildCompilerInvocation usage inside scanPreamble doesn't need any access to any files, so I suggest we just pass empty FS
>
> I guess this makes sense, My only worry is the driver getting into a different state if probing or cwd or something fails. But this really shouldn't affect preamble scanning. If it's safe, this seems worth doing just to have more isolation.
Changing this patch to do that instead.
>> we need a different cache for buildCompilerInvocation, one that caches dir_begin() failures
>
> Yeah this is complicated - worthwhile if the IO is actually adding ~20ms. Easiest way to tell if tracing tools aren't helping might be to use an empty FS and ignore all the resulting problems - timing for buildCompilerInvocation should be correct.
> If needed, maybe the record/replay FSes used for lldb reproducers are usable? Nice to avoid that complexity if possible though.
Benchmarked with an empty in-memory FS and the real filesystem using fallback commands (`clang a.cc`). There seems to be about a 6x speedup when buildCompilerInvocation runs without IO:
the empty FS takes about 0.17 ms on average, whereas the real filesystem takes about 1.01 ms on average.
Changed the compile command to something google3-sized (~400 args):
the empty FS takes about 1.4 ms, whereas real IO takes about 1.8 ms.
So shaving off some IO might help a lot for trivial command lines, but for complicated commands we need to improve command line parsing or start caching the result.
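For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of an averaging timer plus the ratio arithmetic behind the figures above. It is not the harness used for these measurements; `averageMillis` and `speedup` are hypothetical helper names, and the callable passed in is assumed to wrap whatever is being measured (e.g. one call to buildCompilerInvocation with a given FS).

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: average wall-clock time of one call to `fn`, in
// milliseconds, measured over `runs` back-to-back invocations.
template <typename Fn>
double averageMillis(Fn &&fn, int runs) {
  assert(runs > 0 && "need at least one run");
  using Clock = std::chrono::steady_clock; // monotonic, safe for intervals
  const auto start = Clock::now();
  for (int i = 0; i < runs; ++i)
    fn();
  const std::chrono::duration<double, std::milli> elapsed =
      Clock::now() - start;
  return elapsed.count() / runs;
}

// Hypothetical helper: how many times faster the fast variant is.
inline double speedup(double slowMs, double fastMs) {
  assert(fastMs > 0.0 && "fast timing must be positive");
  return slowMs / fastMs;
}
```

With the averages quoted above, `speedup(1.01, 0.17)` comes out at roughly 5.9 for the trivial command line, while `speedup(1.8, 1.4)` is only about 1.3 for the ~400-arg one, which is why IO matters much less in the second case.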
>> 48.73 2.244680 56 39747 tolower
>
> How many per call to buildCompilerInvocation? Maybe arg parsing is doing something dumb...
This is only a single call to buildCompilerInvocation :(
Just for fun, top 5 library calls in a complicated command line case (~400 args):
% time     seconds  usecs/call     calls function
------ ----------- ----------- --------- --------------------
 50.61    3.815022          54     69820 tolower
 25.35    1.911093          55     34416 strlen
  9.35    0.704789          53     13067 bcmp
  5.09    0.383536          56      6773 _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm
  3.69    0.278408          56      4935 _ZdlPv
So calls to tolower/strlen seem to scale sub-linearly (the previous command line had only 2 args, so there's about a 200x increase in args, whereas call counts seem to have only roughly doubled).
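A rough way to quantify that, using only the two tolower counts in this thread (39747 calls at 2 args, 69820 calls at ~400 args). This is a sketch: fitting a straight line through two data points is an assumption, and `growth` and `callsPerExtraArg` are hypothetical helper names.

```cpp
#include <cassert>

// Hypothetical helper: growth factor between two measurements.
inline double growth(double before, double after) {
  assert(before > 0.0 && "baseline must be positive");
  return after / before;
}

// Hypothetical helper: slope of the line through two (args, calls) points,
// i.e. extra calls per additional argument under a linear model.
inline double callsPerExtraArg(double args0, double calls0,
                               double args1, double calls1) {
  assert(args1 != args0 && "need two distinct argument counts");
  return (calls1 - calls0) / (args1 - args0);
}
```

`growth(2, 400)` is 200, while `growth(39747, 69820)` is only about 1.76. Under the linear assumption, `callsPerExtraArg(2, 39747, 400, 69820)` is roughly 76 tolower calls per extra argument, on top of a fixed cost close to the ~39.7k calls already seen with 2 args. That would suggest most of the tolower traffic for small command lines is a per-invocation cost rather than a per-argument one.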
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D81719/new/
https://reviews.llvm.org/D81719