[PATCH] D96520: Reduce time spent parsing support files

Pavel Labath via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Tue Mar 2 01:11:49 PST 2021


labath added a subscriber: teemperor.
labath added a comment.

Thanks for the explanation and the additional data, and sorry for the delay on my end.

Regarding the "standard" benchmarks, the only thing that comes close is this <https://teemperor.de/lldb-bench/static.html>. I'm not sure if it contains any test case which would demonstrate this (many compile units, many common support/include files), but maybe @teemperor could add one?

I'm afraid the new implementation is still not completely sound. The (include) directories in the line table are not guaranteed to be absolute -- they can be relative (in DWARF <= 4) to the compilation directory (DW_AT_comp_dir) of the main unit. This is why `Prologue::getFileNameByIndex` takes an additional `compile_dir` argument. However, this directory is ignored in the computation of the cache key. So, although unlikely, one could run into a situation where two line tables have identical file and directory entries but still refer to different files, because the entries are relative to different compilation directories.

Also, I can't escape the feeling that this should be achievable without an additional caching layer. In your benchmark, the hottest piece of code appears to be the `needsNormalization` function, which is essentially a linear scan of the file name. However, `std::map` lookups also perform linear string comparisons (possibly several, one per tree level). What is it that makes `needsNormalization` so much slower? Is it the lack of vectorization due to the complex control flow? Could the function be rewritten to make it more optimizable?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96520/new/

https://reviews.llvm.org/D96520
