[PATCH] D153171: [SpecialCaseList] Remove TrigramIndex
Vitaly Buka via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jun 18 21:20:57 PDT 2023
vitalybuka accepted this revision.
vitalybuka added a comment.
This revision is now accepted and ready to land.
In D153171#4430101 <https://reviews.llvm.org/D153171#4430101>, @MaskRay wrote:
>> Using perf I found that most of the runtime in TrigramIndex::isDefinitelyOut() comes from a division operation that seems to come from std::unordered_map: https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/include/llvm/Support/TrigramIndex.h#L62
>
> I guess it doesn't use `DenseMap` to protect the case that the key collides with the empty/tombstone keys, but `std::unordered_map` has very bad performance...
> Another problem with TrigramIndex is that we need to iterate over the full query, while with a regex engine I think we can often bail out early.
>
> I have tested TrigramIndex removal with a large `ubsan_ignorelist.txt` and the performance is better (1.10+ x as fast). We should give some time for #sanitizers <https://reviews.llvm.org/tag/sanitizers/> folks to respond.
I tried with artificially complex list for msan of ~1000 entry, and no-TrigramIndex is faster. By my estimate, regex matches costs ~2% of total compilation time. But it was artificial list. On the real list, about the same size, most entries isLiteralERE, so I can see regex cost at all. But If we have to optimize, maybe easy way is to split regexps into groups with fixed prefix, suffix, the rest... or even do not use regex at all.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D153171/new/
https://reviews.llvm.org/D153171
More information about the llvm-commits
mailing list