[PATCH] D153171: [SpecialCaseList] Remove TrigramIndex

Vitaly Buka via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jun 18 21:20:57 PDT 2023


vitalybuka accepted this revision.
vitalybuka added a comment.
This revision is now accepted and ready to land.

In D153171#4430101 <https://reviews.llvm.org/D153171#4430101>, @MaskRay wrote:

>> Using perf I found that most of the runtime in TrigramIndex::isDefinitelyOut() comes from a division operation that seems to come from std::unordered_map: https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/include/llvm/Support/TrigramIndex.h#L62
>
> I guess it doesn't use `DenseMap` to protect the case that the key collides with the empty/tombstone keys, but `std::unordered_map` has very bad performance...
> Another problem with TrigramIndex is that we need to iterate over the full query, while with a regex engine I think we can often bail out early.
>
> I have tested TrigramIndex removal with a large `ubsan_ignorelist.txt` and the performance is better (1.10+ x as fast). We should give some time for #sanitizers <https://reviews.llvm.org/tag/sanitizers/> folks to respond.

I tried with artificially complex list for msan of ~1000 entry, and no-TrigramIndex is faster. By my estimate, regex matches costs ~2% of total compilation time. But it was artificial list. On the real list, about the same size, most entries isLiteralERE, so I can see regex cost at all. But If we have to optimize, maybe easy way is to split regexps into groups with fixed prefix, suffix, the rest... or even do not use regex at all.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153171/new/

https://reviews.llvm.org/D153171



More information about the llvm-commits mailing list