[all-commits] [llvm/llvm-project] 139067: [SpecialCaseList] Remove TrigramIndex

Ellis Hoag via All-commits all-commits at lists.llvm.org
Mon Jun 19 10:42:00 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 13906755427611fe2d1b1bf915e87b97ddffd236
      https://github.com/llvm/llvm-project/commit/13906755427611fe2d1b1bf915e87b97ddffd236
  Author: Ellis Hoag <ellis.sparky.hoag at gmail.com>
  Date:   2023-06-19 (Mon, 19 Jun 2023)

  Changed paths:
    M llvm/include/llvm/Support/SpecialCaseList.h
    R llvm/include/llvm/Support/TrigramIndex.h
    M llvm/lib/Support/CMakeLists.txt
    M llvm/lib/Support/SpecialCaseList.cpp
    R llvm/lib/Support/TrigramIndex.cpp
    M llvm/unittests/Support/CMakeLists.txt
    R llvm/unittests/Support/TrigramIndexTest.cpp
    M llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
    M llvm/utils/gn/secondary/llvm/unittests/Support/BUILD.gn

  Log Message:
  -----------
  [SpecialCaseList] Remove TrigramIndex

`TrigramIndex` was added back in https://reviews.llvm.org/D27188 as an optimization to make `SpecialCaseList::match()` faster. I've found that `TrigramIndex` actually makes the function slower. It is only a prefilter and never changes the result of `match()`, so we can remove it.
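The commit does not describe how a trigram prefilter works. The sketch below is a simplified illustration under stated assumptions and is not LLVM's `TrigramIndex` code. It records the distinct three-character substrings of each literal pattern. A query is "definitely out" only if every pattern has at least one trigram that the query lacks. The class `TrigramPrefilter` is hypothetical. Its `isDefinitelyOut` method only borrows the name mentioned in the commit. Regex metacharacters are deliberately not handled. Note that every query trigram costs a hash-map lookup, which is where the overhead discussed in this commit would show up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified trigram prefilter (literal patterns only).
class TrigramPrefilter {
public:
  // Indexes the distinct trigrams of Pattern. A pattern shorter than three
  // characters has no trigrams, so the filter gives up ("defeated") and
  // never rejects anything afterwards.
  void insert(const std::string &Pattern) {
    if (Pattern.size() < 3) {
      Defeated = true;
      return;
    }
    const std::size_t Id = Counts.size();
    Counts.push_back(0);
    for (std::size_t I = 0; I + 3 <= Pattern.size(); ++I) {
      std::vector<std::size_t> &Rules = Index[key(Pattern, I)];
      if (!Rules.empty() && Rules.back() == Id)
        continue; // trigram already recorded for this pattern
      Rules.push_back(Id);
      ++Counts[Id];
    }
  }

  // Returns true only if no indexed pattern can match Query. Returning false
  // means "maybe": the caller must still run the real matcher.
  bool isDefinitelyOut(const std::string &Query) const {
    if (Defeated)
      return false;
    std::vector<std::size_t> Seen(Counts.size(), 0);
    std::unordered_set<unsigned> Visited;
    for (std::size_t I = 0; I + 3 <= Query.size(); ++I) {
      const unsigned T = key(Query, I);
      if (!Visited.insert(T).second)
        continue; // count each distinct query trigram once
      auto It = Index.find(T); // one hash lookup per query trigram
      if (It == Index.end())
        continue;
      for (std::size_t Id : It->second)
        if (++Seen[Id] == Counts[Id])
          return false; // every trigram of pattern Id occurs in Query
    }
    return true;
  }

private:
  // Packs three bytes into a single 24-bit key.
  static unsigned key(const std::string &S, std::size_t I) {
    assert(I + 3 <= S.size());
    return (static_cast<unsigned char>(S[I]) << 16) |
           (static_cast<unsigned char>(S[I + 1]) << 8) |
           static_cast<unsigned char>(S[I + 2]);
  }

  std::unordered_map<unsigned, std::vector<std::size_t>> Index; // trigram -> pattern ids
  std::vector<std::size_t> Counts; // distinct trigrams per pattern
  bool Defeated = false;
};
```

The filter can only skip work. It never turns a real match into a non-match. So removing it, as this commit does, cannot change what `match()` returns.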

I recorded the queries passed to `SpecialCaseList::match()` for an arbitrarily chosen, very large source file (`AArch64ISelLowering.cpp`). I then measured the time to call `match()` on all of them with [this line](https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/lib/Support/SpecialCaseList.cpp#L64) enabled and then disabled. These correspond to `USE_TRIGRAMS=1` and `USE_TRIGRAMS=0` in the benchmark below.
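Neither the `SpecialCaseListTest.Large` test nor the `USE_TRIGRAMS` switch appears among the changed paths. They seem to have been local instrumentation. The sketch below shows one hypothetical way to set up such an experiment: read an on/off switch from the environment, then replay every recorded query through a matcher whose cheap prefilter step can be turned off. The names `envToggle`, `matchQuery`, and `runAllQueries` are illustrative assumptions and do not come from LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Reads an on/off switch such as USE_TRIGRAMS from the environment.
// Treating "unset" or "0" as off is an assumption of this sketch.
bool envToggle(const char *Name) {
  assert(Name != nullptr);
  const char *Value = std::getenv(Name);
  return Value != nullptr && std::strcmp(Value, "0") != 0;
}

// Mirrors the shape of the toggled line: if the prefilter says the query is
// definitely out, skip the expensive matcher entirely.
template <typename Prefilter, typename FullMatch>
bool matchQuery(const std::string &Query, bool UsePrefilter,
                Prefilter IsDefinitelyOut, FullMatch Full) {
  if (UsePrefilter && IsDefinitelyOut(Query))
    return false;
  return Full(Query);
}

// Replays every recorded query once and returns how many matched. An
// external tool such as hyperfine times the whole process.
template <typename Prefilter, typename FullMatch>
std::size_t runAllQueries(const std::vector<std::string> &Queries,
                          bool UsePrefilter, Prefilter IsDefinitelyOut,
                          FullMatch Full) {
  std::size_t Matches = 0;
  for (const std::string &Q : Queries)
    if (matchQuery(Q, UsePrefilter, IsDefinitelyOut, Full))
      ++Matches;
  return Matches;
}
```

A call such as `runAllQueries(Queries, envToggle("USE_TRIGRAMS"), Prefilter, Full)` makes the two hyperfine runs differ only in the environment variable. A correct prefilter never changes the answer, so the match count should be identical in both runs. This gives a cheap check that the benchmark is comparing equivalent work.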

```
$ hyperfine --warmup 3 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests' 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests'
Benchmark 1: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests
  Time (mean ± σ):     575.9 ms ±  20.3 ms    [User: 573.1 ms, System: 2.7 ms]
  Range (min … max):   555.5 ms … 620.0 ms    10 runs

Benchmark 2: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests
  Time (mean ± σ):     283.4 ms ±   6.7 ms    [User: 280.3 ms, System: 3.0 ms]
  Range (min … max):   277.0 ms … 294.9 ms    10 runs

Summary
  'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests' ran
    2.03 ± 0.09 times faster than 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests'
```
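As a sanity check on the summary line, the reported 2.03 ± 0.09 agrees with the ratio of the two means plus first-order error propagation of their standard deviations. This assumes that is how hyperfine derives the ± term. Here subscript 1 denotes the `USE_TRIGRAMS=1` run and subscript 0 the `USE_TRIGRAMS=0` run:

$$
r = \frac{\mu_1}{\mu_0} = \frac{575.9\ \text{ms}}{283.4\ \text{ms}} \approx 2.032,
\qquad
\sigma_r \approx r\sqrt{\left(\frac{\sigma_1}{\mu_1}\right)^2 + \left(\frac{\sigma_0}{\mu_0}\right)^2}
= 2.032\sqrt{\left(\frac{20.3}{575.9}\right)^2 + \left(\frac{6.7}{283.4}\right)^2} \approx 2.032 \times 0.0424 \approx 0.086
$$

The result rounds to the 0.09 shown.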

Using `perf`, I found that most of the runtime in `TrigramIndex::isDefinitelyOut()` is spent in a division instruction that appears to originate in `std::unordered_map`: https://github.com/llvm/llvm-project/blob/8e1f820bb4eadf5c0704818f6063e0db1006e32d/llvm/include/llvm/Support/TrigramIndex.h#L62
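The commit only says the division "seems to come from" `std::unordered_map`. One plausible explanation, offered here as an assumption, involves how a hash is mapped to a bucket. The C++ standard does not specify that mapping. libstdc++ by default computes `hash % bucket_count()` with a prime bucket count. A modulo by a value known only at run time is typically compiled to a hardware divide, which is far slower than a mask. If every query trigram triggers a lookup, that divide is paid many times per `match()` call. The sketch below uses the standard `bucket()`, `bucket_count()`, and `hash_function()` members to check that relation. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: the hash-to-bucket mapping libstdc++ uses by default.
// BucketCount is only known at run time (and is prime in libstdc++), so the
// compiler cannot replace this % with a shift or mask; it typically emits an
// integer division instruction.
std::size_t moduloBucket(std::size_t Hash, std::size_t BucketCount) {
  assert(BucketCount != 0 && "hash tables always have at least one bucket");
  return Hash % BucketCount;
}

// Hypothetical check: does M.bucket(Key) agree with hash(Key) % bucket_count()?
// The standard leaves this mapping unspecified, so the answer describes the
// standard library in use rather than a language guarantee.
template <typename Map>
bool bucketIsHashModCount(const Map &M, const typename Map::key_type &Key) {
  return M.bucket(Key) ==
         moduloBucket(M.hash_function()(Key), M.bucket_count());
}
```

With libstdc++, `bucketIsHashModCount` is expected to hold for any key. Other standard libraries may compute the index differently. For example, masking with a power-of-two bucket count avoids the division altogether.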

Removing `TrigramIndex` also makes it easier to switch `SpecialCaseList` from full regular expressions to `GlobPattern`, if we decide to do that. See the discussion in https://reviews.llvm.org/D152762 for details.

Reviewed By: MaskRay, #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D153171



