[all-commits] [llvm/llvm-project] 653de1: [Support] Optimize (.*) regex matches

Nikita Popov via All-commits all-commits at lists.llvm.org
Tue Apr 19 00:55:37 PDT 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 653de14f17215c540a7dc058acb3c29a28ef0f1c
      https://github.com/llvm/llvm-project/commit/653de14f17215c540a7dc058acb3c29a28ef0f1c
  Author: Nikita Popov <npopov at redhat.com>
  Date:   2022-04-19 (Tue, 19 Apr 2022)

  Changed paths:
    M llvm/lib/Support/regengine.inc

  Log Message:
  -----------
  [Support] Optimize (.*) regex matches

If capturing groups are used, the regex matcher handles something
like `(.*)suffix` by first doing a maximal match of `.*`, trying to
match `suffix` afterward, and then reducing the maximal stop
position one by one until this finally succeeds. This makes the
match quadratic in the length of the line (with large constant factors).
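
As a rough illustration of the backtracking described above, here is a
simplified Python model. It is not the regengine.inc code, and the name
`naive_match` is hypothetical. It only counts how many stop positions are
tried, and it assumes each try is a plain string comparison. The real
matcher does more work per try, which is where the commit message's
quadratic cost comes from.

```python
def naive_match(line: str, suffix: str):
    """Model (.*)suffix matched at the start of `line` by naive backtracking.

    Returns (captured_text, attempts); captured_text is None if nothing matches.
    """
    attempts = 0
    stop = len(line)  # maximal match: .* initially swallows the whole line
    while stop >= 0:
        attempts += 1
        if line.startswith(suffix, stop):
            return line[:stop], attempts
        stop -= 1  # give back a single character and retry
    return None, attempts


if __name__ == "__main__":
    captured, attempts = naive_match("%0 = add i32 %a, %b", " = add")
    print(captured, attempts)
```

In this model the number of tries grows with the distance between the end
of the line and the place where the suffix finally matches.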

This is particularly problematic because regexes of this form are
ubiquitous in FileCheck (something like `[[VAR:%.*]] = ...` falls
in this category), making FileCheck executions much slower than
they have any right to be.

This implements a very crude optimization that checks whether `suffix`
starts with a fixed character and, if so, steps back to the last
occurrence of that character instead of stepping back one character at
a time. This drops FileCheck time on
clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg_mask.c from
7.3 seconds to 2.7 seconds.
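
The idea of stepping back to the last occurrence of the suffix's first
character can be sketched as follows. This is a simplified Python model
under the assumption that the suffix is a plain literal string. The name
`match_with_char_skip` is hypothetical and the code does not mirror the
actual change in regengine.inc.

```python
def match_with_char_skip(line: str, suffix: str):
    """Model (.*)suffix at the start of `line`, skipping hopeless stop positions.

    Returns (captured_text, attempts); captured_text is None if nothing matches.
    """
    attempts = 0
    stop = len(line)  # maximal match: .* initially swallows the whole line
    first = suffix[0] if suffix else None
    while stop >= 0:
        if first is not None:
            # Jump directly to the last occurrence of `first` at or before `stop`;
            # no position in between can possibly start the suffix.
            stop = line.rfind(first, 0, stop + 1)
            if stop < 0:
                break
        attempts += 1
        if line.startswith(suffix, stop):
            return line[:stop], attempts
        stop -= 1  # this occurrence failed; continue searching to its left
    return None, attempts


if __name__ == "__main__":
    captured, attempts = match_with_char_skip("%0 = add i32 %a, %b", " = add")
    print(captured, attempts)
```

With a suffix that starts with a space, the sketch still tries every space
in the line, so a check on a single common character only helps so much.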

An obvious further improvement would be to check more than one
character (once again, this is particularly relevant for FileCheck,
because the next character is usually a space, which happens to
have many occurrences).

This should help with https://github.com/llvm/llvm-project/issues/54821.



