[PATCH] D154014: [SpecialCaseList] Use Globs instead of Regex

Ellis Hoag via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jun 29 12:25:42 PDT 2023


ellis added a comment.

In D154014#4461076 <https://reviews.llvm.org/D154014#4461076>, @rupprecht wrote:

> In D154014#4457883 <https://reviews.llvm.org/D154014#4457883>, @MaskRay wrote:
>
>>> This is a breaking change since some SCLs might use .* or (abc|def) which are supported regexes but not valid globs. Since we have just cut clang 16.x this is a good time to make this change.
>>
>> My user has some ignore lists, but there is no `^[a-z]+:.*\(` or `^[a-z]+:\.` occurrence, so this change is likely safe for us.
>
> I think I'm looking at the same lists, and I see plenty of `.*` in the sanitizer exclusion lists. Also a few cases of `(abc|def|...)`. IIUC, those would both be broken by this -- instead of `.*` meaning "any character any number of times" (regex) it would mean "dot followed by any number of characters" (glob), right? And the `(abc|...)` would just be that literal text, not matching the individual parts?
>
> If my understanding of that is correct, I don't think this is a good change -- there's possibly plenty of configs out there that assume this is regex, and there doesn't seem to be sufficient motivation to just break those. But I can see that globs are useful for the examples posted in the patch description. Is it possible to have some middle ground, e.g. default to regex but allow a config at the top of sanitizer lists to interpret future patterns as globs instead?

If a sanitizer list currently has `.*` it will be replaced with `..*` here <https://github.com/llvm/llvm-project/blob/085845a2acbefd26d5c229338225dfd76e2c2df3/llvm/lib/Support/SpecialCaseList.cpp#L41-L45> meaning that it matches one or more characters, so those lists may already be broken. This is exactly the confusion we would like to avoid. And you are correct that `(abc|def)` will be treated as a literal with this change, the but fix to `{abc,def}` is trivial.

That's a nice idea to add a config at the top to specify glob or regex to give users more time to migrate their lists.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154014/new/

https://reviews.llvm.org/D154014



More information about the cfe-commits mailing list