[LLVMdev] Format of special case list for sanitizers
Justin Bogner
mail at justinbogner.com
Mon Apr 13 16:38:52 PDT 2015
Alexey Samsonov <vonosmas at gmail.com> writes:
> Hi Ryan,
>
> On Sun, Apr 5, 2015 at 4:47 PM, Ryan Govostes <rzg at apple.com> wrote:
>> The documentation for the sanitizer special case list format[0] says,
>>
>> > The meaning of * in regular expression for entity names is different
>> > - it is treated as in shell wildcarding.
>>
>> In SpecialCaseList::parse, we see that this is just replacing * with .*:
>>
>>   // Replace * with .*
>>   for (size_t pos = 0; (pos = Regexp.find("*", pos)) != std::string::npos;
>>        pos += strlen(".*")) {
>>     Regexp.replace(pos, strlen("*"), ".*");
>>   }
>>
>> This seems to introduce more problems than it solves, since (i) this
>> doesn’t really behave like a shell globbing wildcard as advertised, and
>> (ii) if the user tries to use * as a regex quantifier, this will match
>> incorrectly: A* matches the empty string and any number of As, while A.*
>> matches all strings that start with at least one A.
>>
>> If it’s forgivable to break compatibility here, we should do regular
>> expressions _or_ shell globbing, and not a hybrid format.
>
> I agree that the current format description is misleading, e.g. "foo*bar"
> will also match "fooxxx/yyy/zzzbar", which might be unexpected for a user
> expecting shell globbing. For now we should at least change the documentation
> to reflect that. In retrospect, replacing "*" with ".*" doesn't look like
> a good idea at all, and for simplicity I'd prefer to just use regular
> expressions. Adding a special case for "file paths" isn't nice:
> 1) as you mention, we have to do careful escaping
> 2) at the moment the special case list format is generic and is not tied to
> "src" or "fun" entities; it's the specific sanitizers that introduce logic on
> top of it. I'd prefer to treat all special case list entries in a similar way.
>
> However, I'm really afraid of breaking compatibility :( I know several users
> of the blacklist that already use "*" meaning ".*", and it would be
> challenging to migrate them - e.g. you have to keep two different blacklist
> files for older and newer Clang...

TBH, this doesn't seem like that big of a deal to me. The "*" behaviour
is strange and confusing, and I'd expect most people to use sanitizers
mostly from their newer compiler if they use multiple - they work
better. Compiling with older compilers is important for a lot of use
cases, but I can't see why people would run sanitizers from two
different versions of the compiler at the same time.

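
To make the confusion concrete, here is a minimal sketch of what the quoted
loop does to a pattern. It is not the actual SpecialCaseList code: it uses
std::regex in POSIX extended mode as a stand-in for llvm::Regex, and the
names expandStar, matches and demo are hypothetical helpers made up for
illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring the quoted loop: every "*" becomes ".*".
std::string expandStar(std::string Pattern) {
  for (size_t Pos = 0; (Pos = Pattern.find('*', Pos)) != std::string::npos;
       Pos += 2) // skip past the ".*" we just wrote
    Pattern.replace(Pos, 1, ".*");
  return Pattern;
}

// Whole-string match, standing in for the "^...$" anchoring done in parse().
bool matches(const std::string &Pattern, const std::string &Name) {
  return std::regex_match(Name, std::regex(Pattern, std::regex::extended));
}

void demo() {
  // Unlike a shell glob, the expanded "*" happily crosses '/' separators.
  assert(matches(expandStar("foo*bar"), "fooxxx/yyy/zzzbar"));

  // As a plain regex, "A*" matches the empty string but not "AB" ...
  assert(matches("A*", ""));
  assert(!matches("A*", "AB"));

  // ... while after expansion it means "A.*", which does the opposite.
  assert(!matches(expandStar("A*"), ""));
  assert(matches(expandStar("A*"), "AB"));
}
```

Calling demo() should pass all of the assertions, i.e. the same "*" is
neither a glob wildcard nor a regex quantifier once it has been rewritten.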
> The only option I see is to introduce versioning to special case list format.
> That's unfortunate and extra complexity, but can be useful if we decide to
> make further changes.
>
>> I’d prefer shell globbing for paths in src entities, but that isn’t as
>> useful for function names. Most filenames will contain periods, which
>> also need to be escaped properly as regular expressions. (This also
>> limits the usefulness of treating literals separately.)
>>
>> (Just a note: the way that regular expressions are concatenated in
>> ::parse appears to have a bug if a pattern contains a pipe.)
>
> Patches welcome :) You can also open a bug against me, but I can't guarantee
> I will get to this (or the suggestion above) in the near future.
>
>
>> Ryan
>>
>> 0: http://clang.llvm.org/docs/SanitizerSpecialCaseList.html
>>
>> diff --git a/lib/Support/SpecialCaseList.cpp b/lib/Support/SpecialCaseList.cpp
>> index c312cc1..2972cb1 100644
>> --- a/lib/Support/SpecialCaseList.cpp
>> +++ b/lib/Support/SpecialCaseList.cpp
>> @@ -133,7 +133,7 @@ bool SpecialCaseList::parse(const MemoryBuffer *MB, std::string &Error) {
>>      // Add this regexp into the proper group by its prefix.
>>      if (!Regexps[Prefix][Category].empty())
>>        Regexps[Prefix][Category] += "|";
>> -    Regexps[Prefix][Category] += "^" + Regexp + "$";
>> +    Regexps[Prefix][Category] += "^(" + Regexp + ")$";
>>    }
>>    return true;
>>  }
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> --
> Alexey Samsonov
> vonosmas at gmail.com
>
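
For reference, the pipe problem Ryan mentions can be reproduced outside of
LLVM with a small sketch. This assumes std::regex in POSIX extended mode as a
stand-in for llvm::Regex, and joinPatterns / search are hypothetical helpers
that only imitate the concatenation step in SpecialCaseList::parse.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for the concatenation in parse(): join the patterns
// with "|", anchoring each one either as "^P$" (old) or "^(P)$" (patched).
std::string joinPatterns(const std::vector<std::string> &Patterns, bool Group) {
  std::string Joined;
  for (const std::string &P : Patterns) {
    if (!Joined.empty())
      Joined += "|";
    Joined += Group ? "^(" + P + ")$" : "^" + P + "$";
  }
  return Joined;
}

// Unanchored search; the anchors come from the joined pattern itself.
bool search(const std::string &Joined, const std::string &Name) {
  return std::regex_search(Name, std::regex(Joined, std::regex::extended));
}

void demo() {
  const std::vector<std::string> Patterns{"foo|bar"};

  // Without grouping, "^foo|bar$" means "starts with foo OR ends with bar".
  const std::string Old = joinPatterns(Patterns, /*Group=*/false);
  assert(search(Old, "foobaz"));
  assert(search(Old, "bazbar"));

  // With grouping, "^(foo|bar)$" only accepts exactly "foo" or "bar".
  const std::string New = joinPatterns(Patterns, /*Group=*/true);
  assert(!search(New, "foobaz"));
  assert(!search(New, "bazbar"));
  assert(search(New, "foo") && search(New, "bar"));
}
```

The anchors bind tighter than "|", so an entry containing a pipe loses its
"^" on one side and its "$" on the other unless it is wrapped in a group.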