[LLVMdev] Format of special case list for sanitizers

Justin Bogner mail at justinbogner.com
Mon Apr 13 16:38:52 PDT 2015


Alexey Samsonov <vonosmas at gmail.com> writes:
> Hi Ryan,
>
> On Sun, Apr 5, 2015 at 4:47 PM, Ryan Govostes <rzg at apple.com> wrote:
>> The documentation for the sanitizer special case list format[0] says,
>>
>> > The meanining of * in regular expression for entity names is different
>> - it is treated as in shell wildcarding.
>>
>> In SpecialCaseList::parse, we see that this is just replacing * with .*:
>>
>>     // Replace * with .*
>>     for (size_t pos = 0; (pos = Regexp.find("*", pos)) != std::string::npos;
>>          pos += strlen(".*")) {
>>       Regexp.replace(pos, strlen("*"), ".*");
>>     }
>>
>> This seems to introduce more problems than it solves, since (i) this
>> doesn’t really behave like a shell globbing wildcard as advertised, and
>> (ii) if the user tries to use * as a regex quantifier, this will match
>> incorrectly: A* matches the empty string and any number of As, while A.*
>> matches all strings that start with at least one A.
>>
>> If it’s forgivable to break compatibility here, we should do regular
>> expressions _or_ shell globbing, and not a hybrid format.
>
> I agree that the current format description is misleading, e.g. "foo*bar"
> will also match "fooxxx/yyy/zzzbar", which might be unexpected for user
> expecting a shell globbing. For now we should at least change documentation
> to reflect that. In retrospective, replacing "*" with ".*" doesn't look like
> a good idea at all, and for simplicity I'd prefer to just use regular
> expressions. Adding special case for "file paths" isn't nice:
> 1) as you mention, we have to do careful escaping
> 2) at the moment special case list format is generic and is not tied to "src"
> or "fun" entities, it's specific sanitizers that introduce logic on top of
> it. I'd prefer to treat all special case list entries in a similar way.
>
> However, I'm really afraid of breaking compatibility :( I know several users
> of blacklist that already use "*" meaning ".*", and it would be challenging
> to migrate them - e.g. you have to keep two different blacklist files for
> older and newer Clang...

TBH, this doesn't seem like that big of a deal to me. The "*" behaviour
is strange and confusing, and I'd expect most people to use sanitizers
mostly from their newer compiler if they use multiple - they work
better. Compiling with older compilers is important for a lot of use
cases, but I can't see why people would run sanitizers from two
different versions of the compiler at the same time.

> The only option I see is to introduce versioning to special case list format.
> That's unfortunate and extra complexity, but can be useful if we decide to
> make further changes.
>
>     I’d prefer shell globbing for paths in src entities, but that isn’t as
>     useful for function names. Most filenames will contain periods, which
>     also need to be escaped properly as regular expressions. (This also
>     limits the usefulness of treating literals separately.)
>    
>     (Just a note: the way that regular expressions are concatenated in
>     ::parse appears to have a bug if a pattern contains a pipe.)
>
> Patches welcome :) You can also open a bug against me, but I can't guarantee
> I will get to this (or the suggestion above) in the nearest future.
>  
>
>     Ryan
>
>     0: http://clang.llvm.org/docs/SanitizerSpecialCaseList.html
>
>     diff --git a/lib/Support/SpecialCaseList.cpp b/lib/Support/
>     SpecialCaseList.cpp
>     index c312cc1..2972cb1 100644
>     --- a/lib/Support/SpecialCaseList.cpp
>     +++ b/lib/Support/SpecialCaseList.cpp
>     @@ -133,7 +133,7 @@ bool SpecialCaseList::parse(const MemoryBuffer *MB,
>     std::string &Error) {
>          // Add this regexp into the proper group by its prefix.
>          if (!Regexps[Prefix][Category].empty())
>            Regexps[Prefix][Category] += "|";
>     -    Regexps[Prefix][Category] += "^" + Regexp + "$";
>     +    Regexps[Prefix][Category] += "^(" + Regexp + ")$)";
>        }
>        return true;
>      }
>
>     _______________________________________________
>     LLVM Developers mailing list
>     LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> --
> Alexey Samsonov
> vonosmas at gmail.com
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




More information about the llvm-dev mailing list