[LLVMdev] Format of special case list for sanitizers

Alexey Samsonov vonosmas at gmail.com
Thu Apr 16 18:46:59 PDT 2015


On Mon, Apr 13, 2015 at 4:38 PM, Justin Bogner <mail at justinbogner.com>
wrote:

> Alexey Samsonov <vonosmas at gmail.com> writes:
> > Hi Ryan,
> >
> > On Sun, Apr 5, 2015 at 4:47 PM, Ryan Govostes <rzg at apple.com> wrote:
> >> The documentation for the sanitizer special case list format[0] says,
> >>
> >> > The meanining of * in regular expression for entity names is different
> >> - it is treated as in shell wildcarding.
> >>
> >> In SpecialCaseList::parse, we see that this is just replacing * with .*:
> >>
> >>     // Replace * with .*
> >>     for (size_t pos = 0; (pos = Regexp.find("*", pos)) !=
> std::string::npos;
> >>          pos += strlen(".*")) {
> >>       Regexp.replace(pos, strlen("*"), ".*");
> >>     }
> >>
> >> This seems to introduce more problems than it solves, since (i) this
> >> doesn’t really behave like a shell globbing wildcard as advertised, and
> >> (ii) if the user tries to use * as a regex quantifier, this will match
> >> incorrectly: A* matches the empty string and any number of As, while A.*
> >> matches all strings that start with at least one A.
> >>
> >> If it’s forgivable to break compatibility here, we should do regular
> >> expressions _or_ shell globbing, and not a hybrid format.
> >
> > I agree that the current format description is misleading, e.g. "foo*bar"
> > will also match "fooxxx/yyy/zzzbar", which might be unexpected for user
> > expecting a shell globbing. For now we should at least change
> documentation
> > to reflect that. In retrospective, replacing "*" with ".*" doesn't look
> like
> > a good idea at all, and for simplicity I'd prefer to just use regular
> > expressions. Adding special case for "file paths" isn't nice:
> > 1) as you mention, we have to do careful escaping
> > 2) at the moment special case list format is generic and is not tied to
> "src"
> > or "fun" entities, it's specific sanitizers that introduce logic on top
> of
> > it. I'd prefer to treat all special case list entries in a similar way.
> >
> > However, I'm really afraid of breaking compatibility :( I know several
> users
> > of blacklist that already use "*" meaning ".*", and it would be
> challenging
> > to migrate them - e.g. you have to keep two different blacklist files for
> > older and newer Clang...
>
> TBH, this doesn't seem like that big of a deal to me. The "*" behaviour
> is strange and confusing, and I'd expect most people to use sanitizers
> mostly from their newer compiler if they use multiple - they work
> better. Compiling with older compilers is important for a lot of use
> cases, but I can't see why people would run sanitizers from two
> different versions of the compiler at the same time.
>

Well, suppose you build your project with Clang 3.6 and have
my/project/asan_blacklist.txt
checked in your VCS. Now, if you will decide to switch to Clang 3.7 you'd
have to *atomically*
upgrade the contents of my/project/asan_blacklist.txt. If some developers
are still using Clang 3.6,
you'd have to keep the older version there, and do magic tricks in the
build system so that it
would choose between two versions of asan_blacklist.txt. I'm not saying
it's impossible
(things will be easier for you if you use default blacklist, for instance),
it's just painful.

You will also not be able to see the problem instantly - your old blacklist
will be treated
differently, and suddenly some functions blacklisted long time ago would be
silently un-blacklisted,
triggering long-closed bugs.

-fsanitize-old-blacklist-format, which Nick suggests below, is an option.
However, that would
also require passing different flags with different compiler versions. And,
again, we could decide
to add more blacklist features in the future. Or GCC people could implement
some version of it...

Another option is to add versioning to the file format (we had to do smth.
similar for coverage
file format, for instance). E.g. if old file was:
----
fun:foo*bar
src:*baz.cpp
----

new file will be:

----
### 0002 ####
# First line contains 4-digit string specifying the version number
fun:foo.*bar
src:.*baz.cpp
----



>
> > The only option I see is to introduce versioning to special case list
> format.
> > That's unfortunate and extra complexity, but can be useful if we decide
> to
> > make further changes.
> >
> >     I’d prefer shell globbing for paths in src entities, but that isn’t
> as
> >     useful for function names. Most filenames will contain periods, which
> >     also need to be escaped properly as regular expressions. (This also
> >     limits the usefulness of treating literals separately.)
> >
> >     (Just a note: the way that regular expressions are concatenated in
> >     ::parse appears to have a bug if a pattern contains a pipe.)
> >
> > Patches welcome :) You can also open a bug against me, but I can't
> guarantee
> > I will get to this (or the suggestion above) in the nearest future.
> >
> >
> >     Ryan
> >
> >     0: http://clang.llvm.org/docs/SanitizerSpecialCaseList.html
> >
> >     diff --git a/lib/Support/SpecialCaseList.cpp b/lib/Support/
> >     SpecialCaseList.cpp
> >     index c312cc1..2972cb1 100644
> >     --- a/lib/Support/SpecialCaseList.cpp
> >     +++ b/lib/Support/SpecialCaseList.cpp
> >     @@ -133,7 +133,7 @@ bool SpecialCaseList::parse(const MemoryBuffer
> *MB,
> >     std::string &Error) {
> >          // Add this regexp into the proper group by its prefix.
> >          if (!Regexps[Prefix][Category].empty())
> >            Regexps[Prefix][Category] += "|";
> >     -    Regexps[Prefix][Category] += "^" + Regexp + "$";
> >     +    Regexps[Prefix][Category] += "^(" + Regexp + ")$)";
> >        }
> >        return true;
> >      }
> >
> >     _______________________________________________
> >     LLVM Developers mailing list
> >     LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
> > --
> > Alexey Samsonov
> > vonosmas at gmail.com
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>



-- 
Alexey Samsonov
vonosmas at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150416/76d00f70/attachment.html>


More information about the llvm-dev mailing list