[PATCH] D50534: [libc++] Fix handling of negated character classes in regex

Louis Dionne via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Aug 10 12:29:25 PDT 2018


ldionne added a comment.

In https://reviews.llvm.org/D50534#1194664, @timshen wrote:

> I'm not fully equipped with the context right now, but something doesn't add up. if `__neg_chars_.empty()` check is removed, the `(__neg_mask_ == 0)` above should be removed too. They have to be consistent.
>
> However, there is more weirdness in it. The comment above describes the intention:
>
>   union(complement(union(__neg_chars_, __neg_mask_)), other cases...)
>   
>
> With the `__neg_chars_.empty()` and `(__neg_mask_ == 0)` removed, I believe that the code exactly matches the comment. Let's see what happens when users don't specify any negative class or chars. `__neg_chars_` and `__neg_mask_` will be empty sets, and `union(complement(union(__neg_chars_, __neg_mask_)), other cases...)` always evaluate to full set, which means it always matches all characters. This can't be right.
>
> It's likely that the comment description doesn't fully describe the intended behavior. I think we need to figure that out first.


Yes, I think you were right. I think what was missing is that the comment does not apply when there's no `neg_mask` and no `neg_chars`. Otherwise we get into that situation where we're taking the complement of an empty set, and we match everything, which is not what we want.


Repository:
  rCXX libc++

https://reviews.llvm.org/D50534





More information about the cfe-commits mailing list