[PATCH] D27463: Change llvm::Regex to expose a fallible constructor.

Wed Dec 7 14:44:29 PST 2016

dlj added a comment.

In https://reviews.llvm.org/D27463#616195, @vsk wrote:

> >> That's the end goal I'm aiming for in the current change, but by a different route: for regexes compiled from static char[] constants, I'm just letting the Expected<Regex> propagate the error.
> > 
> > err, rather, *not* propagate the error -- and crash with asserts enabled.
>
> I think it'd be better to add in 'no error' asserts to the methods in Regex. The advantages are that (1) we can catch broken regexes / broken tests up front, instead of waiting for downstream projects to opt-in to the new constructor,

So just to be clear... you're proposing a different breaking change, such that an invalid regex causes an assertion failure, instead of silently clamping to "no match?" That still cements the need for the error field, though; and it also seems to block making match() const -- the semantics of isValid would silently change, such that isValid() would stop reporting errors that arise during match().

The alternative proposal also has a different, more philosophical problem: it opens a broad window for validation escape in new uses of Regex. I already see plenty of rollbacks for things that only fail with asserts enabled. Without a stronger contract of "never match an invalid regex," it would just add another sharp edge, and I'm worried it would start to cut. Making the error check part of the type seems like a reasonable speed bump to ensure that the right assumptions are being made.

As for the broken users: I'm fixing the ones within the LLVM projects now. Once those patches are in, I can get rid of the constructor (probably needs a short intermediate state for out-of-tree users to notice, though). It seems better to do the cleanup work now instead of "kicking the can" on the stateful error.

> (2) it doesn't require any code changes outside of the Regex class (except to fix broken code),

This smells slightly like a false economy to me... there aren't many call sites, and I'm finding bugs already. I get why we wouldn't want a ton of churn, but the end state seems worth it to me (i.e., Regex becomes const-correct and has no internal error state).

> (3) it doesn't introduce any unchecked uses of Expected, which IMO is an abuse of the error handling API.

Unfortunately, there just isn't a concise, usable alternative. Code with existing error paths are fine, but otherwise a debug-only assertion crash is probably the best alternative. (In fact, that's probably what I would do in most of those cases, anyhow).

One example (but not the only example) are AST matchers: the macro matchers are basically painted into an API corner, so an assertion really seems like the best option. I could probably add DEBUG() checks, but at least they can explicitly clamp to false in a release build.

Class constructors are another (plentiful) example: if there's a reasonable way to defer errors, I can do that now; but in many cases there isn't, so signalling the failure *somehow* -- even if it's only with debug assertions -- seems strictly superior.

https://reviews.llvm.org/D27463