[Lldb-commits] [PATCH] D132307: [lldb] Switch RegularExpression from llvm::Regex to std::regex

Pavel Labath via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Tue Aug 23 05:05:42 PDT 2022


labath added inline comments.


================
Comment at: lldb/source/Utility/CMakeLists.txt:29
+  PROPERTIES COMPILE_OPTIONS
+  "-fcxx-exceptions"
+)
----------------
kastiglione wrote:
> labath wrote:
> > kastiglione wrote:
> > > kastiglione wrote:
> > > > JDevlieghere wrote:
> > > > > kastiglione wrote:
> > > > > > the `std::regex` constructor throws `std::regex_error` if the pattern is invalid. For this reason, exceptions are enabled for this one file.
> > > > > What happens when exceptions are disabled? What does it mean to have this enabled for a single file? I don't know if it's part of the LLVM developer guide, but LLVM is supposed to build without RTTI and without exceptions. 
> > > > > What happens when exceptions are disabled?
> > > > 
> > > > This cmake config enables exceptions for this one file, independent of `LLVM_ENABLE_EH`. No other source files will be allowed to catch or throw exceptions.
> > > > 
> > > > > What does it mean to have this enabled for a single file?
> > > > 
> > > > It means this file can compile with a try/catch, and that inside this file, exceptions can be caught.
> > > > 
> > > > > I don't know if it's part of the LLVM developer guide, but LLVM is supposed to build without RTTI and without exceptions.
> > > > 
> > > > llvm has `LLVM_ENABLE_EH` which allows llvm to be built with exceptions support enabled. Similarly, `LLVM_ENABLE_RTTI` allows RTTI to be enabled. It seems that both are disabled as a default, but not as a hard requirement.
> > > > 
> > > > I wondered if enabling RTTI would needed for exceptions, but at least for this code, the answer is no. The `catch (const std::regex_error &e)` block is exercised by `TestBreakpointRegexError.py`, so we know this code can and does catch an exception of that type, and accesses the error's member functions.
> > > > 
> > > > 
> > > What makes me believe this use of exceptions is ok is that the API boundary continues to be exception free, callers continue to be exception disabled. Only the internal implementation, knows about exceptions.
> > That is definitely not ok. ODR madness awaits therein. The standard library is full of functions that effectively do
> > ```
> > #ifdef EXCEPTIONS
> >   something();
> > #else
> >   something_else();
> > #endif
> > ```
> > Compiling just one file with exceptions enabled can cause two different versions of those functions to appear. It is sufficient that this file instantiates one function whose behavior is dependent on exceptions being available. When linking, the linker has to choose one of the two versions, and there's no telling which one will it use. This means that exceptions can creep in into the supposedly exception-free code, or vice-versa.
> > 
> > The only way this would be safe is if this code was in a shared library, and we took care to restrict the visibility of all symbols except the exported api of that library. And I don't think that's worth it.
> @labath thanks for pointing that out. I had incomplete understanding and thought it could still work, but I see now that it could be an issue.
> 
> > The only way this would be safe is if this code was in a shared library, and we took care to restrict the visibility of all symbols except the exported api of that library. And I don't think that's worth it.
> 
> what are your concerns with this approach?
> 
> btw I spoke to Louis Dionne of libc++ and he said that it could be possible to include whether `-fno-exceptions` is used in the libc++ ABI tag, which would allow mixing files built with and without exceptions.
> > The only way this would be safe is if this code was in a shared library, and we took care to restrict the visibility of all symbols except the exported api of that library. And I don't think that's worth it.
> 
> what are your concerns with this approach?

Well.. I don't even know where to start.

For one, since a shared library is a shippable artifact, it requires adjustment to all distribution methods up- and down-stream. I know this would definitely be an issue for us, since google's internal build system has a strong distaste for shared libraries. This is more on the extreme end, but I expect that most of the other build/deployment systems would need to be adjusted as well.

Now you should shrug that off as a "downstream problem", but there's also something to be said about consistency. Even as things stand now, LLDB is pretty much the only llvm component, which _requires_ to be built with shared libraries. For this case, we at least have a fairly good reason for it -- python extension modules must be shared libraries (although one could still imagine building a extension-free lldb in static mode). Everything else in llvm works both in static and in dynamic modes (for better or worse, there are two dynamic modes).

For this feature, I think the reasoning is pretty weak, because we're essentially doing it to work around _another_ llvm policy. And unlike the shared library thing, this one is explicitly [[ https://llvm.org/docs/CodingStandards.html#do-not-use-rtti-or-exceptions | spelled out ]], in no uncertain terms. So we would immediately become outliers, not in one, but two ways. Actually, three, if you count the fact that we would be using a different regex implementation than the rest of llvm.

Why don't you try proposing an llvm-wide llvm::Regex->std::regex replacement? I expect the answer to be a resounding NO, but who knows, maybe I'll be surprised, and someone will even come up with a way to do this safely...

> 
> btw I spoke to Louis Dionne of libc++ and he said that it could be possible to include whether `-fno-exceptions` is used in the libc++ ABI tag, which would allow mixing files built with and without exceptions.

I think that would be a very cool feature of libc++, but given that libc++ is not the only host c++ library we use with lldb/llvm, I don't think it will help in this case.

However, I think it might be tractable to excise the regex component out of libc++ and pull it into llvm as a separate, independent library. There is already a precedent for that -- the demangler library. The reason we're doing that is because we don't want to maintain two demangler libraries (one for the runtime and one for the tools), but that is precisely the situation that we're in with llvm::Regex and libc++ std::regex (admittedly the demangler is a more central component of llvm than the regex engine). And since this internal library would not need to conform to any specific API, we could make it usable even without exceptions. I'm not saying this would be a walk in the park, but I think you could gather enough support to pull this off.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132307/new/

https://reviews.llvm.org/D132307



More information about the lldb-commits mailing list