[clang] [dataflow] Parse formulas from text (PR #66424)

Sam McCall via cfe-commits cfe-commits at lists.llvm.org
Tue Sep 19 02:53:40 PDT 2023


================
@@ -95,4 +98,94 @@ BoolValue &Arena::makeBoolValue(const Formula &F) {
   return *It->second;
 }
 
+namespace {
+const Formula *parse(Arena &A, llvm::StringRef &In) {
+  auto EatWhitespace = [&] { In = In.ltrim(' '); };
+  EatWhitespace();
+
+  if (In.consume_front("!")) {
+    if (auto *Arg = parse(A, In))
+      return &A.makeNot(*Arg);
+    return nullptr;
+  }
+
+  if (In.consume_front("(")) {
+    auto *Arg1 = parse(A, In);
+    if (!Arg1)
+      return nullptr;
+
+    EatWhitespace();
+    decltype(&Arena::makeOr) Op;
+    if (In.consume_front("|"))
+      Op = &Arena::makeOr;
+    else if (In.consume_front("&"))
+      Op = &Arena::makeAnd;
+    else if (In.consume_front("=>"))
+      Op = &Arena::makeImplies;
+    else if (In.consume_front("="))
+      Op = &Arena::makeEquals;
+    else
+      return nullptr;
+
+    auto *Arg2 = parse(A, In);
+    if (!Arg2)
+      return nullptr;
+
+    EatWhitespace();
+    if (!In.consume_front(")"))
+      return nullptr;
+
+    return &(A.*Op)(*Arg1, *Arg2);
+  }
+
+  if (In.consume_front("V")) {
----------------
sam-mccall wrote:

Agree it's too restrictive, but solving it is a slightly bigger scope and I'm not sure of the details.

I do think it's important that if you parse a formula that uses certain names for the variables, then you can print it out again with the same names. (Parsing is mostly useful when formulas are large and have many variables...). `V0` is the default name of `Atom{0}` etc, so these variables "round-trip" OK.

To round-trip other variables we need to store their names somewhere. A map output-param works (and we have support for printing this way) but it's not very ergonomic, and is easy to forget to plumb around.
I've been thinking variable names should live on the arena (`Arena::makeAtom() -> Atom`, `Arena::nameAtom(Atom, Twine)` or so). This would provide a fairly centralized and natural way to e.g. name flow condition vars. And `parse()` could create an atom and name it when it encounters an unknown variable name.

However there's some stuff to work out here: is name tracking always on? do we also want to name values/locations, which might help provide atom names? what about Formula's `operator<<` which won't find the names?

Added a `FIXME` to handle other names.

https://github.com/llvm/llvm-project/pull/66424


More information about the cfe-commits mailing list