[Lldb-commits] [PATCH] D66634: Postfix: move more code out of the PDB plugin

Tue Aug 27 01:54:42 PDT 2019

labath marked 3 inline comments as done.
labath added a comment.

Please find my replies inline (including my yesterday's reply to Aleksandr, which I apparently failed to submit).

================
Comment at: lldb/trunk/source/Plugins/SymbolFile/NativePDB/PdbFPOProgramToDWARFExpression.cpp:61

+  for (auto it = parsed.begin(), end = parsed.end(); it != end; ++it) {
     // Emplace valid dependent subtrees to make target assignment independent
----------------
amccarth wrote:
> I would recommend making `parsed` a `const` vector.  The lambda captures the iterator, and while the code currently looks fine, I'd hate for something to change in the future that could invalidate the captured iterator.
Making it `const` won't work because the symbol resolution process can actually modify the contents (not the size) of the vector -- if the RHS is a `SymbolNode *` directly, then we may need to replace that with a different kind of a node.

I don't think that the invalidation problem is really specific to the lambda -- even without the lambda, if anything modified the vector size while we were iterating over it, we would blow up.

================
Comment at: lldb/trunk/source/Plugins/SymbolFile/NativePDB/PdbFPOProgramToDWARFExpression.cpp:69
+              return pair.second;
+          }
+
----------------
amccarth wrote:
> This looks O(n^2).  Will that matter here or will the expressions always be short?
I don't think that should matter, the expressions will typically have three of four assignments (frame base, aligned frame base, esp, eip). It may have additional entries for spilled registers, but there aren't that many of those either.

================
Comment at: lldb/trunk/source/Symbol/PostfixExpression.cpp:101
+    if (!rhs)
+      return {};
+    result.emplace_back(lhs, rhs);
----------------
amccarth wrote:
> Is it correct to return an empty vector here rather than `result`?  If there's one bad expression, you'll consider the entire FPO program empty.  That's sounds plausible, but I thought I'd ask.
Yeah, that was intentional here. If there's an error parsing one of the expressions, I am not sure we want to trust  the parse of the preceding partial results either. I  wouldn't really use the term "correct" here, because normally we should never fail to parse the expression, and the failure would either mean a bug in the parser, or the producer. And it's hard to say what is the "correct" thing to do in those cases without knowing the specifics...

================
Comment at: source/Plugins/SymbolFile/NativePDB/PdbFPOProgramToDWARFExpression.cpp:58
                              llvm::BumpPtrAllocator &alloc) {
-  llvm::DenseMap<llvm::StringRef, Node *> dependent_programs;
-
-  size_t cur = 0;
-  while (true) {
-    size_t assign_index = program.find('=', cur);
-    if (assign_index == llvm::StringRef::npos) {
-      llvm::StringRef tail = program.slice(cur, llvm::StringRef::npos);
-      if (!tail.trim().empty()) {
-        // missing assign operator
-        return nullptr;
-      }
-      break;
-    }
-    llvm::StringRef assignment_program = program.slice(cur, assign_index);
-
-    llvm::StringRef lvalue_name;
-    Node *rvalue_ast = nullptr;
-    if (!ParseFPOSingleAssignmentProgram(assignment_program, alloc, lvalue_name,
-                                         rvalue_ast)) {
-      return nullptr;
-    }
-
-    lldbassert(rvalue_ast);
+  std::vector<std::pair<llvm::StringRef, Node *>> parsed =
+      postfix::ParseFPOProgram(program, alloc);
----------------
aleksandr.urakov wrote:
> Do I understand right, you use a vector of pairs instead of a map due to the small number of expressions in a program (then a search on a small vector will be faster than on a map)?
Not exactly. The main reason for using a vector is that the order of expressions (according to my understanding) in these expressions is important. So `a b = b c =` would mean something different than `b c = a b =`. And once I already have them in a vector, I didn't see a point in putting them also into a map to get fast lookups (because the number of elements is small).

The previous implementation avoided that by parsing and resolving in the same loop, but here I first parse and then resolve stuff. This made it easier to implement, and I doubt the performance loss is noticeable due to the size of the data involved.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66634/new/

https://reviews.llvm.org/D66634