[PATCH] D49546: [clangd] Proof-of-concept query iterators for Dex symbol index

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jul 25 07:19:36 PDT 2018


ioeric added a comment.

Implementation is in a good shape. I only have nits ;)



================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:18
+namespace clangd {
+namespace dex {
+
----------------
Please put all local classes and helpers in an anonymous namespace.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:20
+
+/// Implements Iterator over a PostingList.
+class DocumentIterator : public Iterator {
----------------
nit: We could elaborate a bit. This is the most basic or "leaf" iterator.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:30
+  ///
+  /// Complexity: O(1).
+  void advance() override {
----------------
nit: The complexity is obvious, so I'd probably drop it as this is not public.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:39
+  ///
+  /// Complexity: O(log(N)), N is the size of underlying PostingList.
+  void advanceTo(DocID ID) override {
----------------
Again, this should also be obvious given the short implementation.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:68
+///
+/// AndIterator advances through items which can be pointed to by each child
+/// iterator. It becomes exhausted as soon as any child becomes exhausted. After
----------------
I'm not exactly sure what this means. Do you mean "AndIterator iterates through common items among all children"?


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:75
+  AndIterator(std::vector<std::unique_ptr<Iterator>> AllChildren)
+      : Children(std::move(AllChildren)), ReachedEnd(false) {
+    assert(!Children.empty() && "AndIterator should have at least one child.");
----------------
nit: `ReachedEnd` is reset in `sync()`. Maybe just make it default value for the field?


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:112
+  /// Restores class invariants: each child should point to the same element,
+  /// ReachedEnd indicates whether any child is exhausted.
+  void sync() {
----------------
nit: documentation for `ReachedEnd` is already covered below.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:114
+  void sync() {
+    ReachedEnd = Children.front()->reachedEnd();
+    if (ReachedEnd)
----------------
nit: `ReachedEnd &= ...;` would keep me from worrying about true ReachedEnd getting overridden.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:117
+      return;
+    auto ID = Children.front()->peek();
+    bool NeedsAdvance = false;
----------------
nit: `SyncID`  or `Sync` might be clearer.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:118
+    auto ID = Children.front()->peek();
+    bool NeedsAdvance = false;
+    do {
----------------
Maybe add a comment `NeedsAdvance` indicates whether `advanceTo(ID)` should be rerun from the beginning. 


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:145
+  std::vector<std::unique_ptr<Iterator>> Children;
+  /// Local state, which indicates whether any child is exhausted. It is cheaper
+  /// to maintain and update a local state, rather than traversing the whole
----------------
nit: why "local state"?  (All instance fields are local states)


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:166
+  bool reachedEnd() const override {
+    for (const auto &Child : Children)
+      if (!Child->reachedEnd())
----------------
use `std::any_of(...)`?


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:176
+           "OrIterator must have at least one child to advance().");
+    const auto HighestID = peek();
+    for (const auto &Child : Children) {
----------------
The comment says `smallest DocID`, but here we call it `highest` :( 


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:178
+    for (const auto &Child : Children) {
+      if (!Child->reachedEnd() && Child->peek() == HighestID) {
+        Child->advance();
----------------
nit: no braces for one-liners.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.cpp:218
+private:
+  // FIXME(kbobyrev): Would storing Children in min-heap be faster?
+  std::vector<std::unique_ptr<Iterator>> Children;
----------------
I think this FIXME should live in `peek()` as it's an optimization for `peek()`.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:10
+//
+// This file defines interface for Query Tree Nodes - Iterators, which process
+// posting lists and yield the result of contracted search query.
----------------
I think we should give more context here. For example, it's unclear what `Query Tree Nodes` are. `Posting list` is  not a widely known concept. What is the idea behind iterators? What problem do they solve? Roughly what iterators are provided? An example like ascii graph you have in the test can be useful here as well.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:21
+#include <vector>
+
+#include "llvm/ADT/ArrayRef.h"
----------------
nit: remove spaces between #includes so that clang-format can sort all #includes properly. same for other files.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:34
+/// certain criteria, i.e. containing a Search Token. PostingLists are keys for
+/// the inverted index.
+using PostingList = std::vector<DocID>;
----------------
nit:
> PostingLists are keys ...
Do you mean `values`? Isn't inverted index a mapping from search tokens to positing lists? 


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:44
+// before iterator exhaustion.
+class Iterator {
+public:
----------------
Could you also add documentation for `Iterator` class?


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:47
+  /// Returns true if all valid DocIDs were processed and hence advance() and
+  /// advanceTo() call are invalid, false otherwise.
+  virtual bool reachedEnd() const = 0;
----------------
nit: `calls`.


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:52
+  ///
+  /// Note: calls to advance() are invalid if reachedEnd().
+  virtual void advance() = 0;
----------------
What does `invalid` mean? Would there be any side effect? Would it trigger a crash? 


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:59
+  virtual void advanceTo(DocID ID) = 0;
+  /// Returns current symbol under cursor.
+  virtual DocID peek() const = 0;
----------------
micro nit: `under cursor` might be ambiguous.

Would `peek()` be valid when `reachedEnd()` is true?


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:85
+
+/// Returns an iterator over given PostingList.
+std::unique_ptr<Iterator> create(PostingListRef Documents);
----------------
nit: a document iterator?


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:88
+
+/// Returns AndIterator, an iterator which performs the intersection of the
+/// PostingLists of its children.
----------------
nit: `Returns an AND iterator which ...` (same below)

`AndIterator` is the name for internal implementation and not exposed. 




================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:98
+
+template <size_t Size>
+std::unique_ptr<Iterator>
----------------
Documentation for the templated interfaces? e.g. why are they needed?


================
Comment at: clang-tools-extra/unittests/clangd/DexIndexTests.cpp:47
+
+TEST(DexIndexIterators, AndIterator) {
+  const PostingList EmptyList;
----------------
This tests multiple things in one test case, and it's unclear what they are testing. Also, the long function body makes the list definitions hard to read from where they are used.

I think this can be split into: 
- a test case for iterator dumping.
- a test case for AND of empty list and non-empty list.
- a test case for AND of two lists.
- a test case for AND of three lists.


================
Comment at: clang-tools-extra/unittests/clangd/DexIndexTests.cpp:48
+TEST(DexIndexIterators, AndIterator) {
+  const PostingList EmptyList;
+  const PostingList FirstList = {0, 5, 7, 10, 42, 320, 9000};
----------------
nit: these can be `L0, L1, L2, L3`. Save you some typing ;)


================
Comment at: clang-tools-extra/unittests/clangd/DexIndexTests.cpp:66
+
+  EXPECT_EQ(
+      llvm::to_string(*And),
----------------
Could you split checks for iterator dumping into a separate test? It's relatively independent. And you wouldn't need to update multiple places in case you update elements in posting lists.


================
Comment at: clang-tools-extra/unittests/clangd/DexIndexTests.cpp:165
+  //  +-------v-----+ +----v-----+           +--v--+    +-V--+    +---v---+
+  //  |0, 3, 5, 8, 9| |0, 5, 7, 9|           |Empty|    |0, 5|    |0, 1, 5|
+  //  +-------------+ +----------+           +-----+    +----+    +-------+
----------------
s/0/1/ for AND lists?


================
Comment at: clang-tools-extra/unittests/clangd/DexIndexTests.cpp:186
+  Root->advanceTo(5);
+  Root->advanceTo(9000);
+}
----------------
These two operations are not meaningful if we don't check anything after them.


https://reviews.llvm.org/D49546





More information about the cfe-commits mailing list