[PATCH] D58418: [clang][DirectoryWatcher] Upstream DirectoryWatcher

Dmitri Gribenko via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 15 06:25:50 PDT 2019


gribozavr requested changes to this revision.
gribozavr added inline comments.
This revision now requires changes to proceed.


================
Comment at: clang/include/clang/DirectoryWatcher/DirectoryWatcher.h:27
+/// be invoked even if the directory is empty.
+class DirectoryWatcher {
+public:
----------------
I feel like since there's nothing Clang-specific about it, it should go into libSupport in LLVM, what do you think?


================
Comment at: clang/include/clang/DirectoryWatcher/DirectoryWatcher.h:33
+    /// A file gets moved into the directory and replaces an existing file
+    /// with the same name will trigger an 'Added' event but no 'Removed' event.
+    /// If a file gets replaced multiple times within a short time period, it
----------------
"A file being moved into ... and replacing ... "


================
Comment at: clang/include/clang/DirectoryWatcher/DirectoryWatcher.h:43
+    Removed,
+    /// A file was modified.
+    Modified,
----------------
"File content was modified." ?  In other words, metadata was not modified?


================
Comment at: clang/include/clang/DirectoryWatcher/DirectoryWatcher.h:46
+    /// The watched directory got deleted. No more events will follow.
+    DirectoryDeleted,
+  };
----------------
`DirectoryRemoved` (for consistency with `Removed`)

Also, how about `TopDirectoryRemoved`?  Otherwise it sounds like some random directory was removed.


================
Comment at: clang/include/clang/DirectoryWatcher/DirectoryWatcher.h:51
+    EventKind Kind;
+    std::string Filename;
+  };
----------------
Please document if it is going to be absolute or relative path, or just the file name.



================
Comment at: clang/include/clang/DirectoryWatcher/DirectoryWatcher.h:55
+  typedef std::function<void(ArrayRef<Event> Events, bool isInitial)>
+      EventReceiver;
+
----------------
Users are not going to use this typedef in their code, so I feel like it is obfuscating code -- could you expand it in places where it is used?


================
Comment at: clang/include/clang/DirectoryWatcher/DirectoryWatcher.h:61
+                                                  EventReceiver Receiver,
+                                                  bool waitInitialSync,
+                                                  std::string &Error);
----------------
`WaitForInitialSync` (everywhere in the patch)


================
Comment at: clang/include/clang/DirectoryWatcher/DirectoryWatcher.h:62
+                                                  bool waitInitialSync,
+                                                  std::string &Error);
+
----------------
Why not return `llvm::ErrorOr<std::unique_ptr<DirectoryWatcher>>`?

I also don't understand why we need unique_ptr.  If you change `Implementation &Impl;` to `std::unique_ptr<Implementation> Impl;`, `DirectoryWatcher` becomes a move-only type, and `create` can return `llvm::ErrorOr<DirectoryWatcher>`.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-linux.inc.h:1
+//===- DirectoryWatcher-linux.inc.h - Linux-platform directory listening --===//
+//
----------------
Please drop the `.h` from the name, for consistency with the rest of LLVM (there are no `.inc.h` files, only `.inc`).


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-linux.inc.h:93
+      30 * (sizeof(struct inotify_event) + NAME_MAX + 1);
+  char buf[EventBufferLength] __attribute__((aligned(8)));
+
----------------
Please use `AlignedCharArray` from `llvm/include/llvm/Support/AlignOf.h`.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-linux.inc.h:93
+      30 * (sizeof(struct inotify_event) + NAME_MAX + 1);
+  char buf[EventBufferLength] __attribute__((aligned(8)));
+
----------------
gribozavr wrote:
> Please use `AlignedCharArray` from `llvm/include/llvm/Support/AlignOf.h`.
NAME_MAX is 255, add some for inotify_event, and multiply by 30... we get almost 8 Kb on the stack.  Should we allocate on the heap instead?


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-linux.inc.h:137
+
+bool DirectoryWatcher::Implementation::initialize(StringRef Path,
+                                                  EventReceiver Receiver,
----------------
Return `llvm::Error`?


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-linux.inc.h:162
+  std::thread watchThread(
+      std::bind(runWatcher, pathToWatch, inotifyFD, evtQueue));
+  watchThread.detach();
----------------
Use a lambda instead of bind to improve readability?  https://clang.llvm.org/extra/clang-tidy/checks/modernize-avoid-bind.html


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-linux.inc.h:175
+    std::thread scanThread(runScan);
+    scanThread.detach();
+  }
----------------
Same comments as for macOS: we need to buffer events from inotify until the initial scan completes.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:1
+//===- DirectoryWatcher-mac.inc.h - Mac-platform directory listening ------===//
+//
----------------
Please drop the `.h` from the name, for consistency with the rest of LLVM (there are no `.inc.h` files, only `.inc`).


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:14
+                  std::string &Error);
+  ~Implementation() { stopFSEventStream(); };
+
----------------
No semicolon after "}".


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:19
+
+  bool setupFSEventStream(StringRef path, EventReceiver receiver,
+                          dispatch_queue_t queue,
----------------
"Path", "Receiver" etc. (throughout the patch for all variables, please)


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:112
+    StringRef path, EventReceiver receiver, dispatch_queue_t queue,
+    std::shared_ptr<DirectoryScan> initialScanPtr) {
+  if (path.empty())
----------------
No need to encode the type name in variable names:

initialScanPtr => InitialScan


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:118
+      CFArrayCreateMutable(nullptr, 0, &kCFTypeArrayCallBacks);
+  CFStringRef cfPathStr =
+      CFStringCreateWithBytes(nullptr, (const UInt8 *)path.data(), path.size(),
----------------
"cfPath"  (no need to repeat that it is a string, it is in the type)


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:130
+    char Buffer[PATH_MAX];
+    // Use ::realpath to get the real path name
+    if (::realpath(P.begin(), Buffer) != nullptr)
----------------
Please don't repeat code in comments.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:135
+      realPath = path;
+  }
+
----------------
Move this block into a separate function?

```
std::string Realpath(StringRef Path) { ... }
```


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:141
+  context.version = 0;
+  context.info = ctxData;
+  context.retain = nullptr;
----------------
```
context.info = new EventStreamContextData(
      std::move(realPath), std::move(receiver), std::move(initialScanPtr));
```

and remove the `ctxData` variable.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:168
+
+bool DirectoryWatcher::Implementation::initialize(StringRef Path,
+                                                  EventReceiver Receiver,
----------------
Why not return `llvm::Error`?


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:179
+
+  std::string copiedPath = Path;
+  dispatch_retain(initScanSema);
----------------
"StrongPath"?  "OwnedPath"?  We don't care that it is copied, we care that the variable owns the value (AKA has a strong reference to it).


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:185
+    // to make sure we won't miss any events.
+    dispatch_semaphore_wait(setupFSEventsSema, DISPATCH_TIME_FOREVER);
+    initialScan->scanDirectory(copiedPath);
----------------
I think `setupFSEventsSema` can be eliminated if the call to `setupFSEventStream()` was moved above this `dispatch_async`.  WDYT?


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:187
+    initialScan->scanDirectory(copiedPath);
+    Receiver(initialScan->getAsFileEvents(), /*isInitial=*/true);
+    dispatch_semaphore_signal(initScanSema);
----------------
The fs event stream is already started, and it might have sent some events.  And now we're sending the initial scan.  So, for example, the client could see a file remove event (an fs event) before they see the file add event (from the initial scan).

I think this race can be fixed by buffering the fs event stream until the initial scan is complete.  What do you think?


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:47
+                                const FSEventStreamEventFlags eventFlags[],
+                                const FSEventStreamEventId eventIds[]) {
+  auto *ctx = static_cast<EventStreamContextData *>(clientCallBackInfo);
----------------
This function does not handle the `kFSEventStreamEventFlagMustScanSubDirs` flag.  I think it should, otherwise the client would miss silently events.  WDYT?

https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/UsingtheFSEventsFramework/UsingtheFSEventsFramework.html

> If an event in a directory occurs at about the same time as one or more events in a subdirectory of that directory, the events may be coalesced into a single event. In this case, you will receive an event with the kFSEventStreamEventFlagMustScanSubDirs flag set. When you receive such an event, you must recursively rescan the path listed in the event. The additional changes are not necessarily in an immediate child of the listed path.

Also from the docs:

> A path might be "/" if ether of these flags is set for the event: kFSEventStreamEventFlagUserDropped, kFSEventStreamEventFlagKernelDropped.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:91
+      // For the first time we get the events, check that we haven't already
+      // sent the 'added' event at the initial scan.
+      if (ctx->InitialScan->FileIDSet.count(statusOpt->getUniqueID())) {
----------------
I don't think it is guaranteed that any duplicate events will be received in the first call to this callback.  This is another reason to buffer events during the initial scan.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:99
+    DirectoryWatcher::Event Evt{K, path};
+    Events.push_back(Evt);
+  }
----------------
Events.push_back(DirectoryWatcher::Event{K, path});


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:148
+      nullptr, eventStreamCallback, &context, pathsToWatch,
+      kFSEventStreamEventIdSinceNow, latency,
+      kFSEventStreamCreateFlagFileEvents | kFSEventStreamCreateFlagNoDefer);
----------------
`/*latency=*/0.0`, and remove the `latency` variable.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher-mac.inc.h:155
+  FSEventStreamSetDispatchQueue(EventStream, queue);
+  FSEventStreamStart(EventStream);
+  return false;
----------------
Please check the return code from `FSEventStreamStart`.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:9
+/// \file
+/// \brief Utility class for listening for file system changes in a directory.
+//===----------------------------------------------------------------------===//
----------------
Please don't duplicate doc comments between headers and implementation files -- they will only become stale and confusing.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:31
+namespace llvm {
+// Specialize DenseMapInfo for sys::fs::UniqueID.
+template <> struct DenseMapInfo<sys::fs::UniqueID> {
----------------
Please don't repeat code in comments.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:32
+// Specialize DenseMapInfo for sys::fs::UniqueID.
+template <> struct DenseMapInfo<sys::fs::UniqueID> {
+  static sys::fs::UniqueID getEmptyKey() {
----------------
This specialization should go into `llvm/include/llvm/Support/FileSystem.h`, otherwise we are going to run into ODR violations if another file in LLVM defines the same specialization.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:59
+/// Note that the caller must ensure serial access to it. It is not thread safe
+/// to access it without additional protection.
+struct DirectoryScan {
----------------
"Note: this class is not thread-safe." should be enough.  However, all classes are not thread-safe by default, do we really need to highlight this fact?


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:60
+/// to access it without additional protection.
+struct DirectoryScan {
+  DenseSet<sys::fs::UniqueID> FileIDSet;
----------------
`InitialDirectoryScanner` for a more clear name, and then you don't need to explain in the comment what this struct is used for.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:61
+struct DirectoryScan {
+  DenseSet<sys::fs::UniqueID> FileIDSet;
+  std::vector<std::tuple<std::string, sys::TimePoint<>>> Files;
----------------
`FoundFileIDs`


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:62
+  DenseSet<sys::fs::UniqueID> FileIDSet;
+  std::vector<std::tuple<std::string, sys::TimePoint<>>> Files;
+
----------------
Tuple is not readable.  One has to read the implementation to understand what is stored in it.  Please use a struct and name the fields.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:62
+  DenseSet<sys::fs::UniqueID> FileIDSet;
+  std::vector<std::tuple<std::string, sys::TimePoint<>>> Files;
+
----------------
gribozavr wrote:
> Tuple is not readable.  One has to read the implementation to understand what is stored in it.  Please use a struct and name the fields.
`FoundFiles`


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:68
+    std::error_code EC;
+    for (auto It = fs::directory_iterator(Path, EC),
+              End = fs::directory_iterator();
----------------
Since this enumeration is not recursive, I would assume that `DirectoryWatcher` is not recursive either?  That would be very important to mention in its doc comments -- my default expectation is that such notifications should be recursive.  Or even in the name, `NonrecursiveDirectoryWatcher`.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:83
+    for (const auto &info : Files) {
+      DirectoryWatcher::Event Event{DirectoryWatcher::EventKind::Added,
+                                    std::get<0>(info)};
----------------
I find it weird to say that a file that was already existing was added to the directory (according to the doc comment on `EventKind::Added`).


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:99
+#include "DirectoryWatcher-linux.inc.h"
+#else
+
----------------
The usual way to do this is not through CMake, but by using the predefined preprocessor macros -- `__APPLE__`, `__linux__` etc.

```
#if defined(__APPLE__)
#include "DirectoryWatcher-mac.inc"
#endif

#if defined(__linux__)
#include "DirectoryWatcher-linux.inc"
#endif
```

See `llvm/lib/Support/Unix/Path.inc` for example.

I don't see an advantage of doing it through CMake. CMakeLists.txt does exactly the same OS check, and then checks if the include files exist.  However, there's no real implementation selection happening.  It is not like we would ever see CoreServices missing on macOS.  And even if it is missing, there's no other implementation suitable for macOS.  Same for Linux.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:111
+
+DirectoryWatcher::DirectoryWatcher() : Impl(*new Implementation()) {}
+
----------------
unique_ptr for Implementation should work, as long as you keep the constructor and the destructor of `DirectoryWatcher` defined out of line in the `.cpp` file.


================
Comment at: clang/lib/DirectoryWatcher/DirectoryWatcher.cpp:141
+  std::unique_ptr<DirectoryWatcher> DirWatch;
+  DirWatch.reset(new DirectoryWatcher());
+  auto &Impl = DirWatch->Impl;
----------------
Use llvm::make_unique.


================
Comment at: clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:43
+                 ArrayRef<DirectoryWatcher::EventKind> kinds,
+                 ArrayRef<file_status> stats) const {
+    assert(filenames.size() == kinds.size());
----------------
Are these parallel arrays?  I think it would be nicer if we had an array of structs instead.


================
Comment at: clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:45
+    assert(filenames.size() == kinds.size());
+    assert(filenames.size() == stats.size());
+    SmallVector<DirectoryWatcher::Event, 6> evts = Events;
----------------
"stats" is only used in the assertion, intentional?


================
Comment at: clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:111
+public:
+  void init() {
+    SmallString<128> pathBuf;
----------------
Why not constructor?


================
Comment at: clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:262
+  EXPECT_TRUE(evt->Events.empty());
+  return;
+
----------------
Accidental return?


================
Comment at: clang/unittests/DirectoryWatcher/DirectoryWatcherTest.cpp:303
+
+    EXPECT_TRUE(coll.hasEvents(std::vector<StringRef>{"a", "b", "c"},
+                               std::vector<DirectoryWatcher::EventKind>{
----------------
Google Test supports custom assertions.  For example, see `PrintedStmtMatches` in `clang/unittests/AST/StmtPrinterTest.cpp`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58418/new/

https://reviews.llvm.org/D58418





More information about the cfe-commits mailing list