[PATCH] D82938: [clangd] Implement path and URI translation for remote index

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 2 01:34:05 PDT 2020


sammccall added a comment.

Generally I think this is missing high-level documentation describing the translation that happens at each level - I had to refer back to the notes from our meeting to understand what the scheme was.
I think Marshalling.h is probably a good place to document how this works. (Not exactly what prefixes get stripped by which function, but rather what the backing index contains, what goes over the wire, and what the clangd on the client side sees).



================
Comment at: clang-tools-extra/clangd/index/remote/Client.cpp:89
   std::unique_ptr<remote::SymbolIndex::Stub> Stub;
+  llvm::StringRef ProjectRoot;
 };
----------------
this should be a std::string - no need to rely on the caller to keep the value alive


================
Comment at: clang-tools-extra/clangd/index/remote/Client.h:21
 /// at \p Address. The client allows synchronous RPC calls.
+/// \p ProjectRoot will be prepended to all relative paths that are received
+/// from the remote server.
----------------
This is a bit low-level - it describes what is being done but not really why. As a caller of this function it doesn't really tell me what I should pass.

Suggest something like "ProjectRoot is an absolute path on the local machine to the source tree described by the remote index. Paths returned by the index will be treated as relative to this directory."

I'd also suggest calling this IndexRoot instead of ProjectRoot, because the latter is a fairly overloaded concept and they need not coincide.


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:31
 
+/// Translates \p RelativePath into the absolute path and builds URI for the
+/// user machine.
----------------
mention somewhere that this is code used on the *client* side to do translation, and that RelativePath came from the server


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:35
+                              llvm::StringRef ProjectRoot,
+                              llvm::StringRef Scheme) {
+  if (RelativePath.empty())
----------------
scheme is always "file", no need to make it a parameter?


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:36
+                              llvm::StringRef Scheme) {
+  if (RelativePath.empty())
+    return "";
----------------
we should also bail out if RelativePath is an absolute path.


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:38
+    return "";
+  llvm::SmallString<20> FullPath = ProjectRoot;
+  llvm::sys::path::append(FullPath, RelativePath);
----------------
20 is too small. Suggest 256 or so (SmallString is on the stack, so it's cheap)


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:40
+  llvm::sys::path::append(FullPath, RelativePath);
+  auto Result = URI::create(FullPath, Scheme);
+  if (!Result) {
----------------
you can use URI::createFile which does less work and never fails


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:46
+  }
+  return static_cast<std::string>(Result->toString());
+}
----------------
this looks like a no-op cast


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:49
+
+/// Strips \p IndexedProjectRoot from \p URI body to get file path that is
+/// relative to the project root. The stripped path will be transferred to the
----------------
Again this is a bit low level - "translates a URI from the server's backing index to a relative path suitable to send over the wire to the client"?


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:52
+/// client.
+std::string relativePath(llvm::StringRef URI,
+                         llvm::StringRef IndexedProjectRoot) {
----------------
uriToRelativePath for consistency?


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:58
+  if (!ParsedURI)
+    return "";
+  llvm::SmallString<20> Result = ParsedURI->body();
----------------
you're crashing here with an unhandled Expected. Add a test?
(I'd suggest elog or log, too)


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:61
+  if (IndexedProjectRoot.empty())
+    return static_cast<std::string>(Result);
+  llvm::sys::path::replace_path_prefix(Result, IndexedProjectRoot, "");
----------------
static_cast is a pretty surprising/uncommon way to write this conversion....
I'd suggest either Result.str.str() or std::string(result)


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:62
+    return static_cast<std::string>(Result);
+  llvm::sys::path::replace_path_prefix(Result, IndexedProjectRoot, "");
+  return static_cast<std::string>(Result);
----------------
need to handle the case where this returns false. Probably log and return ""?

Again, we should have a test for this.


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:62
+    return static_cast<std::string>(Result);
+  llvm::sys::path::replace_path_prefix(Result, IndexedProjectRoot, "");
+  return static_cast<std::string>(Result);
----------------
sammccall wrote:
> need to handle the case where this returns false. Probably log and return ""?
> 
> Again, we should have a test for this.
we really should ensure the result has unix slashes.


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:124
   HeaderWithReferences Result;
   Result.set_header(IncludeHeader.IncludeHeader.str());
   Result.set_references(IncludeHeader.References);
----------------
this needs to do path translation in the case where the header is a URI


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:132
   return clangd::Symbol::IncludeHeaderWithReferences{Message.header(),
                                                      Message.references()};
 }
----------------
and here


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h:26
 clangd::FuzzyFindRequest fromProtobuf(const FuzzyFindRequest *Request);
+/// Deserializes Clangd symbols and translates relative paths into
+/// machine-native URIs.
----------------
nit: Clangd -> clangd


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h:41
 
-Ref toProtobuf(const clangd::Ref &From);
-Symbol toProtobuf(const clangd::Symbol &From);
+/// Serializes Clangd and strips \p IndexedProjectRoot from the file paths
+/// because they are specific to the indexing machine. For example,
----------------
what does "serializes clangd" mean?


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h:41
 
-Ref toProtobuf(const clangd::Ref &From);
-Symbol toProtobuf(const clangd::Symbol &From);
+/// Serializes Clangd and strips \p IndexedProjectRoot from the file paths
+/// because they are specific to the indexing machine. For example,
----------------
sammccall wrote:
> what does "serializes clangd" mean?
The `Ref` marshaller isn't an appropriate place to put the detailed documentation on how path translation works.
It could go in the file documentation, or we could wrap the marshalling methods into a class we could put it there.
(The latter is somewhat appealing because it means you don't need to explicitly pass the root around everywhere, you can do normalization and have a place to store it, etc)


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h:42
+/// Serializes Clangd and strips \p IndexedProjectRoot from the file paths
+/// because they are specific to the indexing machine. For example,
+/// "/remote/machine/project/lib/HelloWorld.cpp" will be translated into
----------------
Somewhere we need to describe what the format of ProjectRoot is: trailing slash or not, unix slashes or native, etc.
(Even if the current code works in any variation, it's easier to reason about if we have a tightly "normalized" form)


================
Comment at: clang-tools-extra/clangd/index/remote/server/Server.cpp:34
 
+llvm::cl::opt<std::string> IndexPrefixPath(llvm::cl::desc("<PROJECT ROOT>"),
+                                           llvm::cl::Positional,
----------------
nit: call this variable IndexRoot rather than IndexPrefixPath, for consistency?


================
Comment at: clang-tools-extra/clangd/index/remote/server/Server.cpp:64
       LookupReply NextMessage;
-      *NextMessage.mutable_stream_result() = toProtobuf(Sym);
+      *NextMessage.mutable_stream_result() = toProtobuf(Sym, IndexPrefixPath);
       Reply->Write(NextMessage);
----------------
for hygiene, I think we should pass this in as a constructor parameter rather than accessing the flag directly.

You probably also want to add a little validation: this needs to be an absolute path, and you may need to convert slashes to unix-style, too.


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:43
   for (auto &Sym : Symbols) {
-    const auto ProtobufMeessage = toProtobuf(Sym);
-    const auto SymToProtobufAndBack = fromProtobuf(ProtobufMeessage, &Strings);
+    const auto ProtobufMessage = toProtobuf(Sym, "");
+    const auto SymToProtobufAndBack = fromProtobuf(
----------------
Is passing the empty string here actually valid? I think you probably want to pass testRoot() or some unixified equivalent


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:45
+    const auto SymToProtobufAndBack = fromProtobuf(
+        ProtobufMessage, &Strings, std::string(testRoot()) + '/', "unittest");
     EXPECT_TRUE(SymToProtobufAndBack.hasValue());
----------------
this is `testPath("unittest")`, isn't it?

I mean, forward vs backslashes are going to differ, but the current expression is going to produce `C:\test/unittest` which doesn't seem better.


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:93
+
+TEST(URITranslation, Marshalling) {
+  llvm::BumpPtrAllocator Arena;
----------------
test name is backwards - TEST(RemoteMarshallingTest, URITranslation) ?


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:93
+
+TEST(URITranslation, Marshalling) {
+  llvm::BumpPtrAllocator Arena;
----------------
sammccall wrote:
> test name is backwards - TEST(RemoteMarshallingTest, URITranslation) ?
I think this test is written in a way where it won't work on windows.
I'd suggest making heavier use of testPath() instead.


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:98
+  const std::string RemoteIndexPrefix = "/remote/machine/project";
+  const std::string RelativePath = "/llvm-project/clang-tools-extra/clangd/"
+                                   "unittests/remote/MarshallingTests.cpp";
----------------
this is not (lexically) a relative path, it's an absolute path.
For the way it's used in this test it doesn't matter that much (just confusing)


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:103
+  Ref Serialized = toProtobuf(Original, RemoteIndexPrefix);
+  EXPECT_EQ(Serialized.location().file_path(), RelativePath);
+  const std::string LocalIndexPrefix = "/local/machine/myproject";
----------------
OK, this assertion does actually show that we're sending absolute paths over the wire and calling them relative.
The confusion here is more serious because it's in the wire format and harder to address later.
There are other technical reasons: true relative paths still leave us the absolute-path namespace to represent paths outside the source tree if we want that for some reason.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82938/new/

https://reviews.llvm.org/D82938





More information about the cfe-commits mailing list