[PATCH] D82938: [clangd] Implement path and URI translation for remote index

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jul 7 08:40:07 PDT 2020


sammccall added a comment.

This looks better to me but:

- i think we need to consider what happens when paths are not under the expected path, and handle that in some appropriate way
- I find it hard to see what the tests are actually testing - would be good to make them more explicit, using a few minimal helpers, and avoiding fancy abstractions like URI schemes



================
Comment at: clang-tools-extra/clangd/index/remote/Index.proto:102
   Position end = 2;
-  string file_uri = 3;
+  // clangd::SymbolLocation storees FileURI, but the protocol transmits only a
+  // part of its body. Because paths are different on the remote and local
----------------
storees -> stores

isn't it conceptually simpler to talk about a "relative path" rather than a "part of a URI body"?
(apart from anything else, no need to worry about whether it's escaped or not...)


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:36
+/// provided by the client. \p IndexRoot is expected to have UNIX slashes.
+std::string relativePathToURI(llvm::StringRef RelativePath,
+                              llvm::StringRef IndexRoot) {
----------------
mention error return (or return Optional)


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:52
+/// to send over the wire to the client.
+std::string uriToRelativePath(llvm::StringRef URI,
+                              llvm::StringRef IndexedProjectRoot) {
----------------
this should check the scheme too - it must be file
(we can't reasonably treat IndexedProjectRoot as scheme-relative... but not to any *particular* scheme!)


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:119
+SymbolLocation toProtobuf(const clangd::SymbolLocation &Location,
+                          llvm::StringRef IndexedProjectRoot) {
   remote::SymbolLocation Result;
----------------
nit: indexedprojectroot -> indexroot, unless there's some reason for the asymmetry? it seems like the same concept in the opposite direction


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:123
   *Result.mutable_end() = toProtobuf(Location.End);
-  *Result.mutable_file_uri() = Location.FileURI;
+  *Result.mutable_file_path() =
+      uriToRelativePath(Location.FileURI, IndexedProjectRoot);
----------------
how do you want to handle the case where the path is in fact not relative stored under the root?


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:133
+  const std::string Header = IncludeHeader.IncludeHeader.str();
+  auto URI = URI::parse(Header);
+  Result.set_header(URI ? uriToRelativePath(IncludeHeader.IncludeHeader.str(),
----------------
Don't try to parse if it's not a URI (starts with < or ").
As written this code will crash in debug mode if it's not a URI, due to unhandled error. Please add a test!


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.cpp:145
+  std::string Header = Message.header();
+  if (llvm::sys::path::is_relative(Header))
+    Header = relativePathToURI(Header, IndexRoot);
----------------
you seem to have this check in a few places - I'd suggest it should be in the relativePathToURI function and there should be some plan for handling errors.


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h:15
+// on indexing machine and client machine
+// ("/remote/machine/projects/llvm/include/HelloWorld.h" versus
+// "/usr/local/username/llvm/include/HelloWorld.h"), they need to be converted
----------------
nit: llvm -> llvm-project


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h:42
 clangd::FuzzyFindRequest fromProtobuf(const FuzzyFindRequest *Request);
+/// Deserializes clangd types and translates relative paths into
+/// machine-native URIs.
----------------
again, one particular overload is not really the place for this comment, I think. (File comment looks good)


================
Comment at: clang-tools-extra/clangd/index/remote/marshalling/Marshalling.h:52
 LookupRequest toProtobuf(const clangd::LookupRequest &From);
 FuzzyFindRequest toProtobuf(const clangd::FuzzyFindRequest &From);
 RefsRequest toProtobuf(const clangd::RefsRequest &From);
----------------
this also needs the index root, for the proximity path (I'm not sure why they're not URIs, that's unfortunate...)


================
Comment at: clang-tools-extra/clangd/index/remote/server/Server.cpp:50
+      : Index(std::move(Index)) {
+    llvm::SmallString<256> NativePath = llvm::StringRef(IndexRoot);
+    llvm::sys::path::native(NativePath);
----------------
please pass this in as a constructor param rather than accessing the flag directly


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:25
 
+const char *unittestURIToFilesystem(const char *UnittestURI,
+                                    llvm::UniqueStringSaver &Strings) {
----------------
The File scheme is special. Using a different URI scheme here when we only support `file` in production is confusing and seems like a last resort to be used when we can't make it work any other way.
(For example we use it in lit tests because we must specify input filenames and the presence/absence of drive letters caused problems)
What's being solved there that the much smaller hammer of testPath() plus a local testPathURI() helper can't solve?


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:122
+  clangd::Ref Original;
+  const auto RemoteIndexPrefix = testPath("remote/machine/project/");
+  const auto RelativePath = llvm::sys::path::convert_to_slash(
----------------
auto -> std::string (here and in other places where the type is neither overly unreadable nor spelled in the function name)


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:122
+  clangd::Ref Original;
+  const auto RemoteIndexPrefix = testPath("remote/machine/project/");
+  const auto RelativePath = llvm::sys::path::convert_to_slash(
----------------
sammccall wrote:
> auto -> std::string (here and in other places where the type is neither overly unreadable nor spelled in the function name)
it's not great practice to be assembling the test data from prefix + relative if the code we're testing is splitting it into prefix + relative - easy for the same bugs to cancel each other out.

Is testPath("remote/dir/MarshallingTests.cpp") directly too long?


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:126
+      "MarshallingTests.cpp");
+  const auto URI = URI::createFile(RemoteIndexPrefix + RelativePath);
+  Original.Location.FileURI = Strings.save(URI.toString()).begin();
----------------
(I'd suggest a helper testPathURI("remote/dir/MarshallingTests.cpp") for this)


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:130
+  EXPECT_EQ(Serialized.location().file_path(), RelativePath);
+  // Local index prefix should have UNIX slashes since the relative path in
+  // Protobuf message will.
----------------
hmm, I don't think that's the reason.
Either that's the contract of fromProtobuf, or it's not required and we shouldn't do it.


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:146
+  Ref WithAbsolutePath;
+  *WithAbsolutePath.mutable_location()->mutable_file_path() = "/usr/local/bin/";
+  Deserialized = fromProtobuf(WithAbsolutePath, &Strings, LocalIndexPrefix);
----------------
nit: use a file path rather than directory?


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:34
+
 TEST(RemoteMarshallingTest, SymbolSerialization) {
   const auto *Header = R"(
----------------
At this point I really think we should be writing these symbols out instead of trying to parse them from C++ - it's pretty hard to see exactly what fields are set and what is tested when the test is so highly factored.

If you like this approach think everything new is covered in other tests, I'd just remove the changes to this one.


================
Comment at: clang-tools-extra/clangd/unittests/remote/MarshallingTests.cpp:45
+    const auto SymToProtobufAndBack = fromProtobuf(
+        ProtobufMessage, &Strings, std::string(testRoot()) + '/', "unittest");
     EXPECT_TRUE(SymToProtobufAndBack.hasValue());
----------------
kbobyrev wrote:
> sammccall wrote:
> > this is `testPath("unittest")`, isn't it?
> > 
> > I mean, forward vs backslashes are going to differ, but the current expression is going to produce `C:\test/unittest` which doesn't seem better.
> I don't understand: `std::string(testRoot()) + '/'` is `/clangd-test/` on unix and `testPath("unittest")` is `/clangd-test/unittest`. I understand the slashes argument, but could you please elaborate on the former?
sorry, I guess I meant testRoot("/").


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82938/new/

https://reviews.llvm.org/D82938





More information about the cfe-commits mailing list