[PATCH] D48441: [clangd] Incorporate transitive #includes into code complete proximity scoring.
Sam McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Jun 27 02:39:20 PDT 2018
sammccall added inline comments.
================
Comment at: clangd/FileDistance.cpp:35
+ : Opts(Opts) {
+ for (const auto &R : Roots) {
+ auto Canonical = canonicalize(R.getKey());
----------------
Note there was a nasty bug in FileDistance::FileDistance that failed to consider down edges. Fixed and test case added.
================
Comment at: clangd/FileDistance.cpp:86
+ return R.first->getSecond();
+ if (auto U = clangd::URI::parse(URI)) {
+ LLVM_DEBUG(dbgs() << "distance(" << URI << ") = distance(" << U->body()
----------------
ioeric wrote:
> This is done for every index symbol, so I wonder if we could avoid parsing URIs by assuming some URI structures.
(note we cache the URI string, so it's at most once per file)
We could, at the cost of potentially subtle bugs/dependencies. (what if the data doesn't make exactly the same choices regarding escaping?)
parse() is actually *fairly* cheap. Unlike resolve() or create(), it doesn't touch the URI scheme plugins at all, so nothing's virtual. It's a few calls to percentDecode() and a few string allocations.
I think there's room for optimization there (scheme should be a smallstring for one) but wouldn't go hacking around it unless it turns up on a profile.
================
Comment at: clangd/FileDistance.h:56
+
+ FileDistance(llvm::StringMap<unsigned> Roots,
+ const FileDistanceOptions &Opts = {});
----------------
ioeric wrote:
> nit: It's unclear what the values of Roots are (it can be confused with tree roots or directory roots).
Changed roots -> sources everywhere, which is less ambiguous.
================
Comment at: clangd/Headers.h:62
+ std::vector<Inclusion> MainFileIncludes;
+ llvm::StringMap<unsigned> includeDepth(llvm::StringRef MainFileName) const;
+
----------------
ioeric wrote:
> Does this return all transitive includes in the entire TU or just those from `MainFileName`? And what's `MainFileName` here? Is it the same as the main file in a TU?
It's transitive... there was actually a comment but it was in the wrong place!
Wrote a better comment and generalized slightly to make this seem less magic. You can actually ask for the files reachable starting at any file (MainFile is usually the one you want though).
================
Comment at: clangd/Headers.h:67
+ llvm::StringRef IncludedName,
+ llvm::StringRef IncludedRealName);
+
----------------
ioeric wrote:
> The real name can be empty. How do we handle empty path?
We attempt to record it every time we see it.
Fixed includeDepth() to exclude files from the output if we never get a real name for them.
This shouldn't be the case in practice - real name is missing for FileEntry when using a preamble, but we should see their names when *building* the preamble.
================
Comment at: unittests/clangd/FileDistanceTests.cpp:36
+ // Ancestor (up+up+up+up)
+ EXPECT_EQ(D.distance("/"), 22u);
+ // Sibling (up+down)
----------------
ioeric wrote:
> I think this is an interesting case. IIUC, 22u is contributed by 4 ups (4*5) and 1 include (1*2, via `llvm/ADT/StringRef.h`).
>
> Suppose `a/b/c/d/e/f.cc` includes `base/x.h`, then the shortest path into directory `other/` is likely to go via `base/x.h`. This might become a problem if `base/x.h` is very commonly used. One solution is probably to penalize edits that hit the root.
Yeah, this could be a weakness. I don't think this is specific to #includes, or to root directories.
I think the generalization is that there exist directories (like /usr/include) whose siblings are unrelated. Down-traversals in these directories should be more expensive. I've added a Caveats section to the FileDistance documentation, but I would like to avoid adding special cases until we can see how much difference they make.
Repository:
rCTE Clang Tools Extra
https://reviews.llvm.org/D48441
More information about the cfe-commits
mailing list