[PATCH] D42640: [clangd] collect symbol #include & insert #include in global code completion.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Feb 12 07:34:22 PST 2018


sammccall added a comment.

Insertion still LG (couple of nits, inline).

For indexing, my biggest questions:

- I worry CanonicalIncludes doesn't get enough information to make good decisions - passing the include stack in some form may be better
- CanonicalIncludes has slightly weird patterns of reads/writes - most writes are permanent and a few are transient, and it's not totally obvious how it works in a multithreading context (though your code is correct). I'm not sure whether this is worth fixing.



================
Comment at: clangd/ClangdServer.cpp:370
+/// File, by matching \p Header against all include search directories for \p
+/// File via `clang::HeaderSearch`.
+///
----------------
ioeric wrote:
> sammccall wrote:
> > sammccall wrote:
> > > maybe drop "via clang::HeaderSearch" (it's doc'd in the implementation) and add an example?
> > > 
> > > It might be worth explicitly stating that the result is an #include string (with quotes) - you just call it a "path" here.
> > > 
> > > "shortest" makes sense as a first implementation, but we may want to document something more like "best" - I know there are codebases we care about where file-relative paths are banned. Also I suspect we don't give "../../../../Foo.h" even if it's shortest :-)
> > > "shortest" makes sense as a first implementation, but we may want to document something more like "best" - I know there are codebases we care about where file-relative paths are banned. Also I suspect we don't give "../../../../Foo.h" even if it's shortest :-)
> > 
> > I think this part wasn't addressed
> I added a comment for this in the header. But I might be misunderstanding you suggestion. Did you mean we need a better name for the function?
I think the function name is fine as a shorthand, just that the comment is a bit overspecified and possibly inaccurate compared to e.g. "Determines the preferred way to #include a file, taking into account the search path. Usually this will prefer a shorter representation like 'Foo/Bar.h' over a longer one like 'Baz/include/Foo/Bar.h'".

I don't think the `../../../Foo.h` case is worth explicitly calling out - I just meant it as an example of why we *don't* want an overly-specific contract here.


================
Comment at: clangd/CodeComplete.cpp:290
+        // We only insert #include for items with details, since we can't tell
+        // weather the file URI of the canonical declaration would be the
+        // canonical #include without checking IncludeHeader in the detail.
----------------
whether


================
Comment at: clangd/CodeComplete.cpp:301
+          Command Cmd;
+          // Command title is not added since this is not a user-facting
+          // command.
----------------
user-facing. unwrap?


================
Comment at: clangd/clients/clangd-vscode/package.json:1
 {
     "name": "vscode-clangd",
----------------
phab says you have ws-only changes in this file, which you might want to revert


================
Comment at: clangd/global-symbol-builder/CMakeLists.txt:1
 include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../)
 
----------------
(and here - ws-only changes?)


================
Comment at: clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp:99
+    CollectorOpts.CollectIncludePath = true;
+    auto Includes = llvm::make_unique<CanonicalIncludes>();
+    addStandardLibraryMapping(Includes.get());
----------------
Do we create one of these per TU or per thread? The former is "clean" but seems potentially wasteful (compiling all those system header regexes for each TU). The latter is "fast" but potentially non-hermetic (can't think of a triggering case though).

Maybe we should have a split between transient mappings (IWYU) and permanent ones?


================
Comment at: clangd/index/CanonicalIncludes.cpp:51
+                               PP.getSourceManager(), PP.getLangOpts());
+      size_t Pos = Text.find(IWYUPragma);
+      if (Pos == StringRef::npos)
----------------
I think this is

if (!Text.consume_front(IWYUPragma))
 return false; 


================
Comment at: clangd/index/CanonicalIncludes.cpp:72
+  static const std::vector<std::pair<const char *, const char *>>
+      STLPostfixHeaderMap = {
+          {"include/__stddef_max_align_t.h$", "<cstddef>"},
----------------
"STL" :-)


================
Comment at: clangd/index/CanonicalIncludes.cpp:74
+          {"include/__stddef_max_align_t.h$", "<cstddef>"},
+          {"include/__wmmintrin_aes.h$", "<wmmintrin.h>"},
+          {"include/__wmmintrin_pclmul.h$", "<wmmintrin.h>"},
----------------
these aren't standard library - deserves a comment?

in general it looks like there's a bunch of standard library stuff, also posix stuff, and some compiler intrinsics.
If this is the case, maybe "system headers" is a better descriptor than "standard library".

Can we document approximately which standard libraries, which compiler extensions, and other standards (posix, but I guess windows one day) are included?


================
Comment at: clangd/index/CanonicalIncludes.h:10
+//
+// This file defines functionalities for remapping #include header for an index
+// symbol. For example, we can collect a mapping accoring to IWYU pragma
----------------
Can we split out the main ideas a bit? I think these are: a) what include mapping is, b) IWYU pragmas, c) standard library.
We should also probably call out the relationship with the stuff in clangd/Headers.h.

e.g.
```At indexing time, we decide which file to #included for a symbol.
Usually this is the file with the canonical decl, but there are exceptions:

- private headers may have pragmas pointing to the matching public header.
  (These are "IWYU" pragmas, named after the include-what-you-use tool).
- the standard library is implemented in many files, without any pragmas. 
  We have a lookup table for common standard library implementations.
  libstdc++ puts char_traits in bits/char_traits.h, but we #include <string>.

The insert-time logic in clang/Headers.h chooses how to spell the
filename we emit here; this depends on the search path at the insertion site.
```

(I think .inc files are conceptually similar and should also be handled and mentioned here, commented below)


================
Comment at: clangd/index/CanonicalIncludes.h:32
+
+/// \brief CanonicalIncludes collects all remapping header files. This maps
+/// complete header names or header name regex patterns to a given canonical
----------------
nit: \brief needed?

I'm not sure this class actually collects anything - that's the handler returned by collectIWYUHeaderMaps.

Maybe "Maps a definition location onto an #include file, based on a set of filename rules."?


================
Comment at: clangd/index/CanonicalIncludes.h:44
+
+  /// Adds a regex-to-string mapping from \p RE to \p CanonicalPath.
+  void addRegexMapping(llvm::StringRef RE, llvm::StringRef CanonicalPath);
----------------
Maps all files matching \p RE to \p CanonicalPath?


================
Comment at: clangd/index/CanonicalIncludes.h:50
+  /// \param Header A header name.
+  /// \return \p Header itself if there is no mapping for it; otherwise, return
+  /// a canonical header name.
----------------
Just this comment is probably enough for the whole function...


================
Comment at: clangd/index/CanonicalIncludes.h:52
+  /// a canonical header name.
+  llvm::StringRef mapHeader(llvm::StringRef Header) const;
+
----------------
So I'm a bit concerned this is too narrow an interface, and we really want to deal with SourceLocation here which would give us the include stack.

Evidence #1: .inc handling really is the same job, but because this class has a single-file interface, we have to push it into the caller.
Evidence #2: I was thinking about a more... principled way of determining system headers. One option would be to have a whitelist of public headers, and walk up the include stack to see if you hit one. (I'm not 100% sure this would work, but still...) This isn't implementable with the current interface.

One compromise would be to pass in a stack<StringRef> or something. Efficiency doesn't really matter because the caller can cache based on the top element.


================
Comment at: clangd/index/CanonicalIncludes.h:56
+  /// A string-to-string map saving the mapping relationship.
+  llvm::StringMap<std::string> HeaderMappingTable;
+
----------------
do we need both tables? it seems the system headers (which are regex-based) will typically outnumber the IWYU pragmas by a bunch.
And if we care about performance at all, we can cache the results of mapHeader?


================
Comment at: clangd/index/CanonicalIncludes.h:67
+
+/// Returns a CommentHandler that parses pragma comment on include files to
+/// determine when we should include a different header from the header that
----------------
Explicitly mention that the mappings are registered with *Includes?

(interaction isn't totally obvious here because Includes has read and write methods)


================
Comment at: clangd/index/SymbolCollector.h:46
+    /// different #include path.
+    CanonicalIncludes *Includes = nullptr;
   };
----------------
const?


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D42640





More information about the cfe-commits mailing list