[PATCH] D42640: [clangd] collect symbol #include & insert #include in global code completion.

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Feb 12 10:13:53 PST 2018


ioeric added inline comments.


================
Comment at: clangd/ClangdServer.cpp:370
+/// File, by matching \p Header against all include search directories for \p
+/// File via `clang::HeaderSearch`.
+///
----------------
sammccall wrote:
> ioeric wrote:
> > sammccall wrote:
> > > sammccall wrote:
> > > > maybe drop "via clang::HeaderSearch" (it's doc'd in the implementation) and add an example?
> > > > 
> > > > It might be worth explicitly stating that the result is an #include string (with quotes) - you just call it a "path" here.
> > > > 
> > > > "shortest" makes sense as a first implementation, but we may want to document something more like "best" - I know there are codebases we care about where file-relative paths are banned. Also I suspect we don't give "../../../../Foo.h" even if it's shortest :-)
> > > > "shortest" makes sense as a first implementation, but we may want to document something more like "best" - I know there are codebases we care about where file-relative paths are banned. Also I suspect we don't give "../../../../Foo.h" even if it's shortest :-)
> > > 
> > > I think this part wasn't addressed
> > I added a comment for this in the header. But I might be misunderstanding you suggestion. Did you mean we need a better name for the function?
> I think the function name is fine as a shorthand, just that the comment is a bit overspecified and possibly inaccurate compared to e.g. "Determines the preferred way to #include a file, taking into account the search path. Usually this will prefer a shorter representation like 'Foo/Bar.h' over a longer one like 'Baz/include/Foo/Bar.h'".
> 
> I don't think the `../../../Foo.h` case is worth explicitly calling out - I just meant it as an example of why we *don't* want an overly-specific contract here.
I see. Thanks!


================
Comment at: clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp:99
+    CollectorOpts.CollectIncludePath = true;
+    auto Includes = llvm::make_unique<CanonicalIncludes>();
+    addStandardLibraryMapping(Includes.get());
----------------
sammccall wrote:
> Do we create one of these per TU or per thread? The former is "clean" but seems potentially wasteful (compiling all those system header regexes for each TU). The latter is "fast" but potentially non-hermetic (can't think of a triggering case though).
> 
> Maybe we should have a split between transient mappings (IWYU) and permanent ones?
This is created per TU now. In an earlier revision, this was one-per-program because we statically constructed a regex map and passed the map into the `CanonicalIncludes` via the constructor, like we do in include-fixer (https://github.com/llvm-mirror/clang-tools-extra/blob/master/include-fixer/find-all-symbols/STLPostfixHeaderMap.cpp#L15).  But with the current  design, I didn't do this because one-per-TU approach seems to be cleaner, and the regex construction time seems to be relatively small comparing to the time spent on actually compiling a TU.

If we really want to squeeze performance here, we would probably need either an interface that takes a static regex mapping or one that takes pre-computed `llvm::Regex`. But I'm not really sure if it would be worth it since this is not a performance critical code path. 


================
Comment at: clangd/index/CanonicalIncludes.cpp:74
+          {"include/__stddef_max_align_t.h$", "<cstddef>"},
+          {"include/__wmmintrin_aes.h$", "<wmmintrin.h>"},
+          {"include/__wmmintrin_pclmul.h$", "<wmmintrin.h>"},
----------------
sammccall wrote:
> these aren't standard library - deserves a comment?
> 
> in general it looks like there's a bunch of standard library stuff, also posix stuff, and some compiler intrinsics.
> If this is the case, maybe "system headers" is a better descriptor than "standard library".
> 
> Can we document approximately which standard libraries, which compiler extensions, and other standards (posix, but I guess windows one day) are included?
Switched both function and variable to "system headers" and updated the documentation.

 


================
Comment at: clangd/index/CanonicalIncludes.h:10
+//
+// This file defines functionalities for remapping #include header for an index
+// symbol. For example, we can collect a mapping accoring to IWYU pragma
----------------
sammccall wrote:
> Can we split out the main ideas a bit? I think these are: a) what include mapping is, b) IWYU pragmas, c) standard library.
> We should also probably call out the relationship with the stuff in clangd/Headers.h.
> 
> e.g.
> ```At indexing time, we decide which file to #included for a symbol.
> Usually this is the file with the canonical decl, but there are exceptions:
> 
> - private headers may have pragmas pointing to the matching public header.
>   (These are "IWYU" pragmas, named after the include-what-you-use tool).
> - the standard library is implemented in many files, without any pragmas. 
>   We have a lookup table for common standard library implementations.
>   libstdc++ puts char_traits in bits/char_traits.h, but we #include <string>.
> 
> The insert-time logic in clang/Headers.h chooses how to spell the
> filename we emit here; this depends on the search path at the insertion site.
> ```
> 
> (I think .inc files are conceptually similar and should also be handled and mentioned here, commented below)
Thanks for the suggestion!

> We should also probably call out the relationship with the stuff in clangd/Headers.h.
I'm a bit inclined to not call out the relationship here as this doesn't seem to be a better place than the code where they interact, and the doc could easily be outdated if the interaction is changed from other libraries. 


================
Comment at: clangd/index/CanonicalIncludes.h:52
+  /// a canonical header name.
+  llvm::StringRef mapHeader(llvm::StringRef Header) const;
+
----------------
sammccall wrote:
> So I'm a bit concerned this is too narrow an interface, and we really want to deal with SourceLocation here which would give us the include stack.
> 
> Evidence #1: .inc handling really is the same job, but because this class has a single-file interface, we have to push it into the caller.
> Evidence #2: I was thinking about a more... principled way of determining system headers. One option would be to have a whitelist of public headers, and walk up the include stack to see if you hit one. (I'm not 100% sure this would work, but still...) This isn't implementable with the current interface.
> 
> One compromise would be to pass in a stack<StringRef> or something. Efficiency doesn't really matter because the caller can cache based on the top element.
> Evidence #1: .inc handling really is the same job, but because this class has a single-file interface, we have to push it into the caller.
I think this would depend on how you define the scope of this class. `.inc` handling is a subtle case that I'm a bit hesitated to build into the interface here.

> Evidence #2: ....
This is actually very similar to how the hardcoded mapping was generated. I had a tool that examined include stacks for a library (e.g. STL) and applied a similar heuristic - treat the last header in the include stack within the library boundary as the "exporting" public header for a leaf include header, if there is no other public header that has shorter distance to that include. For example, if we see a stack like `stl/bits/internal.h -> stl/bits/another.h -> stl/public_1.h -> user_code.cc`, we say `public_1.h` exports `bits/internal.h` and add a mapping from `bits/internal.h$` to `public_1.h`. But if we see another (shorter) include stack like `stl/bits/internal.h -> stl/public_2.h -> user_code.cc`, we say `stl/public_2.h` exports `stl/bits/internal.h`. This heuristic works well for many cases. However, this may produce wrong mappings when an internal header is used by multiple public headers. For example, if we have two include stacks with the same length e.g. `bits/internal.h -> public_1.h -> user.cc` and `bits/inernal.h -> public_2.h -> user.cc`, the result mapping would depend on the order in which we see these two stacks; thus, we had to do some manual adjustment to make sure bits/internal.h is mapped to the correct header according to the standard.

I am happy to discuss better solution here. But within the scope of this patch, I'd prefer to stick to interfaces that work well for the current working solution instead of designing for potential future solutions. I should be easy to iterate on the interfaces as these interfaces aren't going to be widely used in clangd after all. WDYT?


================
Comment at: clangd/index/CanonicalIncludes.h:56
+  /// A string-to-string map saving the mapping relationship.
+  llvm::StringMap<std::string> HeaderMappingTable;
+
----------------
sammccall wrote:
> do we need both tables? it seems the system headers (which are regex-based) will typically outnumber the IWYU pragmas by a bunch.
> And if we care about performance at all, we can cache the results of mapHeader?
Makes sense.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D42640





More information about the cfe-commits mailing list