[PATCH] D58345: [clangd] Using symbol name to map includes for STL symbols.

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 12 07:24:26 PDT 2019


ioeric added inline comments.


================
Comment at: clangd/StdGen/StdGen.py:2
+#!/usr/bin/env python
+#===- StdGen.py -  -------------------------------------------*- python -*--===#
+#
----------------
I'd avoid abbreviation in the file name and the new directory name. `StdGen` sounds a lot like a library ;)

I'd suggest something like `std-include-mapping/generate.py`


================
Comment at: clangd/StdGen/StdGen.py:14
+
+Caveats and FIXMEs:
+  - only symbols directly in "std::namespace" are added
----------------
we should also mention what the tool's behaviors are for these cases.


================
Comment at: clangd/StdGen/StdGen.py:15
+Caveats and FIXMEs:
+  - only symbols directly in "std::namespace" are added
+  - symbols with multiple variants, or defined in multiple headers aren't added
----------------
nit: s/"std::namespace"/std:: namespace/?


================
Comment at: clangd/StdGen/StdGen.py:16
+  - only symbols directly in "std::namespace" are added
+  - symbols with multiple variants, or defined in multiple headers aren't added
+
----------------
it's not trivial what these are. could provide an example here. 


================
Comment at: clangd/StdGen/StdGen.py:18
+
+Usage:
+  1. Install BeautifulSoup dependency
----------------
Suggestion: it would be nice to provide the common commands for each step. E.g. `pip install <beautiful soup>`. 


================
Comment at: clangd/StdGen/StdGen.py:23
+  3. Run the command:
+       StdGen.py -cppreference-doc <path/to/cppreference-doc> > StdGen.h
+"""
----------------
what is the `<path/to/cppreference-doc>` here? is it the downloaded zip, the unzipped (e.g. `cppreference-doc-20181028/`), or some specific html doc in the directory?


================
Comment at: clangd/StdGen/StdGen.py:23
+  3. Run the command:
+       StdGen.py -cppreference-doc <path/to/cppreference-doc> > StdGen.h
+"""
----------------
ioeric wrote:
> what is the `<path/to/cppreference-doc>` here? is it the downloaded zip, the unzipped (e.g. `cppreference-doc-20181028/`), or some specific html doc in the directory?
Again, `StdGen` is not great name. I'd suggest `StdHeaderMapping`.

nit: s/.h/.inc/


================
Comment at: clangd/StdGen/StdGen.py:32
+
+STDGEN_CODE = """\
+//===-- StdGen'erated file --------------------------------------*- C++ -*-===//
----------------
nit: maybe `STDGEN_CODE_HEADER`?


================
Comment at: clangd/StdGen/StdGen.py:48
+def ParseSymbolPage(symbol_page_html):
+  """Parse the symbol page and retrieve the include header defined in this page.
+  Returns a list of headers.
----------------
could you provide a sample html snippet in the comment? it would make the code easier to understand. thanks!

(same for `ParseIndexPage`)


================
Comment at: clangd/StdGen/StdGen.py:94
+  cpp_symbol_root = os.path.join(cpp_reference_root, "en", "cpp")
+  if not os.path.exists(cpp_reference_root):
+    exit("Path %s doesn't exist!" % cpp_reference_root)
----------------
why not check `exists(index_page_path)` instead?


================
Comment at: clangd/StdGen/StdGen.py:116
+    for name, headers in sorted(symbols.items(), key=lambda t : t[0]):
+      # FIXME: support ambiguous symbols
+      if len(headers) > 1:
----------------
move this FIXME into the `if` body to make it clearer that this is for `len(headers) > 1`. 

nit: it's not very clear whether `ambiguous symbols` = `symbols with multiple headers`. Maybe be more specific?


================
Comment at: clangd/StdSymbolMap.inc:8
+//===----------------------------------------------------------------------===//
+// Used to build a lookup table (qualified names => include headers) for C++
+// Standard Library symbols.
----------------
can we include the information about the HTML archive (e.g. STL version) from which this table is generated? This would be useful for future maintenance.


================
Comment at: clangd/StdSymbolMap.inc:11
+//
+// Automatically generated file, do not edit.
+//===----------------------------------------------------------------------===//
----------------
Add a pointer to the python script?


================
Comment at: clangd/index/CanonicalIncludes.cpp:123
 
   static const std::vector<std::pair<const char *, const char *>>
       SystemHeaderMap = {
----------------
hokein wrote:
> sammccall wrote:
> > hokein wrote:
> > > ioeric wrote:
> > > > Can we remove the suffix header mapping now? Is it for the `std::chrono::` symbols? What are the reasons not to include them in this patch? 
> > > Ideally, we could remove it once we get a perfect symbol mapping but I'd prefer to keep it (it serves as a fallback when we don't find symbols in the symbol mapping), so that we could add new symbol mapping increasingly without changing the current behavior.
> > > 
> > > Removing it should be done carefully without introducing regression, and is outside the scope of this patch.
> > We do need fallback heuristics. We should conisder cases:
> >  - for known `std::` symbols mapped to one system header, we don't need a fallback
> >  - for known `std::` mapped to multiple system headers (`std::move`), we need some fallback. There are probably few enough of these that it doesn't justify listing *all* system header paths.
> >  - for known `std::` symbols with a primary template in one file and specializations/overloads in another (swap?) always inserting the primary seems fine 
> >  - For "random" unknown symbols from system header `foo/bits/_bar.h`, we can not insert, correct to `<bar>`, or use the full path name. I don't have a strong preference here, maybe we should do what's easiest.
> >  - for c standard library (`::printf` --> `<stdio.h>` etc) we probably want mappings just like in this patch, I think cppreference has them.
> >  - other "standardish" headers (POSIX etc) - hmm, unclear.
> > 
> > Thinking about all these cases, I'm thinking the nicest solution would be having the symbol -> header mapping, having a small (symbol,header) -> header mapping **only** for ambiguous cases, and not inserting "system" headers that aren't in the main mapping at all. Then we can improve coverage of posix, windows etc headers over time.
> > Question is, how can we recognize "system" headers?
> I think we were talking about C++ std symbols.
> 
> > for known std:: symbols mapped to one system header, we don't need a fallback
> > for known std:: mapped to multiple system headers (std::move), we need some fallback. There are probably few enough of these that it doesn't justify listing *all* system header paths.
> > for known std:: symbols with a primary template in one file and specializations/overloads in another (swap?) always inserting the primary seems fine
> 
> As discussed in this patch, we have other ways to disambiguate these symbols (rather than relying on the header mapping), so it is possible to remove STL headers mapping, but we'll lose correct headers for STL implement-related symbols (e.g. `__hash_code`), ideally we should drop these symbols in the indexer...
> 
> > for c standard library (::printf --> <stdio.h> etc) we probably want mappings just like in this patch, I think cppreference has them.
> 
> Yes, cppreference has them.
> 
> > other "standardish" headers (POSIX etc) - hmm, unclear.
> 
> I think we should still keep the header mapping. Not sure whether there are some sources like cppreference that list all symbols.
> 
> 
So, do we have a plan on how to handle ambiguous symbols properly? If so, could you please summarize this in the FIXME comment so that we don't lose track of it?

> I think we should still keep the header mapping. Not sure whether there are some sources like cppreference that list all symbols.
What's the reasoning?

I am +1 on Sam's idea about skipping include insertion for std:: symbols that are not on cppreference. It's fine to keep the suffix header mapping in the short term while we are working out ambiguous symbols. But I don't think we would want to maintain both mappings in the long term. 

Could you maybe add comment for the header mapping indicating that they are now the fallback mapping for std:: symbols?


Repository:
  rCTE Clang Tools Extra

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58345/new/

https://reviews.llvm.org/D58345





More information about the cfe-commits mailing list