[PATCH] D142092: [include-mapping] Allow multiple headers for the same symbol. Choose the first header of available ones.

Haojian Wu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 24 05:22:23 PST 2023


hokein added inline comments.


================
Comment at: clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc:106
-SYMBOL(atomic_exchange_explicit, std::, <atomic>)
-SYMBOL(atomic_fetch_add, std::, <atomic>)
-SYMBOL(atomic_fetch_add_explicit, std::, <atomic>)
----------------
Looks like the regex filters too many symbols -- e.g. atomic_fetch_add is not a variant symbol, we should keep it instead. The same to other following symbols as well.


================
Comment at: clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc:100
 SYMBOL(atoll, std::, <cstdlib>)
+SYMBOL(atomic, std::, <atomic>)
+SYMBOL(atomic, std::, <memory>)
----------------
VitaNuo wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > hokein wrote:
> > > > Conceptually, this (and other `atomic_*` symbols) doesn't feel correct:
> > > > - `<atomic>` provides the generic template `template<class T> struct atomic;`
> > > > -  `<memory>` provides partial template specializations for `std::shared_ptr` and `std::weak_ptr`  
> > > > 
> > > > They are variant symbols (ideally, they should be treat as the `std::move()`). The issue here is that unlike `std::move` which has two individual entries in the index page, we only have one entry for `std::atomic` (extending the cppreference_parser.py script to perfectly distinguish these two cases in the [page](https://en.cppreference.com/w/cpp/atomic/atomic) seems non-trivial).  Some options:
> > > > 
> > > > 1) treat them as multiple-header symbols (like what this patch does now)
> > > > 2) special-case these symbols like `std::move()`
> > > > 3) always prefer the header providing generic templates
> > > > 
> > > > @kadircet, what do you think?
> > > right, i believe this patch is definitely adding keeping multiple headers for a symbol around, but mixing it with keeping headers for variants (e.g. overloads provided by different headers, or specializations as mentioned here).
> > > 
> > > we definitely need some more changes in parser to make sure we're only preserving alternatives for **the same symbol** and not for any other variants (overloads, specializations). IIRC there are currently 2 places these variants could come into play:
> > > - first is the symbol index page itself, symbols that have ` (foo)` next to them have variants and should still be ignored (e.g. std::remove variant of cstdio shouldn't be preserved in the scope of this patch)
> > > - second is inside the detail pages for symbols, e.g. [std::atomic](http://go/std::atomic), we can have headers for different declarations, they're clearly different variants so we shouldn't add such symbols into the mapping `_ParseSymbolPage` already does some detection of declarations in between headers.
> > > 
> > > in the scope of this patch, we should keep ignoring both.
> > I suggest to special-case the overloads for now, just not to solve all the problems in the same patch.
> The first group are already ignored. We need to take a lot at how to ignore the second one.
Ideally, we should tweak the `_ParseSymbolPage` to handle this case, but I don't see a easy way to do it (the `atomic` [case](https://en.cppreference.com/w/cpp/atomic/atomic) is quite tricky where the symbol name is identical, only the template argument is different, and the simple text-match heuristic in `_ParseSymbolPage` fails in this case).

so specialcasing these (there are not too many of them) looks fine to me.


================
Comment at: clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp:58
+    int SymIndex = NextAvailSymIndex;
+    if (NSSymbolMap *NSSymbols = NamespaceSymbols->lookup(NS)) {
+      auto It = NSSymbols->find(Name);
----------------
VitaNuo wrote:
> hokein wrote:
> > Given the fact that multiple-header symbols are grouped in the .inc file, we could simplify the code of checking a new symbol by looking at the last available SymIndex:
> > 
> > ```
> > if (NextAvailSymIndex > 0 && SymbolNames[NextAvailSymIndex-1].first == NS && SymbolNames[NextAvailSymIndex-1].second == Name) {
> >    // this is not a new symbol.
> > }
> > ```
> I know this is easier in terms of code, but this heavily relies on the order in the generated files. I would prefer to keep the library and the generator as decoupled as possible, even if it means slightly more complex code here. Overall, it seems more future-proof in case of unrelated generator changes (bugs?) that might change the order.
> but this heavily relies on the order in the generated files.

Yeah, this is true. I actually think we should make it as an invariant (multiple-header symbols are grouped) of the generated .inc files. This invariant is important and useful, it is much easier for human to read and justify. We can probably guarantee it in the generator side.


================
Comment at: clang/tools/include-mapping/gen_std.py:112
   for symbol in symbols:
-    if len(symbol.headers) == 1:
-      # SYMBOL(unqualified_name, namespace, header)
-      print("SYMBOL(%s, %s, %s)" % (symbol.name, symbol.namespace,
-                                    symbol.headers[0]))
-    elif len(symbol.headers) == 0:
+    if re.match("^div$|^remove$|atomic.*", symbol.name) and symbol.namespace == "std::":
+      continue
----------------
I'd use a list of exact symbol names (the number of them should not be huge), to avoid filtering out symbols by accident (see my other comment).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142092/new/

https://reviews.llvm.org/D142092



More information about the cfe-commits mailing list