[PATCH] D142092: [include-mapping] Allow multiple headers for the same symbol. Choose the first header of available ones.

Viktoriia Bakalova via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jan 27 08:43:50 PST 2023


VitaNuo marked an inline comment as done.
VitaNuo added a comment.

Thanks for the comments! AFAICS I've addressed all of them.
Re tests, thanks for the reminder @hokein! I've fixed them now, everything is green.



================
Comment at: clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc:205
 SYMBOL(basic_syncbuf, std::, <syncstream>)
 SYMBOL(begin, std::, <iterator>)
 SYMBOL(bernoulli_distribution, std::, <random>)
----------------
kadircet wrote:
> i think we should have other providers here, https://en.cppreference.com/w/cpp/iterator/begin. do we know why they're dropped?
Yes, this list is the regeneration of the 2018 html book. It's explicitly _not_ generated from the 2022 book in order to have a clear diff related to the topic of this patch.
In the 2018 book, the only provider for `std::begin` is `iterator`.


================
Comment at: clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc:1162
 SYMBOL(subtract_with_carry_engine, std::, <random>)
+SYMBOL(swap, std::, <algorithm>)
 SYMBOL(swap_ranges, std::, <algorithm>)
----------------
hokein wrote:
> Is this intended? it seems to me the cppparser doesn't handle this case correctly, in the swap symbol page, we have two headers in a single `t-dsc-header` tr, the parser should consider both (like the above `size_t`).
> 
> ```
> Defined in header <algorithm>
> Defined in header <utility>
> ```
You're right. There's another random class name (`t-dcl-rev-aux`) that needs to be skipped. I've changed the condition in line 81 of cppreference_parser.py (the loop that skips unneeded rows between the header block and the symbol block) to negation in order to make it more robust and avoid listing all these unneeded classes unnecessarily.  


================
Comment at: clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp:57
+    // ordered alpahbetically in the map.
+    unsigned SymIndex = NextAvailSymIndex;
+    if (NextAvailSymIndex > 0 &&
----------------
hokein wrote:
> We can avoid a local variable by
> 
> ```
> auto Add = [.. SymIndex(-1)] (....) {
> if (SymIndex >=0 &&
>     SymbolNames[SymIndex].first == NS && ...) {
> } else {
>     // the first symbol, or new symbol.
>     ++SymIndex;
> }
> 
> SymbolNames[SymIndex] = {NS, Name};
> ....
> }
> ```
Ok, sure.


================
Comment at: clang/tools/include-mapping/cppreference_parser.py:39
 
-def _ParseSymbolPage(symbol_page_html, symbol_name):
+def _ParseSymbolPage(symbol_page_html, symbol_name, header_to_accept):
   """Parse symbol page and retrieve the include header defined in this page.
----------------
hokein wrote:
> If the `header_to_accept` is set, I think we can just return {header_to_accept} directly.
We'd need to add `<>` then. It needs a couple of extra lines here or there anyways, I don't think it makes a difference.


================
Comment at: clang/tools/include-mapping/cppreference_parser.py:96
+        if symbol_name in found_symbols:
+          # FIXME: only update symbol headers on first symbol discovery, assume
+          # same symbol occurence in a subsequent header block is an overload.
----------------
hokein wrote:
> IIUC, this is the major and intended change of the function, this is not a FIXME.
I guess it depends on how you view it: I meant it as FIXME in the context of overloads, i.e., we might want to remove this and add infrastructure to differentiate between overloads explicitly. 
I can remove the FIXME part, no problem.


================
Comment at: clang/tools/include-mapping/cppreference_parser.py:105
+
+  # If the symbol was never named, consider all named headers.      
+  # FIXME: only consider first symbol, assume further symbols are overloads
----------------
hokein wrote:
> The comment doesn't match the current behavior. IIUC, iof the symbol was never named, we only consider the first header now.
I've removed all the hacking with `all_headers` now, so the comment is valid again.


================
Comment at: clang/tools/include-mapping/cppreference_parser.py:107
+  # FIXME: only consider first symbol, assume further symbols are overloads
+  all_headers = sorted(list(all_headers))
+  if len(all_headers) > 0:
----------------
hokein wrote:
> why sort the headers?
This was a hack to make the generator return `atomic` rather than `memory` for `std::atomic.*` symbols. But I've noticed now that this adds some trouble to the C symbol map. I think now that cutting `all_headers` to the first element is wrong.
I will special-case the `std::atomic.*` symbols instead, because so far I don't see a way to solve this programmatically without other collateral effects.

Note that the problem is not `std::atomic` itself (for which Kadir extensively described the solution in some of the other comments), but rather `atomic_bool`, `atomic_char` etc. mentioned further in the page. The headers for them come from `all_headers` rather than `symbol_headers`, and there seems to be no way to limit them to `<atomic>` without other collateral effects.


================
Comment at: clang/tools/include-mapping/cppreference_parser.py:165
       # FIXME: use these as a fallback rather than ignoring entirely.
-      variants_for_symbol = variants_to_accept.get(
-          (namespace or "") + symbol_name, ())
-      if variant and variant not in variants_for_symbol:
+      header_to_accept = variants_to_accept.get(
+          (namespace or "") + symbol_name, "")
----------------
kadircet wrote:
> `variant` is not necessarily the `header`, eg:
> ```
> acos()
> acos<>() (std::complex) (since C++11)
> acos<>() (std::valarray)
> ```
> 
> in this case variants are `std::complex` and `std::valarray`. 
> 
> hence we're not trying to "infer" the header we want to preserve but rather decide on which symbol page we want to parse. we should still accept "all the headers" mentioned in that variant symbol page.
Ah ok, I was mostly looking at `std::remove (algorithm)`. Ok, I will use the term "variant" as before. Thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142092/new/

https://reviews.llvm.org/D142092



More information about the cfe-commits mailing list