[PATCH] D142092: [include-mapping] Allow multiple headers for the same symbol. Choose the first header of available ones.

Viktoriia Bakalova via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 31 04:26:12 PST 2023


VitaNuo added a comment.

Thanks for the comments!



================
Comment at: clang/tools/include-mapping/cppreference_parser.py:196
+      "std::remove$": "algorithm",
+      "std::atomic.*": "atomic",
+
----------------
kadircet wrote:
> there's no variant of "std::atomic.*" called "atomic", in https://en.cppreference.com/w/cpp/symbol_index. is it something in 2018 version of the html book? otherwise there's only the unnamed variant and std::shared_ptr variant, and we should be preferring unnamed variant.
This is obsolete. As discussed offline, I've removed all the special-casing logic. All the variants are skipped now without exceptions.


================
Comment at: clang/tools/include-mapping/cppreference_parser.py:107
+  # FIXME: only consider first symbol, assume further symbols are overloads
+  all_headers = sorted(list(all_headers))
+  if len(all_headers) > 0:
----------------
kadircet wrote:
> VitaNuo wrote:
> > hokein wrote:
> > > why sort the headers?
> > This was a hack to make the generator return `atomic` rather than `memory` for `std::atomic.*` symbols. But I've noticed now that this adds some trouble to the C symbol map. I think now that cutting `all_headers` to the first element is wrong.
> > I will special-case the `std::atomic.*` symbols instead, because so far I don't see a way to solve this programmatically without other collateral effects.
> > 
> > Note that the problem is not `std::atomic` itself (for which Kadir extensively described the solution in some of the other comments), but rather `atomic_bool`, `atomic_char` etc. mentioned further in the page. The headers for them come from `all_headers` rather than `symbol_headers`, and there seems to be no way to limit them to `<atomic>` without other collateral effects.
> > I think now that cutting all_headers to the first element is wrong.
> 
> What's exactly breaking once you do that? (I guess by `first element` you still mean "all the headers we've seen until the first declaration in the page")
> 
> >  but rather atomic_bool, atomic_char etc. mentioned further in the page. The headers for them come from all_headers rather than symbol_headers, and there seems to be no way to limit them to <atomic> without other collateral effects.
> 
> I guess this is happening because we're trying to find headers that are for the declaration block containing the interesting symbol name. Any idea if that logic is winning us anything? e.g. if we dropped the `        if symbol_name in found_symbols:` what does change?
> I guess by first element you still mean "all the headers we've seen until the first declaration in the page"

No, I mean the first element of the `all_headers` collection. `all_headers` is a collection that collects _all_ the headers in the page and, in case a symbol is not defined in a table with an individual header block, `all_headers` in the page are returned as an approximation.

> What's exactly breaking once you do that?
It is essentially arbitrary to limit `all_headers` to the first element (at the moment it would be the first element alphabetically). E.g., for the `INT8_C` and other constants [here](https://en.cppreference.com/w/cpp/types/integer), the result would be `inttypes.h` and `stdint.h` would be cut, so this result doesn't make sense.

> if we dropped the  `if symbol_name in found_symbols` what does change?
Then all the headers in the table will be attributed to all the symbols in the table. Consider [this page](https://en.cppreference.com/w/cpp/numeric/math/div): `std::imaxdiv` would be reported as defined both in `cstdlib` and `cinttypes`, which is wrong. 

> I guess this is happening because we're trying to find headers that are for the declaration block containing the interesting symbol name.

Not really.  Headers for `atomic_bool`, `atomic_char` etc. come from `all_headers` rather than `symbol_headers`. The reason for that is they are defined in a table which doesn't have its own header block, so we are trying to approximate the headers by naming all the headers in the file.


================
Comment at: clang/tools/include-mapping/cppreference_parser.py:165
       # FIXME: use these as a fallback rather than ignoring entirely.
-      variants_for_symbol = variants_to_accept.get(
-          (namespace or "") + symbol_name, ())
-      if variant and variant not in variants_for_symbol:
+      header_to_accept = variants_to_accept.get(
+          (namespace or "") + symbol_name, "")
----------------
kadircet wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > `variant` is not necessarily the `header`, eg:
> > > ```
> > > acos()
> > > acos<>() (std::complex) (since C++11)
> > > acos<>() (std::valarray)
> > > ```
> > > 
> > > in this case variants are `std::complex` and `std::valarray`. 
> > > 
> > > hence we're not trying to "infer" the header we want to preserve but rather decide on which symbol page we want to parse. we should still accept "all the headers" mentioned in that variant symbol page.
> > Ah ok, I was mostly looking at `std::remove (algorithm)`. Ok, I will use the term "variant" as before. Thanks.
> sorry i wasn't just trying to nitpick on the name of the variable/parameters.
> 
> i was trying to say `_ReadSymbolPage` should not be a function of `variant_to_accept`.
> we should "completely skip" parsing of symbol pages for variants we don't want to take in (that's what this `continue` here is trying to achieve)
> 
> later on when we're parsing a symbol page, the name of the variant shouldn't effect the headers we're picking in any way (especially in the way you're relying right now that says variant name and the header name will be the same, as i pointed out this might as well be `std::complex`).
> 
> Is this trying to work around a different problem? (especially around `std::atomic_.*` I assume, i'll follow up on that thread)
This is obsolete. As discussed offline, I've removed all the special-casing logic. All the variants are skipped now without exceptions.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142092/new/

https://reviews.llvm.org/D142092



More information about the cfe-commits mailing list